Program 3
cs550 Operating Systems
Due date:  Monday Oct. 27, 2014 at 11:59:59 p.m.
20 points

Objective:  To learn about the producer-consumer problem.

Specification:

For Program 3, you are to write two programs that will perform numerical optimization -- finding the minimum of a mathematical function.  The first program must be written in standard C and must use only pthreads as its parallel construct, and the second must be written in C using MPI and must use only MPI-style processes as its parallel construct.  Performing numerical optimization by using calculus is beyond the scope of this course, so we will use a Monte Carlo method.

Finding a minimum is actually really easy.  For example if you are given a function f(x) = x^2, where -10 <= x <= 10, we know from calculus or from direct observation that the minimum is zero.  Try graphing it!  To find a value very close to the minimum with a program using a Monte Carlo (random sampling) method is simple.  Take a large number of random samples (maybe 1,000,000) for x between -10 <= x <= 10, evaluate the function x^2 at every point, and find the value of x that yields the smallest number for x^2.

For this assignment you will be given a function that is more substantial than x^2 to work with, but the idea is the same.  Find the minimum of the function.  First, do so using threads.  Then, do so using processes.  Divide the sampling evenly between the threads/processes.  Your programs will take as input the bounds of the function, the number of samples, and the number of threads or processes to use.  Each process or thread must perform an approximately equal amount of work and store its minimum locally.  After computing this value, provide the result to a consumer process/thread that determines which result is the actual minimum.

Note that it is highly recommended that you use the Mersenne Twister pseudorandom number generator from Lab 5 at this link as part of the program.

For this program, you will use Engvall's Function (note that you may "hard code" this function into your program):

f(x,y) = x^4 + y^4 + 2*(x^2)*(y^2) - 4*x + 3

The actual minimum for this function is:

f(x,y) = 0

And is located at one point:

x = 1
y = 0

The bounds for this function should be no smaller than:

-2000 <= x <= 2000
-2000 <= y <= 2000

Reference: http://web.ift.uib.no/~antonych/RGA4.html

Threaded Program (10 points)

Inputs

For the threaded program, you are required to provide as input the minimum and maximum bounds for x and y, the number of samples per iteration, the number of iterations per producer thread, and the number of producer threads to run.  Read in this data from the console (stdin) on one line providing the minimum bound for x first, the maximum bound for x second, the minimum bound for y third, the maximum bound for y fourth, the number of samples per iteration fifth, the number of iterations sixth, and the number of threads seventh.  Information on how to use the bounds, samples, and number of iterations is provided later in this document.  For the purposes of computation, the number of samples must be greater than zero, as must the number of iterations, and the number of worker (producer) threads must be greater than or equal to one.

Outputs

Upon valid input, your program should print out the best result obtained from any of the threads over all of the iterations.  You must also print out the (x,y) location of this best result.  This output should be neat.  Your program must also include debugging output for each thread.  This means you should print a line of output showing the location and value of the minimum as determined by each thread.  Additionally, upon invalid input, your main thread must print an error message.  For example, you must indicate that there is an invalid input if the bounds are invalid, the sample size is less than one, the number of iterations is less than one, or the number of threads is less than one.  Additionally, you must provide the run time of your program in seconds as output.  This time must start before you begin computing the minima and must stop after the minima have been collected by the consumer thread, but before you print your result.  You will need to #include <time.h> and use clock() to get the starting time.  Clock time values are displayed in clock units, and are stored in primitive variables of type clock_t.   You can get the run time by subtracting the start time from a call to clock() after all computations are complete.  This clock time can be converted to seconds by dividing by CLOCKS_PER_SEC, a constant in time.h.  Do NOT use another function to compute the run time.

Shared Data

You may use the following "global" variables.  These variables will be shared between producer threads that will perform optimization and the consumer thread that will find the best value.

static const int NUM_WORKER_THREADS;  //The number of worker threads.
static const int NUM_SAMPLES_PER_THREAD;  //This is the number of samples to generate per thread.  It is possible to read in one value and store it in this constant.
static const int NUM_ITERATIONS_PER_THREAD;  //This is the number of iterations of sampling to perform per thread.
static const double X_MIN_BOUND; //The lower bound for x.
static const double X_MAX_BOUND; //The upper bound for x.
static const double Y_MIN_BOUND; //The lower bound for y.
static const double Y_MAX_BOUND; //The upper bound for y.
//Once set this value cannot be changed.
static node * head;  //The memory address of the head node of the linked list that will contain the minima found by each worker (producer) thread.
static sem_t full;  //The semaphore to track the number of full items in the list.
static sem_t empty; //The semaphore to track the number of empty slots in the list.
static sem_t mutex; //The semaphore to provide mutual exclusion on the list.

The variables above may be shared between the producer threads, consumer thread, and main thread.  The nodes within the list should be represented by C-style structs.  These will be dynamically allocated and should include the x-y location of the minimum (two double variables) and the value of the minimum (a double).  Therefore, this struct should be composed as follows:

struct node
{
  double x_location;
  double y_location;
  double function_value;
  struct node * next;
};

typedef struct node node;


Recall that the values for many of the constants above will be provided as input.  Even though these are constants, if they are not initialized, their values may be set once (for example, by using an input value).  If you are having trouble initializing these constants, it is OK to remove the const declarations.

Computations (Worker/Producer threads and Consumer thread)

You must implement the following two functions in your program.  These will represent your threads.  You may use additional helper functions if you so desire; however, you must use at minimum the function listed below.

void * produce_min(void *) - this function, which is utilized for each worker thread, must take only the thread id number as a parameter, and it must compute NUM_SAMPLES_PER_THREAD results from the Engvall function as described above.  This function should contain a for-loop that computes NUM_SAMPLES_PER_THREAD results NUM_ITERATIONS_PER_THREAD times.  With each iteration, a minimum value and location for the Engvall function should be determined.  This minimum should be retained and improved upon if possible with each pass through the for-loop.  After each pass, the result should be stored within a linked list (that is, produced within) that is protected by standard producer-consumer semaphores - full, empty, and mutex.

void * consume_results(void *) - this function, which is used for the consumer thread, must take only the thread id number as a parameter, and it must determine which of the results from the worker threads is the actual minimum.  Since this thread is a consumer, it must continously check the linked list, remove an item, and update the actual value and location of the global minimum.  Since this function represents a consumer thread, it must utilize the semaphores full, empty, and mutex to access the linked list and remove an item.  This function should print a result once it has received all of the minima.  Note that there should be NUM_WORKER_THREADS*NUM_ITERATIONS_PER_THREAD values for minima that will be produced by the worker threads.

Main Function

The main function of this program must read in the input data as explained above, perform error checking on the data, start the threads, and join the threads.  Note that the input data must be provided at the console and will be seven values on one line -- the minimum bound for x first, the maximum bound for x second, the minimum bound for y third, the maximum bound for y fourth, the number of samples per iteration fifth, the number of iterations sixth, and the number of threads seventh.

Compilation

Your program must read exactly one line of input from the keyboard (or a redirected input file).  Failure to do so will result in a loss of points on this programming assignment.  Save the source code for the threaded portion of program 3 in a file named prog3Threads.c.  Do NOT give this file a different name.  You will lose points if you give the file any other name than prog3Threads.c.  Additionally, you will need to use code from the threaded producer-consumer example on the course webpage to complete this portion of the project.  This includes the linked_list.h and linked_list.c files and may include additional files from the project.  You must create a makefile called makefile for your project and use this file to compile all of your files.

To compile your program use the following command:

    make

This must compile your program using the gcc compiler, and it will create an executable file named prog3Threads.exe.  Assuming that there are no errors in your program, you may run it using the command ./prog3Threads.exe.

To run this program, it is suggested that you use the UNIX redirection commands to provide input and obtain output from your program.  Run your program using the following command:

prog3Threads < p3t.in > p3t.out

To obtain input from the file p3t.in, and to send the output of your program to the file named p3t.out.  You may give these input and output files any name that you wish since you will not be submitting them.  The names provided are only examples.

MPI Program (10 points)

Inputs

For the MPI program, you are required to provide as command line input the minimum and maximum bounds for x and y, the number of samples per iteration, and the number of iterations per producer process.  Read in the minimum bound for x first, the maximum bound for x second, the minimum bound for y third, the maximum bound for y fourth, the number of samples per iteration fifth, and the number of iterations sixth.  This input data will be provided on the same line as the ibrun prog3Procs.exe in the batch script if you run the program on Stampede or the same line as mpirun if you were to run the program on LittleFe.  Note that the values will be provided immediately after the .exe file.  Command line inputs are stored as parameters to the main function.  Main always has at least one command line input -- the name of the program.  This ensures the integer argc is at least one and that argv[0] is a string containing the name of the program.  Since you will use six command line inputs, argc will be 7, and, as an example,  argv[1] will be a string containing the minimum bound for x.  This will need to be converted to a double and stored as the value X_MIN_BOUND.  The number of processes and nodes will be provided using the -n and -N options on Stampede or with the -np option on Littlefe.  On Stampede, the -n option allows you to specify the total number of processes to run, and the -N option allows you to specify the number of nodes to use.  On Stampede, you are limited to use at most 16 processes per node (equal to the number of CPU cores available).  Using more computational processes than there are CPU cores is called oversubscribing and is not generally permitted by the operating system on High Performance Computers.  Information on how to use the bounds, samples, and number of iterations is provided later in this document.  For the purposes of computation, the number of samples must be greater than zero, as must the number of iterations, and the total number of processes must be greater than or equal to two.

Outputs

Upon valid input, your program should print out the best result obtained from any of the worker processes (producers) over all of the iterations.  You must also print out the (x,y) location of this best result.  This output should be neat.  Your program must also include debugging output for each thread.  This means you should print a line of output showing the location and value of the minimum as determined by each worker (producer) process.  Additionally, upon invalid input, your program must print an error message.  It is ok if this message shows up for every process.  For example, you must indicate that there is an invalid input if the bounds are invalid, the sample size is less than one, the number of iterations is less than one, or the number of processes is less than two.  Additionally, you must provide the run time of your program in seconds as output.  This time must start before you begin computing minima and must stop after the all the results have been collected by the consumer process, but before you print your result.  You will need to use MPI_Wtime() to get the starting time.  MPI time values are displayed in seconds units, and are returned as type double including partial seconds after the decimal point.   You can get the run time by subtracting the start time from a call to MPI_Wtime() after all computations are complete and all results have been collected.  Note that you should plan to divide your work between a server process (for example, the process with rank 0) and worker/client processes that will find minima.  Output should be provided in the server process, and the calls to the MPI wall time function should only be made from the server process.  Do NOT use another function to compute the run time.

Constants and Variables

You should use the following variables.  Since you will be using processes you need to remember that these variables will NOT be shared between processes.

const int NUM_WORKER_PROCESSES;  //The number of worker processes. This should be equal to the total number of processes - 1.
const int NUM_SAMPLES_PER_PROC;  //This is the number of samples to generate per process.  It is possible to read in one value and store it in this constant.
const int NUM_ITERATIONS_PER_PROC;  //This is the number of iterations of sampling to perform per process.
const double X_MIN_BOUND; //The lower bound for x.
const double X_MAX_BOUND; //The upper bound for x.
const double Y_MIN_BOUND; //The lower bound for y.
const double Y_MAX_BOUND; //The upper bound for y.

The constants above must be available within each process.  Even though these are constants, if they are not initialized, their values may be set once (for example, by using an command line input value).  You will need variables within each worker (producer) process to track the best (minimum) x-y location and function value for the Engvall function (also called an objective function).  An example of such variables is provided below.

double x_location;
double y_location;
double function_value;

Computations (Client/Worker Processes)

You must implement both a server process (a consumer process with rank 0) and worker processes (all other ranks, which will produce results) in your program.  Each of the worker processes must compute  NUM_SAMPLES_PER_PROC results of the objective function over NUM_ITERATIONS_PER_PROC iterations, sending the results of the current best location found and current best function value to the server process after each iteration.

Main Function and Server Process

All processes of this program must read in the input data as explained above, and perform error checking on the data.  The server process must print the output of the numerical optimization.  Note that the input data must be provided at the command line.  If the input values are invalid as described above, you are to print an error message and exit the program immediately.  Otherwise, allocate memory for data structures, as necessary, in every process including the server process.  Note that there is no need to use a linked list in this program.  Rather, you may use one or more arrays of appropriate size in which you may receive data in a similar fashion as shown in the MPI producer-consumer example on the class webpage.  Doing so will make it much easier for you to determine a minimum out of the data sent.  Assuming valid input, the program will automatically start the requested number of processes using MPI.  The server process must attempt to receive all the data from the worker processes and store it in an array of appropriate size.  It may be necessary to receive this data using a construct such as MPI_Waitsome.  You must perform non-blocking receives when using this constuct.  After calling all of the non-blocking receives, you should call the MPI_Waitsome function to wait for the data.  After iteratively receiving all of the data, print the location of the minimum, the value of the minimun, and the run time as output.

Compilation

Recall that your program must read multiple inputs from the command line.  Failure to do so will result in a loss of points on this programming assignment.  Save the source code for the MPI portion of program 3 in a file named prog3Procs.c.  Do NOT give the file a different name.  You will lose points if you give the file any other name than prog3Procs.c.

To compile your program use the following command:

    mpicc prog3Procs.c -O3 -o prog3Procs.exe

This will compile your program using the mpicc compiler, and it will create an executable file named prog3Procs.exe.  Assuming that there are no errors in your program, you may run it using the command sbatch prog3.sbatch, where a sample batch file follows below:

#!/bin/bash
#SBATCH -A TG-SEE120004
#SBATCH -n 128 -N 8
#SBATCH -J prog3Procs
#SBATCH -o prog3Procs.o%j
#SBATCH -p normal
#SBATCH -t 00:15:00
ibrun prog3Procs.exe #add your inputs here

You must provide your output file for running this program with 100,000,000 samples per process for 50 iterations per process on 8 nodes distributed across 128 processes.  It is highly recommended that you try this program on LittleFe first and then attempt to scale up to using Stampede.

Submitting Your Program:

Note that you must provide informative comments for your code and follow the program guidelines.

To turn in your programs, zip your .c files, your Stampede output files, and your batch script and submit them to the eCompanion dropbox for Program 3.