Program 3
cs550 Operating Systems
Due date: Monday Oct. 27, 2014 at 11:59:59 p.m.
20 points
Objective: To learn about the producer-consumer problem.
Specification:
For Program 3, you are to write two programs that will perform
numerical optimization -- finding the minimum of a mathematical
function. The first program must be written in standard C
and must use only pthreads as its parallel construct, and the
second must be written in C using MPI and must use only MPI-style
processes as its parallel construct. Performing numerical
optimization by using calculus is beyond the scope of this course,
so we will use a Monte Carlo method.
Finding a minimum is actually really easy. For example if
you are given a function f(x) = x^2, where -10 <= x <= 10,
we know from calculus or from direct observation that the minimum
is zero. Try graphing it! To find a value very close
to the minimum with a program using a Monte Carlo (random
sampling) method is simple. Take a large number of random
samples (maybe 1,000,000) for x between -10 <= x <= 10,
evaluate the function x^2 at every point, and find the value of x
that yields the smallest number for x^2.
For this assignment you will be given a function that is more
substantial than x^2 to work with, but the idea is the same.
Find the minimum of the function. First, do so using
threads. Then, do so using processes. Divide the
sampling evenly between the threads/processes. Your programs
will take as input the bounds of the function, the number of
samples, and the number of threads or processes to use. Each
process or thread must perform an approximately equal amount of
work and store its minimum locally. After computing this
value, provide the result to a consumer process/thread that
determines which result is the actual minimum.
Note that it is highly recommended that you use the Mersenne
Twister pseudorandom number generator from Lab 5 at this link as part
of the program.
For this program, you will use Engvall's Function
(note that you may "hard code" this function into your program):
f(x,y) = x^4 + y^4 + 2*(x^2)*(y^2) - 4*x + 3
The actual minimum for this function is:
f(x,y) = 0
And is located at one point:
x = 1
y = 0
The bounds for this function should be no smaller than:
-2000 <= x <= 2000
-2000 <= y <= 2000
Reference: http://web.ift.uib.no/~antonych/RGA4.html
Threaded Program (10 points)
Inputs
For the threaded program, you are required to provide as input the minimum and maximum bounds for x and y, the number of samples per iteration, the number of iterations per producer thread, and the number of producer threads to run. Read in this data from the console (stdin) on one line providing the minimum bound for x first, the maximum bound for x second, the minimum bound for y third, the maximum bound for y fourth, the number of samples per iteration fifth, the number of iterations sixth, and the number of threads seventh. Information on how to use the bounds, samples, and number of iterations is provided later in this document. For the purposes of computation, the number of samples must be greater than zero, as must the number of iterations, and the number of worker (producer) threads must be greater than or equal to one.
Outputs
Upon valid input, your program should print out the best result
obtained from any of the threads over all of the iterations.
You must also print out the (x,y) location of this best
result. This output should be neat. Your program must
also include debugging output for each thread. This means
you should print a line of output showing the location and value
of the minimum as determined by each thread. Additionally,
upon invalid input, your main thread must print an error
message. For example, you must indicate that there is an
invalid input if the bounds are invalid, the sample size is less
than one, the number of iterations is less than one, or the number
of threads is less than one. Additionally, you must provide
the run time of your program in seconds as output. This time
must start before you begin computing the minima and must stop
after the minima have been collected by the consumer thread, but
before you print your result. You will need to #include <time.h> and use clock() to get the starting
time. Clock time values are displayed in clock units, and
are stored in primitive variables of type clock_t. You can get the run time by
subtracting the start time from a call to clock() after all computations are complete.
This clock time can be converted to seconds by dividing by CLOCKS_PER_SEC, a constant in time.h. Do NOT use another
function to compute the run time.
Shared Data
You may use the following "global" variables. These
variables will be shared between producer threads that will
perform optimization and the consumer thread that will find the
best value.
static const int
NUM_WORKER_THREADS; //The number of worker threads.
static const int NUM_SAMPLES_PER_THREAD; //This is the
number of samples to generate per thread. It is possible
to read in one value and store it in this constant.
static
const int NUM_ITERATIONS_PER_THREAD; //This is the
number of iterations of sampling to perform per thread.
static const double X_MIN_BOUND; //The lower bound for x.
static const double X_MAX_BOUND; //The upper bound for x.
static const double Y_MIN_BOUND; //The lower bound for y.
static const double Y_MAX_BOUND; //The upper bound for y.
//Once set this value cannot be changed.
static node * head; //The memory address of the head node
of the linked list that will contain the minima found by each
worker (producer) thread.
static sem_t full; //The semaphore to track the number of
full items in the list.
static sem_t empty; //The semaphore to track the number of empty
slots in the list.
static sem_t mutex; //The semaphore to provide mutual exclusion
on the list.
The variables above may be shared between the producer threads,
consumer thread, and main thread. The nodes within the list
should be represented by C-style structs.
These will be dynamically allocated and should include the x-y
location of the minimum (two double
variables) and the value of the minimum (a double). Therefore, this struct should be composed as follows:
struct node
{
double x_location;
double y_location;
double function_value;
struct node * next;
};
typedef struct node node;
Recall that the values for many of the constants above will be
provided as input. Even though these are constants, if they
are not initialized, their values may be set once (for example, by
using an input value). If you are having trouble
initializing these constants, it is OK to remove the const declarations.
Computations (Worker/Producer threads and Consumer thread)
You must implement the following two functions in your
program. These will represent your threads. You may
use additional helper functions if you so desire; however, you
must use at minimum the function listed below.
void * produce_min(void *) - this
function, which is utilized for each worker thread, must take only
the thread id number as a parameter, and it must compute NUM_SAMPLES_PER_THREAD results from
the Engvall function as described above. This
function should contain a for-loop that computes NUM_SAMPLES_PER_THREAD results NUM_ITERATIONS_PER_THREAD
times. With each iteration, a minimum value
and location for the Engvall function should be
determined. This minimum should be retained and improved
upon if possible with each pass through the for-loop. After
each pass, the result should be stored within a linked list (that
is, produced within) that is protected by standard
producer-consumer semaphores - full,
empty, and mutex.
void * consume_results(void *) -
this function, which is used for the consumer thread, must take
only the thread id number as a parameter, and it must determine
which of the results from the worker threads is the actual
minimum. Since this thread is a consumer, it must
continously check the linked list, remove an item, and update the
actual value and location of the global minimum. Since this
function represents a consumer thread, it must utilize the
semaphores full, empty, and mutex
to access the linked list and remove an item. This function
should print a result once it has received all of the
minima. Note that there should be NUM_WORKER_THREADS*NUM_ITERATIONS_PER_THREAD
values for minima that will be produced by the worker threads.
Main Function
The main function of this program must read in the input data as
explained above, perform error checking on the data, start the
threads, and join the threads. Note that the input data must
be provided at the console and will be seven values on one line --
the minimum bound for x first, the maximum bound for x second, the
minimum bound for y third, the maximum bound for y fourth, the
number of samples per iteration fifth, the number of iterations
sixth, and the number of threads seventh.
Compilation
Your program must read exactly one line of input from the
keyboard (or a redirected input file). Failure to do so will
result in a loss of points on this programming assignment.
Save the source code for the threaded portion of program 3 in a
file named prog3Threads.c.
Do NOT give this file a different name. You will lose points
if you give the file any other name than prog3Threads.c.
Additionally, you will need to use code from the threaded
producer-consumer example on the course webpage to complete this
portion of the project. This includes the linked_list.h and linked_list.c files and may include additional files
from the project. You must create a makefile called makefile for your project and use this
file to compile all of your files.
To compile your program use the following command:
make
This must compile your program using the gcc compiler, and it will create an executable file named prog3Threads.exe. Assuming that there are no errors in your program, you may run it using the command ./prog3Threads.exe.
To run this program, it is suggested that you use the UNIX redirection commands to provide input and obtain output from your program. Run your program using the following command:
prog3Threads < p3t.in > p3t.out
To obtain input from the file p3t.in,
and to send the output of your program to the file named p3t.out. You may give these
input and output files any name that you wish since you will not
be submitting them. The names provided are only examples.
MPI Program (10 points)
Inputs
For the MPI program, you are required to provide as command line input the minimum and maximum bounds for x and y, the number of samples per iteration, and the number of iterations per producer process. Read in the minimum bound for x first, the maximum bound for x second, the minimum bound for y third, the maximum bound for y fourth, the number of samples per iteration fifth, and the number of iterations sixth. This input data will be provided on the same line as the ibrun prog3Procs.exe in the batch script if you run the program on Stampede or the same line as mpirun if you were to run the program on LittleFe. Note that the values will be provided immediately after the .exe file. Command line inputs are stored as parameters to the main function. Main always has at least one command line input -- the name of the program. This ensures the integer argc is at least one and that argv[0] is a string containing the name of the program. Since you will use six command line inputs, argc will be 7, and, as an example, argv[1] will be a string containing the minimum bound for x. This will need to be converted to a double and stored as the value X_MIN_BOUND. The number of processes and nodes will be provided using the -n and -N options on Stampede or with the -np option on Littlefe. On Stampede, the -n option allows you to specify the total number of processes to run, and the -N option allows you to specify the number of nodes to use. On Stampede, you are limited to use at most 16 processes per node (equal to the number of CPU cores available). Using more computational processes than there are CPU cores is called oversubscribing and is not generally permitted by the operating system on High Performance Computers. Information on how to use the bounds, samples, and number of iterations is provided later in this document. For the purposes of computation, the number of samples must be greater than zero, as must the number of iterations, and the total number of processes must be greater than or equal to two.
Outputs
Upon valid input, your program should print out the best result
obtained from any of the worker processes (producers) over all of
the iterations. You must also print out the (x,y) location
of this best result. This output should be neat. Your
program must also include debugging output for each thread.
This means you should print a line of output showing the location
and value of the minimum as determined by each worker (producer)
process. Additionally, upon invalid input, your program must
print an error message. It is ok if this message shows up
for every process. For example, you must indicate that there
is an invalid input if the bounds are invalid, the sample size is
less than one, the number of iterations is less than one, or the
number of processes is less than two. Additionally, you must
provide the run time of your program in seconds as output.
This time must start before you begin computing minima and must
stop after the all the results have been collected by the consumer
process, but before you print your result. You will need to
use MPI_Wtime() to get the
starting time. MPI time values are displayed in seconds
units, and are returned as type double including partial seconds
after the decimal point. You can get the run time by
subtracting the start time from a call to MPI_Wtime() after all computations are complete and
all results have been collected. Note that you should plan
to divide your work between a server process (for example, the
process with rank 0) and worker/client processes that will find
minima. Output should be provided in the server process, and
the calls to the MPI wall time function should only be made from
the server process. Do NOT use another function to compute
the run time.
Constants and Variables
You should use the following variables. Since you will be
using processes you need to remember that these variables will NOT
be shared between processes.
const int NUM_WORKER_PROCESSES;
//The number of worker processes. This should be equal to the
total number of processes - 1.
const int NUM_SAMPLES_PER_PROC; //This is the number of
samples to generate per process. It is possible to read in
one value and store it in this constant.
const int
NUM_ITERATIONS_PER_PROC; //This is the number of
iterations of sampling to perform per process.
const double X_MIN_BOUND; //The lower bound for x.
const double X_MAX_BOUND; //The upper bound for x.
const double Y_MIN_BOUND; //The lower bound for y.
const double Y_MAX_BOUND; //The upper bound for y.
The constants above must be available within each process.
Even though these are constants, if they are not initialized,
their values may be set once (for example, by using an command
line input value). You will need variables within each
worker (producer) process to track the best (minimum) x-y location
and function value for the Engvall function (also
called an objective function). An example of such variables
is provided below.
double x_location;
double y_location;
double function_value;
Computations (Client/Worker Processes)
You must implement both a server process (a consumer process with
rank 0) and worker processes (all other ranks, which will produce
results) in your program. Each of the worker processes must
compute NUM_SAMPLES_PER_PROC
results of the objective function over NUM_ITERATIONS_PER_PROC
iterations, sending the results of the current best location found
and current best function value to the server process after each
iteration.
Main Function and Server Process
All processes of this program must read in the input data as
explained above, and perform error checking on the data. The
server process must print the output of the numerical
optimization. Note that the input data must be provided at
the command line. If the input values are invalid as
described above, you are to print an error message and exit the
program immediately. Otherwise, allocate memory for data
structures, as necessary, in every process including the server
process. Note that there is no need to use a linked list in
this program. Rather, you may use one or more arrays of
appropriate size in which you may receive data in a similar
fashion as shown in the MPI producer-consumer example on the class
webpage. Doing so will make it much easier for you to
determine a minimum out of the data sent. Assuming valid
input, the program will automatically start the requested number
of processes using MPI. The server process must attempt to
receive all the data from the worker processes and store it in an
array of appropriate size. It may be necessary to receive
this data using a construct such as MPI_Waitsome. You must
perform non-blocking receives when using this constuct.
After calling all of the non-blocking receives, you should call
the MPI_Waitsome function to wait for the data. After
iteratively receiving all of the data, print the location of the
minimum, the value of the minimun, and the run time as output.
Compilation
Recall that your program must read multiple inputs from the command line. Failure to do so will result in a loss of points on this programming assignment. Save the source code for the MPI portion of program 3 in a file named prog3Procs.c. Do NOT give the file a different name. You will lose points if you give the file any other name than prog3Procs.c.
To compile your program use the following command:
mpicc prog3Procs.c -O3 -o prog3Procs.exe
This will compile your program using the mpicc compiler, and it will create an executable
file named prog3Procs.exe.
Assuming that there are no errors in your program, you may run it
using the command sbatch prog3.sbatch,
where a sample batch file follows below:
#!/bin/bash
#SBATCH -A TG-SEE120004
#SBATCH -n 128 -N 8
#SBATCH -J prog3Procs
#SBATCH -o prog3Procs.o%j
#SBATCH -p normal
#SBATCH -t 00:15:00
ibrun prog3Procs.exe #add your inputs here
You must provide your output file for running this program with
100,000,000 samples per process for 50 iterations per process on 8
nodes distributed across 128 processes. It is highly
recommended that you try this program on LittleFe first and then
attempt to scale up to using Stampede.
Submitting Your Program:
Note that you must provide informative comments for your code and follow the program guidelines.
To turn in your programs, zip your .c files, your Stampede output files, and your batch script and submit them to the eCompanion dropbox for Program 3.