Program 2
cs550 Operating Systems
Due date: Tuesday Sept. 30, 2014 at 11:59:59 p.m.
15 points
Objective: To learn about processes, threads, and parallel decomposition.
Specification:
For Program 2, you are to write two programs that will perform
a blurring operation on a matrix. The first program must be written in
standard C and must use only pThreads as its parallel construct,
and the second must be written in C using MPI and must use only
MPI-style processes as its parallel construct. The details
of image blurring are quite simple and this program will keep them in an even simpler format: you may assume you have
one square (N by N) matrix called A with generated data or data provided as input (0 to 255) and one square (3 by 3) matrix B that contains values of
1/9 in every cell. You will apply matrix B to every element of matrix A to obtain a "blurred" image in a third matrix (N by N) C.
For this program, you may assume that the number of rows or columns (both are equal to N) will be divisible
by the number of threads or processes available. This means
that N % NUM_THREADS = 0 and N % NUM_PROCESSES = 0. In your
programs, you are to divide blurring operations evenly between processes and
threads, respectively. This means each thread or process
must perform the amount of work, which is the same number of
blurring operations. Note that a blurring operation is accomplished for one location (i,j) in matrix A by performing point to point multiplications of
the contents of matrix B to A(i,j) and all of its 8 surrounding points (i-1,j-1), (i-1,j), (i-1,j+1), (i,j-1), (i,j+1), (i+1,j-1),
(i+1,j), and (i+1,j+1). Be careful when working with the borders of matrix A, you may assume any locations that do not exist are equal to zero.
Threaded Program (7.5 points)
Inputs
For the threaded program, you are required to provide as input
the size of the matrices A and C (this will only be one value)
and the number of threads to run. Read in this data from the
console (stdin) on one line
providing the matrix size first and the number of threads
second. Information on the data to be stored within the
matrices A and C is provided later in this document. For the
purposes of computation, you may assume that the matrix will be of
at least size 4 by 4 and that the number of threads will be
greater than or equal to one.
Outputs
Upon valid input, your program should print out the first two
rows of the matrix C and the last two rows of the matrix C.
This output should be neat. That means you should include
return characters between multiple lines of output and denote
which row you are providing as output. Upon invalid input,
your main thread must print an error message. For example,
you must indicate that there is invalid input if the matrix size
is less than 4 by 4, if the number of threads requested is less
than 1, and if the number of rows of matrix C to be computed by
each thread is a non-integer value (i.e. N % NUM_THREADS !=
0). Additionally, you must provide the run time of your
program in seconds as output. This time must start before
you allocate the memory for your matrices and must stop after the
matrix blurring is complete, but before you print your
result. You will need to #include
<time.h> and use clock()
to get the starting time. Clock time values are displayed in
clock units, and are stored in primitive variables of type clock_t. You can get the run
time by subtracting the start time from a call to clock() after all computations are
complete. This clock time can be converted to seconds by
dividing by CLOCKS_PER_SEC, a
constant in time.h. Do NOT
use another function to compute the run time.
Shared Data
You may use the following "global" variables. These
variables will be shared between the matrix blurring threads and
the main thread.
static const int NUM_THREADS;
//The number of threads.
static const int N; //This is the size of one of the sides
of the matrix. It should be possible to read in one value and
store it in N. Once set this value cannot be changed.
#define ninth (1.0/9.0)
static double ** A; //The matrix A (dynamically
allocated).
static double B[3][3] = {{ninth, ninth, ninth}, {ninth, ninth, ninth}, {ninth, ninth, ninth}}; //The blurring filter.
static double ** C; //The result of A blurred by B is stored here. This is dynamically allocated.
matrix C.
Matrices A, B, and C may be shared between the main thread and
the addition threads. These matrices will be represented as
2D arrays, but A and C will be dynamically allocated.
Therefore, these must be of type double
**. The values for the number of threads and the
matrix size will be provided as input. Even though these are
constants, if they are not initialized, you should be able to set their values exactly once (for example, by using an input value).
Computations
You must implement the following function in your
program. You may use additional helper functions if
you so desire; however, you must use at minimum the function
listed below.
void * matrix_blur_operation(void *)
- this thread function must take only the thread id number as a
parameter and use it to compute N/NUM_THREADS
rows of the matrix C.
Main Function
The main function of this program must read in the input data as
explained above, perform error checking on the data, and print the
output (the first and last two rows) of the matrix
blurring. Note that the input data must be provided at the
console and will be two values on one line -- the matrix size and
the number of threads. If these values are invalid as
described above, you are to print an error message and exit the
program immediately. Otherwise, allocate memory for the
matrices A and C. Fill the each element of the matrices
A with the current row number. That is, if you are
working with A[1][2], its value
should be 1, and if you are working with A[4][3],
its value should be 4. This way it will be quite easy to
tell if your matrix addition is working properly. Assuming
valid input and that the matrices A, B, and C have been created,
start the appropriate number of threads from main using pthread_create within a loop. Be
sure to perform error checking to ensure these threads have
started. After all threads have started, have the main
thread wait for each worker thread (those that are performing
addition) to complete. This is done by calling the pthread_join function on every worker
thread. Finally, print the results as described in the
output section.
Compilation
Your program must read exactly one line of input from the keyboard (or a redirected input file). Failure to do so will result in a loss of points on this programming assignment. Save the source code for the threaded portion of program 2 in a file named prog2Threads.c. Do NOT give the file a different name. You will lose points if you give the file any other name than prog2Threads.c.
To compile your program use the following command:
gcc prog2Threads.c -pthread -o prog2Threads.exe
This will compile your program using the gcc compiler, and it will create an executable file named prog2Threads.exe. Assuming that there are no errors in your program, you may run it using the command ./prog2Threads.exe.
To run this program, it is suggested that you use the UNIX redirection commands to provide input and obtain output from your program. Run your program using the following command:
prog2Threads < p2t.in > p2t.out
To obtain input from the file p2t.in,
and to send the output of your program to the file named p2t.out. You may give these
input
and output files any name that you wish since you will not be
submitting them. The names provided are only examples.
MPI Program (7.5 points)
Inputs
For the MPI program, you are required to provide as command
line input the size of the matrices A and C (this will
only be one value). This input data will be provided on the
same line as the ibrun prog2Procs.exe
in the batch script. Note that the value will be provided
immediately after the .exe file. Command line inputs are
stored as parameters to the main function. Main always has
at least one command line input -- the name of the program.
This ensures the integer argc is
at least one and that argv[0] is
a string containing the name of the program. Since you will
use only one command line input, argc
will be 2 and argv[1] will be a
string containing the matrix size. This will need to be
converted to an integer and stored as the value N. The
number of processes and nodes will be provided using the -n and -N
options. The -n option
allows you to specify the total number of processes to run, and
the -N option allows you to
specify the number of nodes to use. On Stampede, you are
limited to use at most 16 processes per node (equal to the number
of CPU cores available). Using more computational processes
than there are CPU cores is called oversubscribing and is not
usually permitted by the operating system on High Performance
Computers. Information on the data to be stored within the
matrices A and C is provided later in this document. For the
purposes of computation, you may assume that the matrix will be of
at least size 10 by 10 and that the number of processes will be
greater than or equal to two.
Outputs
Upon valid input, your program should print out the first two
rows of the matrix C and the last two rows of the matrix C.
This output should be neat. That means you should include
return characters between multiple lines of output and denote
which row you are providing as output. Upon invalid input,
your main function must print an error message. For example,
you must indicate that there is invalid input if the matrix size
is less than 10 by 10, if the number of processes requested is
less than two, and if the number of rows of matrix C to be
computed by each worker process is a non-integer value (i.e. N % (NUM_PROCESSES-1)
!= 0). Additionally, you must provide the run time of your
program in seconds as output. This time must start before
you allocate the memory for your matrices and must stop after the
matrix addition is complete, but before you print your
result. You will need to use MPI_Wtime()
to get the starting time. MPI time values are displayed in
seconds units, and are returned as type double including partial
seconds after the decimal point. You can get the run time
by subtracting the start time from a call to MPI_Wtime() after all computations are
complete. Note that you should plan to divide your work
between a server process (for example, the process with rank 0)
and worker/client processes that will perform matrix
computations. Output should be provided in the server
process, and the calls to the MPI wall time function should only
be made from the server process. Do NOT use another function
to compute the run time.
Matrix Data
You should use the following variables. Since you will be
using processes you need to remember that these variables will NOT
be shared between processes.
const int NUM_THREADS; //The
number of threads.
const int N; //This is the size of one of the sides of the
matrix. It is possible to read in one value and store it
in N. Once set this value cannot be changed.
#define ninth (1.0/9.0)
double ** A; //The matrix A (dynamically allocated).
static double B[3][3] = {{ninth, ninth, ninth}, {ninth, ninth, ninth}, {ninth, ninth, ninth}}; //The blurring filter.
double ** C; //The result of blurring A with B is stored in matrix C.
Matrix A may be allocated and initialized in every
process. You should only need to allocate matrix C in the server process. These matrices will be represented as 2D arrays and
will be dynamically allocated. Therefore, these must be of
type double **. The values
for the number of processes and the matrix size will be provided as
input. Even though these are constants, if they are not
initialized, their values may be set once (for example, by using
an command line input value).
Computations (Client/Worker Processes)
You must implement both a server process (rank 0) and
client/worker processes (all other ranks) in your program.
Each of the worker processes must compute the N/(NUM_PROCESSES-1) rows of the matrix
C, and send the results to the server process.
Main Function and Server Process
All processes of this program must read in the input data as
explained above, and perform error checking on the data. The
server process must print the output (the first and last five
rows) of the matrix addition. Note that the input data must
be provided at the command line. If the input values are
invalid as described above, you are to print an error message and
exit the program immediately. Otherwise, allocate memory for
the matrix A in every process except the server
process. Every worker process should allocate and fill each
element of the matrix A with the current row number.
That is, if you are working with A[1][2],
its value should be 1, and if you are working with B[4][3], its value should be 4.
This way it will be quite easy to tell if your matrix blurring is
working properly. Assuming valid input and that the matrices
A, B, and C have been created, the program will automatically
start the requested number of processes using MPI. The
server process must attempt to receive all the data from the
worker processes and store it in matrix C. It is recommended
that you perform non-blocking receives so you gain experience with
the constuct. After calling all of the non-blocking
receives, you should call the MPI_Waitall function to wait for the
data. Finally, print the data and the run time as output.
Compilation
Your program must read one input from the command line. Failure to do so will result in a loss of points on this programming assignment. Save the source code for the MPI portion of program 2 in a file named prog2Procs.c. Do NOT give the file a different name. You will lose points if you give the file any other name than prog2Threads.c.
To compile your program use the following command:
mpicc prog2Procs.c -O3 -o prog2Procs.exe
This will compile your program using the mpicc compiler, and it will create an executable
file
named prog2Procs.exe.
Assuming that
there are no errors in your program, you may run it using the
command sbatch prog2.sbatch,
where a sample batch file follows below:
#!/bin/bash
#SBATCH -A TG-SEE120004
#SBATCH -n 16 -N 1
#SBATCH -J prog2Procs
#SBATCH -o prog2Procs.o%j
#SBATCH -p normal
#SBATCH -t 00:15:00
ibrun prog2Procs.exe 1500
You must provide your output file and batch script for running this program on a
matrix of size 6300 by 6300 on 4 nodes distributed across 64
processes. It is highly recommended that you try this
program on LittleFe first and then attempt to scale up to using
Stampede.
Submitting Your Program:
Note that you must provide informative comments for your code and follow the program guidelines.
To turn in your programs, zip your .c files, your Stampede output files, and your batch script and submit them to the eCompanion dropbox for Program 2.