Program 2
        cs550 Operating Systems
        Due date:  Tuesday Sept. 30, 2014 at 11:59:59 p.m.
        15 points  
Objective: To learn about processes, threads, and parallel decomposition.
Specification:
For Program 2, you are to write two programs that will perform
      a blurring operation on a matrix.  The first program must be written in
      standard C and must use only pThreads as its parallel construct,
      and the second must be written in C using MPI and must use only
      MPI-style processes as its parallel construct.  The details
      of image blurring are quite simple and this program will keep them in an even simpler format: you may assume you have
      one square (N by N) matrix called A with generated data or data provided as input (0 to 255) and one square (3 by 3) matrix B that contains values of
      1/9 in every cell.  You will apply matrix B to every element of matrix A to obtain a "blurred" image in a third matrix (N by N) C.  
      For this program, you may assume that the number of rows or columns (both are equal to N) will be divisible
      by the number of threads or processes available.  This means
      that N % NUM_THREADS = 0 and N % NUM_PROCESSES = 0.  In your
      programs, you are to divide blurring operations evenly between processes and
      threads, respectively.  This means each thread or process
      must perform the amount of work, which is the same number of
      blurring operations.  Note that a blurring operation is accomplished for one location (i,j) in matrix A by performing point to point multiplications of 
      the contents of matrix B to A(i,j) and all of its 8 surrounding points (i-1,j-1), (i-1,j), (i-1,j+1), (i,j-1), (i,j+1), (i+1,j-1), 
      (i+1,j), and (i+1,j+1).  Be careful when working with the borders of matrix A, you may assume any locations that do not exist are equal to zero. 
      
      Threaded Program (7.5 points)
    
Inputs
For the threaded program, you are required to provide as input
      the size of the matrices A and C (this will only be one value)
      and the number of threads to run.  Read in this data from the
      console (stdin) on one line
      providing the matrix size first and the number of threads
      second.  Information on the data to be stored within the
      matrices A and C is provided later in this document.  For the
      purposes of computation, you may assume that the matrix will be of
      at least size 4 by 4 and that the number of threads will be
      greater than or equal to one.
    
Outputs
    
Upon valid input, your program should print out the first two
      rows of the matrix C and the last two rows of the matrix C. 
      This output should be neat.  That means you should include
      return characters between multiple lines of output and denote
      which row you are providing as output.  Upon invalid input,
      your main thread must print an error message.  For example,
      you must indicate that there is invalid input if the matrix size
      is less than 4 by 4, if the number of threads requested is less
      than 1, and if the number of rows of matrix C to be computed by
      each thread is a non-integer value (i.e. N % NUM_THREADS !=
      0).  Additionally, you must provide the run time of your
      program in seconds as output.  This time must start before
      you allocate the memory for your matrices and must stop after the
      matrix blurring is complete, but before you print your
      result.  You will need to #include
        <time.h> and use clock()
      to get the starting time.  Clock time values are displayed in
      clock units, and are stored in primitive variables of type clock_t.   You can get the run
      time by subtracting the start time from a call to clock() after all computations are
      complete.  This clock time can be converted to seconds by
      dividing by CLOCKS_PER_SEC, a
      constant in time.h.  Do NOT
      use another function to compute the run time.
    
Shared Data
    
      You may use the following "global" variables.  These
      variables will be shared between the matrix blurring threads and
      the main thread.
    
static const int NUM_THREADS; 
        //The number of threads.
        static const int N;  //This is the size of one of the sides
        of the matrix.  It should be possible to read in one value and
        store it in N.  Once set this value cannot be changed.
        #define ninth (1.0/9.0)
        static double ** A;  //The matrix A (dynamically
        allocated).
        static double B[3][3] = {{ninth, ninth, ninth}, {ninth, ninth, ninth}, {ninth, ninth, ninth}};  //The blurring filter.
        static double ** C;  //The result of A blurred by B is stored here.  This is dynamically allocated.
        matrix C.
    
Matrices A, B, and C may be shared between the main thread and
      the addition threads.  These matrices will be represented as
      2D arrays, but A and C will be dynamically allocated. 
      Therefore, these must be of type double
        **.  The values for the number of threads and the
      matrix size will be provided as input.  Even though these are
      constants, if they are not initialized, you should be able to set their values exactly once (for example, by using an input value).
    
Computations
    
You must implement the following function in your
      program.   You may use additional helper functions if
      you so desire; however, you must use at minimum the function
      listed below.
      
      void * matrix_blur_operation(void *)
      - this thread function must take only the thread id number as a
      parameter and use it to compute N/NUM_THREADS
      rows of the matrix C.
      
      Main Function
    
The main function of this program must read in the input data as
      explained above, perform error checking on the data, and print the
      output (the first and last two rows) of the matrix
      blurring.  Note that the input data must be provided at the
      console and will be two values on one line -- the matrix size and
      the number of threads.  If these values are invalid as
      described above, you are to print an error message and exit the
      program immediately.  Otherwise, allocate memory for the
      matrices A and C.  Fill the each element of the matrices
      A with the current row number.  That is, if you are
      working with A[1][2], its value
      should be 1, and if you are working with A[4][3],
      its value should be 4.  This way it will be quite easy to
      tell if your matrix addition is working properly.  Assuming
      valid input and that the matrices A, B, and C have been created,
      start the appropriate number of threads from main using pthread_create within a loop.  Be
      sure to perform error checking to ensure these threads have
      started.  After all threads have started, have the main
      thread wait for each worker thread (those that are performing
      addition) to complete.  This is done by calling the pthread_join function on every worker
      thread.  Finally, print the results as described in the
      output section.
      
Compilation
    
Your program must read exactly one line of input from the keyboard (or a redirected input file). Failure to do so will result in a loss of points on this programming assignment. Save the source code for the threaded portion of program 2 in a file named prog2Threads.c. Do NOT give the file a different name. You will lose points if you give the file any other name than prog2Threads.c.
To compile your program use the following command:
gcc prog2Threads.c -pthread -o prog2Threads.exe
This will compile your program using the gcc compiler, and it will create an executable file named prog2Threads.exe. Assuming that there are no errors in your program, you may run it using the command ./prog2Threads.exe.
To run this program, it is suggested that you use the UNIX redirection commands to provide input and obtain output from your program. Run your program using the following command:
prog2Threads < p2t.in > p2t.out
To obtain input from the file p2t.in,
      and to send the output of your program to the file named p2t.out.  You may give these
      input
      and output files any name that you wish since you will not be
      submitting them.  The names provided are only examples.
    
MPI Program (7.5 points)
    
Inputs
For the MPI program, you are required to provide as command
        line input the size of the matrices A and C (this will
      only be one value).  This input data will be provided on the
      same line as the ibrun prog2Procs.exe
      in the batch script.  Note that the value will be provided
      immediately after the .exe file.  Command line inputs are
      stored as parameters to the main function.  Main always has
      at least one command line input -- the name of the program. 
      This ensures the integer argc is
      at least one and that argv[0] is
      a string containing the name of the program.  Since you will
      use only one command line input, argc
      will be 2 and argv[1] will be a
      string containing the matrix size.  This will need to be
      converted to an integer and stored as the value N.  The
      number of processes and nodes will be provided using the -n and -N
      options.  The -n option
      allows you to specify the total number of processes to run, and
      the -N option allows you to
      specify the number of nodes to use.  On Stampede, you are
      limited to use at most 16 processes per node (equal to the number
      of CPU cores available).  Using more computational processes
      than there are CPU cores is called oversubscribing and is not
      usually permitted by the operating system on High Performance
      Computers.  Information on the data to be stored within the
      matrices A and C is provided later in this document.  For the
      purposes of computation, you may assume that the matrix will be of
      at least size 10 by 10 and that the number of processes will be
      greater than or equal to two.
    
Outputs
    
Upon valid input, your program should print out the first two
      rows of the matrix C and the last two rows of the matrix C. 
      This output should be neat.  That means you should include
      return characters between multiple lines of output and denote
      which row you are providing as output.  Upon invalid input,
      your main function must print an error message.  For example,
      you must indicate that there is invalid input if the matrix size
      is less than 10 by 10, if the number of processes requested is
      less than two, and if the number of rows of matrix C to be
      computed by each worker process is a non-integer value (i.e. N % (NUM_PROCESSES-1)
      != 0).  Additionally, you must provide the run time of your
      program in seconds as output.  This time must start before
      you allocate the memory for your matrices and must stop after the
      matrix addition is complete, but before you print your
      result.  You will need to use MPI_Wtime()
      to get the starting time.  MPI time values are displayed in
      seconds units, and are returned as type double including partial
      seconds after the decimal point.   You can get the run time
      by subtracting the start time from a call to MPI_Wtime() after all computations are
      complete.  Note that you should plan to divide your work
      between a server process (for example, the process with rank 0)
      and worker/client processes that will perform matrix
      computations.  Output should be provided in the server
      process, and the calls to the MPI wall time function should only
      be made from the server process.  Do NOT use another function
      to compute the run time.
    
Matrix Data
    
      You should use the following variables.  Since you will be
      using processes you need to remember that these variables will NOT
      be shared between processes.
    
const int NUM_THREADS;  //The
        number of threads.
        const int N;  //This is the size of one of the sides of the
        matrix.  It is possible to read in one value and store it
        in N.  Once set this value cannot be changed.
        #define ninth (1.0/9.0)
        double ** A;  //The matrix A (dynamically allocated).
        static double B[3][3] = {{ninth, ninth, ninth}, {ninth, ninth, ninth}, {ninth, ninth, ninth}};  //The blurring filter.
        double ** C;  //The result of blurring A with B is stored in matrix C.
    
Matrix A may be allocated and initialized in every
      process.  You should only need to allocate matrix C in the server process.  These matrices will be represented as 2D arrays and
      will be dynamically allocated.  Therefore, these must be of
      type double **.  The values
      for the number of processes and the matrix size will be provided as
      input.  Even though these are constants, if they are not
      initialized, their values may be set once (for example, by using
      an command line input value).
    
Computations (Client/Worker Processes)
    
You must implement both a server process (rank 0) and
      client/worker processes (all other ranks) in your program. 
      Each of the worker processes must compute the  N/(NUM_PROCESSES-1) rows of the matrix
      C, and send the results to the server process.
      
      Main Function and Server Process
    
All processes of this program must read in the input data as
      explained above, and perform error checking on the data.  The
      server process must print the output (the first and last five
      rows) of the matrix addition.  Note that the input data must
      be provided at the command line.  If the input values are
      invalid as described above, you are to print an error message and
      exit the program immediately.  Otherwise, allocate memory for
      the matrix A in every process except the server
      process.  Every worker process should allocate and fill each
      element of the matrix A with the current row number. 
      That is, if you are working with A[1][2],
      its value should be 1, and if you are working with B[4][3], its value should be 4. 
      This way it will be quite easy to tell if your matrix blurring is
      working properly.  Assuming valid input and that the matrices
      A, B, and C have been created, the program will automatically
      start the requested number of processes using MPI.  The
      server process must attempt to receive all the data from the
      worker processes and store it in matrix C.  It is recommended
      that you perform non-blocking receives so you gain experience with
      the constuct.  After calling all of the non-blocking
      receives, you should call the MPI_Waitall function to wait for the
      data.  Finally, print the data and the run time as output.
      
Compilation
    
Your program must read one input from the command line. Failure to do so will result in a loss of points on this programming assignment. Save the source code for the MPI portion of program 2 in a file named prog2Procs.c. Do NOT give the file a different name. You will lose points if you give the file any other name than prog2Threads.c.
To compile your program use the following command:
mpicc prog2Procs.c -O3 -o prog2Procs.exe
This will compile your program using the mpicc compiler, and it will create an executable
      file
      named prog2Procs.exe. 
      Assuming that
      there are no errors in your program, you may run it using the
      command sbatch prog2.sbatch,
      where a sample batch file follows below:
    
#!/bin/bash
        #SBATCH -A TG-SEE120004
        #SBATCH -n 16 -N 1
        #SBATCH -J prog2Procs
        #SBATCH -o prog2Procs.o%j
        #SBATCH -p normal
        #SBATCH -t 00:15:00
        ibrun prog2Procs.exe 1500
    
You must provide your output file and batch script for running this program on a
      matrix of size 6300 by 6300 on 4 nodes distributed across 64
      processes.  It is highly recommended that you try this
      program on LittleFe first and then attempt to scale up to using
      Stampede.
    
Submitting Your Program:
Note that you must provide informative comments for your code and follow the program guidelines.
To turn in your programs, zip your .c files, your Stampede output files, and your batch script and submit them to the eCompanion dropbox for Program 2.