Lab 5 (5 points)
        CS550, Operating Systems
        More on Parallel Programming
Name: _____________________________________________
To submit this assignment, you may copy and paste the assignment into a text editor such as nano, vi, notepad, MS Word, OpenOffice Writer, etc. Zip the code and scripts showing the output of your solutions, and submit the zip file to the dropbox for lab 5. The purpose of this lesson is to learn about load balancing and parallel programming using threads and MPI message passing (also known as interprocess communication) in the C programming language.
1. Download the code at this link.  This code
      computes PI using a Monte Carlo technique.  PI is estimated by
      finding the fraction of the total samples that fall within a
      circle inscribed in a square, then multiplying that fraction
      by 4 (the circle covers PI/4 of the square's area).  Convert the
      file circle.c to evenly divide the work between any number of
      Pthreads.  You may read in the number of samples and the number
      of threads using scanf.  You may assume
      that the number of samples can be evenly divided by the number of
      threads.  Turn in a copy of your code.  You can compile
      the code from the zip file as follows (add the -pthread flag
      once your code uses Pthreads):
      
      gcc mt19937-64.c circle.c -o circle.exe
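
As a rough guide to the structure problem 1 asks for, here is a minimal sketch of dividing the samples evenly among Pthreads. It is not the code from the zip file: the names worker_arg, next_uniform, count_hits, and estimate_pi are hypothetical, and a small xorshift generator stands in for mt19937-64.c so that the example is self-contained. Each thread keeps its own generator state and its own hit counter, so no locking is needed until the counts are summed after the joins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread record: input work size, private RNG state, output count. */
typedef struct {
    long long samples;  /* samples this thread must draw */
    uint64_t  state;    /* private generator state, never shared between threads */
    long long hits;     /* result: samples that landed inside the circle */
} worker_arg;

/* Stand-in generator (xorshift64*), NOT the lab's mt19937-64.c.
   Returns a double in [0, 1). The state must be nonzero. */
static double next_uniform(uint64_t *s)
{
    *s ^= *s >> 12;
    *s ^= *s << 25;
    *s ^= *s >> 27;
    return (double)((*s * 2685821657736338717ULL) >> 11) / 9007199254740992.0;
}

/* Thread body: draw points in the square [-1, 1] x [-1, 1] and count
   those inside the unit circle. */
static void *count_hits(void *p)
{
    worker_arg *a = p;
    long long hits = 0;
    for (long long i = 0; i < a->samples; i++) {
        double x = 2.0 * next_uniform(&a->state) - 1.0;
        double y = 2.0 * next_uniform(&a->state) - 1.0;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    a->hits = hits;
    return NULL;
}

/* Splits total_samples evenly across num_threads and returns the PI estimate.
   Assumes, as the lab allows, that the division is exact. */
double estimate_pi(long long total_samples, int num_threads)
{
    assert(total_samples > 0 && num_threads > 0);
    assert(total_samples % num_threads == 0);

    pthread_t  *tid  = malloc((size_t)num_threads * sizeof *tid);
    worker_arg *args = malloc((size_t)num_threads * sizeof *args);
    if (tid == NULL || args == NULL)
        abort();

    for (int t = 0; t < num_threads; t++) {
        args[t].samples = total_samples / num_threads;
        /* Distinct nonzero seed per thread (odd constant times t + 1). */
        args[t].state = 0x9E3779B97F4A7C15ULL * (uint64_t)(t + 1);
        args[t].hits = 0;
        if (pthread_create(&tid[t], NULL, count_hits, &args[t]) != 0)
            abort();
    }

    long long hits = 0;
    for (int t = 0; t < num_threads; t++) {
        pthread_join(tid[t], NULL);  /* wait, then collect this thread's count */
        hits += args[t].hits;
    }

    free(tid);
    free(args);
    return 4.0 * (double)hits / (double)total_samples;
}
```

A main function for this sketch would read the two values with scanf (using %lld for the sample count) and print the result of estimate_pi. In your own solution, each thread would call the generator from mt19937-64.c instead; check whether that generator keeps shared global state before calling it from several threads at once.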
    
2. Save your code from problem 1 and make a copy of it.
      Convert the copy to work with any number of processes in C/MPI,
      using non-blocking sends and receives to complete your
      work.  One process should act as the server and should
      receive and accumulate all the data from worker processes that
      will perform computations.  You should read in the number of
      samples from the command line in the same manner as you did in
      Project 2.  You may assume that the number of samples can be
      evenly divided by the number of processes. Turn in a copy of your
      code.  After writing your code, you can compile it as
      follows:
      
      mpicc mt19937-64.c circle.c -o circle.exe
      
3. Write a batch script for your code and run it on
      Stampede.  Turn in a copy of your batch script and your
      output file.  Try using a large sample size and a high core
      count, such as 12,700,000,000 samples and 128 cores (remember that 1
      core will be used for the server process).
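
The arithmetic behind the suggested run can be checked with a short sketch. It assumes the samples are split among the 127 non-server processes; the helper name samples_per_worker is hypothetical. Note that 12,700,000,000 does not fit in a 32-bit int (whose maximum is 2,147,483,647), so the sample count should be stored in a 64-bit type such as long long.

```c
#include <assert.h>

/* Hypothetical helper: samples each worker draws when one of the
   num_processes processes is reserved as the server. */
long long samples_per_worker(long long total_samples, int num_processes)
{
    int workers = num_processes - 1;  /* assumption: the server does no sampling */
    assert(workers > 0 && total_samples % workers == 0);
    return total_samples / workers;
}
```

With the values suggested above, samples_per_worker(12700000000LL, 128) gives 100,000,000 samples for each of the 127 workers. When reading the count from the command line, a 64-bit conversion such as strtoll, rather than atoi, avoids overflow for a value this large.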