Lab 5 (5 points)
CS550, Operating Systems
More on Parallel Programming

Name: _____________________________________________

To submit this assignment, you may copy and paste the assignment into a text editor such as nano, vi, Notepad, MS Word, or OpenOffice Writer.  Zip your code together with scripts showing the output of your solutions, and submit the zip file to the dropbox for Lab 5.  The purpose of this lesson is to learn about load balancing and parallel programming in the C programming language using threads and MPI message passing (a form of interprocess communication).

1. Download the code at this link.  This code computes PI using a Monte Carlo technique.  PI is estimated from the proportion of random samples that fall within a circle inscribed in a square; because the circle covers PI/4 of the square's area, PI is approximately four times that proportion.  Convert the file circle.c to divide the work evenly among any number of pthreads.  You may read in the number of samples and the number of threads using scanf.  You may assume that the number of samples can be evenly divided by the number of threads.  Turn in a copy of your code.  You can compile the code from the zip file as follows:

gcc mt19937-64.c circle.c -o circle.exe
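
The even split described in problem 1 can be sketched as follows.  This is only an illustration under stated assumptions, not the contents of circle.c: the names worker_arg, next_uniform, count_hits, and estimate_pi are hypothetical, a small xorshift generator stands in for the Mersenne Twister so that each thread owns its random state, and points are drawn in the unit square and tested against a quarter circle, which gives the same PI/4 proportion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-thread work description (hypothetical names, not from circle.c). */
typedef struct {
    long samples;   /* number of samples this thread draws */
    uint64_t seed;  /* private generator state, so threads share nothing */
    long hits;      /* result: samples that landed inside the circle */
} worker_arg;

/* xorshift64* stand-in for the Mersenne Twister; returns a double in [0, 1).
 * The state must be nonzero. */
static double next_uniform(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return (double)((x * 2685821657736338717ULL) >> 11) / 9007199254740992.0;
}

/* Thread body: count samples in the unit square that fall inside the
 * quarter circle of radius 1. */
static void *count_hits(void *p)
{
    worker_arg *w = p;
    long hits = 0;
    for (long i = 0; i < w->samples; i++) {
        double x = next_uniform(&w->seed);
        double y = next_uniform(&w->seed);
        if (x * x + y * y <= 1.0)
            hits++;
    }
    w->hits = hits;
    return NULL;
}

/* Returns an estimate of PI, or -1.0 if the arguments are invalid or a
 * thread could not be created. */
double estimate_pi(long total_samples, int nthreads)
{
    if (total_samples <= 0 || nthreads <= 0)
        return -1.0;
    /* The lab allows assuming an even split; the assert documents that. */
    assert(total_samples % nthreads == 0);

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    worker_arg *arg = malloc((size_t)nthreads * sizeof *arg);
    if (!tid || !arg) {
        free(tid);
        free(arg);
        return -1.0;
    }

    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        arg[t].samples = total_samples / nthreads;
        arg[t].seed = 0x9E3779B97F4A7C15ULL * (uint64_t)(t + 1); /* nonzero, distinct */
        arg[t].hits = 0;
        if (pthread_create(&tid[t], NULL, count_hits, &arg[t]) != 0)
            break;
        started++;
    }

    /* Each thread wrote only its own slot, so no mutex is needed here. */
    long total_hits = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total_hits += arg[t].hits;
    }
    free(tid);
    free(arg);

    if (started != nthreads)
        return -1.0;
    return 4.0 * (double)total_hits / (double)total_samples;
}
```

A main function would read the two values with scanf and print the result of estimate_pi(samples, threads).  With gcc, threaded code is normally built with the -pthread flag added to the compile command.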

2. Save your code from problem 1 and make a copy of it.  Convert the copy to run with any number of processes in C/MPI, using non-blocking sends and receives to exchange data.  One process should act as the server and should receive and accumulate the results from the worker processes, which perform the computations.  You should read in the number of samples from the command line in the same manner as you did in Project 2.  You may assume that the number of samples can be evenly divided by the number of processes.  Turn in a copy of your code.  After writing your code, you can compile it as follows:

mpicc mt19937-64.c circle.c -o circle.exe

3. Write a batch script for your code and run it on Stampede.  Turn in a copy of your batch script and your output file.  Try using a large sample size and a high core count, such as 12,700,000,000 samples and 128 cores (remember that 1 core will be used for the server process, which leaves 127 workers with 100,000,000 samples each).

4. Rewrite the threaded version of the PI program to use the numerical integration approach to computing PI described at the following website: http://www.appentra.com/parallel-computation-pi/.  Run this new version of the program.  Does it run faster?  Is it more accurate?
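
For problem 4, a threaded numerical integration could be sketched as below.  The sketch assumes the referenced page uses the common formulation in which PI is the integral of 4/(1+x^2) from 0 to 1, approximated here with the midpoint rule; if the page uses a different rule, the loop body would change accordingly.  The names slice, integrate_slice, and integrate_pi are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One thread's contiguous block of intervals (hypothetical names). */
typedef struct {
    long first;      /* index of the first interval in this block */
    long last;       /* one past the last interval in this block */
    double step;     /* width of each interval */
    double partial;  /* result: this block's share of the integral */
} slice;

/* Thread body: midpoint rule for 4 / (1 + x^2) over the assigned block. */
static void *integrate_slice(void *p)
{
    slice *s = p;
    double sum = 0.0;
    for (long i = s->first; i < s->last; i++) {
        double x = ((double)i + 0.5) * s->step;  /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    s->partial = sum * s->step;
    return NULL;
}

/* Returns an approximation of PI, or -1.0 on invalid arguments or if a
 * thread could not be created. */
double integrate_pi(long intervals, int nthreads)
{
    if (intervals <= 0 || nthreads <= 0)
        return -1.0;
    /* As in problem 1, an even split is assumed. */
    assert(intervals % nthreads == 0);

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    slice *part = malloc((size_t)nthreads * sizeof *part);
    if (!tid || !part) {
        free(tid);
        free(part);
        return -1.0;
    }

    long chunk = intervals / nthreads;
    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        part[t].first = (long)t * chunk;
        part[t].last = part[t].first + chunk;
        part[t].step = 1.0 / (double)intervals;
        part[t].partial = 0.0;
        if (pthread_create(&tid[t], NULL, integrate_slice, &part[t]) != 0)
            break;
        started++;
    }

    /* Partial sums are combined only after each join, so no lock is needed. */
    double pi = 0.0;
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        pi += part[t].partial;
    }
    free(tid);
    free(part);
    return started == nthreads ? pi : -1.0;
}
```

Unlike the Monte Carlo version, this computation is deterministic, so the same interval count gives the same sum for a given thread count, which makes timing and accuracy comparisons between the two programs easier to repeat.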