Project 1
cs599 Parallel Programming and Algorithms
20 Points
Due Feb. 12, 2015 at Midnight
Based on an example provided within the LLNL Introduction to Parallel Computing webpage
Available at: https://computing.llnl.gov/tutorials/parallel_comp/
Description
For project 1, you are to perform a simulation of the heat equation for a 2D surface. You will implement this program in three different ways, and you will provide as input to this program a 2D array of values describing the temperature distribution on a surface. In this program you will use the heat equation to compute the heat dissipation over a number of discrete time steps given this 2D array of temperatures. Heat dissipates to the nearest neighbors of a particular cell as shown below.
+---------+
| |
|U[i-1][j]|
| |
+---------+---------+---------+
| | | |
|U[i][j-1]| U[i][j] |U[i][j+1]|
| | | |
+---------+---------+---------+
| |
|U[i+1][j]|
| |
+---------+
The neighboring points above are used to calculate the resulting temperature at U[i][j] after one timestep using the formula below.
U[i][j] = U[i][j] + C_x * (U[i+1][j] + U[i-1][j] - 2*U[i][j]) +
C_y * (U[i][j+1] + U[i][j-1] - 2*U[i][j])
Assume C_x = C_y = 0.1
Note that when performing this computation at time t, the result, U[i][j], must be stored in a separate array representing time t+1. Furthermore, results must be computed assuming that the edges of the 2D array are zero-padded.
Implementation
Version A - Serial
Your program must take as input the size of the array and number of time steps as described in the input section below. You must also take as input a 2D array describing the initial state of the surface. This array must be allocated dynamically based upon the input sizes provided. You must also allocate a second array of the same size to act as a buffer. You may assume that the sizes will be correct.
Next, perform the computations for the heat equation, storing the results in the buffer array. Swap the arrays and repeat for the number of time steps.
Finally, print the output using printf statements, printing one row of data per line, with values separated by whitespace.
Version B - Simple Parallel Version
Parallelize Version A using exactly one OpenMP parallel for statement.
Version C - Tiled Parallel Version
Make a copy of your Version A code and modify it to perform computations in tiles. You may assume a fixed tile size of 32 by 32 in your program.
Input
You may assume that your program will receive as input on one line: the number of rows in the initial array, N, the number of columns, M, and the number of time steps, T.
Input will be formatted as follows:
N M T
array data, one row per line follows
At the following link, you will find a sample input file and a sample C program to generate similar input files.
Output
Provide as output the wall time and CPU time required by your program and the resulting array with one row per line.
Note that your input and output data will be quite large, therefore, you should use redirection to run your program as follows:
./heatEq.exe < heatInputData > heatOutputData
The statement above will run the executable file heatEq.exe, providing as input from stdin the contents of heatInputData, and storing the results of the executable file in heatOutputData.
A sample output file is provided at the following link
Comparison
For comparison purposes, run each program on one node of the Stampede supercomputer using a 1024 by 768 array of temperatures for 4000 time steps. Write a paragraph of at least five sentences describing your results, being sure to include the run times in your paragraph.
Submission
Submit your three programs, a README file describing how to compile and run your programs, and your comparison to the eCompanion dropbox for Project 1.