Lab 11 (5 points)
CS550, Operating Systems
Caching

Name: _____________________________________________

To submit this assignment, you may copy and paste parts of the assignment into a text editor such as nano, vi, Notepad, MS Word, or OpenOffice Writer. Zip any code and scripts you create, along with the output of your solutions, and submit the zip file to the dropbox for Lab 11. Be sure to include a text document containing any written, typed, or graphed results. You may work with a partner on this lab, but each person must submit their own solution.

This lab is based in part on labs provided at the CUDA and C++11 sessions of the SC13 conference. You will work with a matrix multiplication program and learn how data locality within the CPU cache affects performance, and how that locality depends on the order in which data is accessed.

You will use matrix multiplication to test both scaling and caching.

Download the files at the following link.

Review the batch file provided below.

#!/bin/bash
#SBATCH -A TG-SEE120004
#SBATCH -n 16
#SBATCH -J matMult
#SBATCH -o mm.o%j
#SBATCH -p development
#SBATCH -t 00:15:00
echo 'Starting job'
ibrun mmnf.exe 5000
echo 'Completed job'


1. What are the command-line parameters provided in this batch file?

2. What do you think the echo command does?

Review the two C programs provided in the zip file.

3. What is the purpose of the code in each file?

4. What design pattern is used within the code?

5. What data does each process send?

6. What data does each process receive?

7. Are all of these data transfers necessary?  Explain.

8. Compile and run the code on Stampede using the batch scripts provided.  Record the run time of each program.

9. Change the number of cores used to 16, 32, 128, 256, and 512.  Record these run times and graph them against the number of cores, including the results from problem 8.

10. Explain the results.  Consider caching and data locality in your answer.  Hint: consider the difference in memory location between matrix data in the same row (A[i][j] and A[i][j+1]) and matrix data in the same column (A[i][j] and A[i+1][j]).  In which case would both values likely be pulled into cache together?