Lab 10 (5 points)
CS550, Operating Systems
File Systems and Scheduling

Name: _____________________________________________

To submit this assignment, you may copy and paste parts of the assignment into a text editor such as nano, vi, Notepad, MS Word, OpenOffice Writer, etc.  Zip your code along with scripts showing the output of your solutions, and submit the zip file to the dropbox for Lab 10.  Be sure to include a text document containing any written/typed/graphed results.

The following lab is based in part upon labs provided at the TACC Xeon Phi tutorial from the XSEDE 2013 conference.  Within this lab, you will work with the Xeon Phi accelerator and learn more about the MPI_Status structure, the MPI_Waitsome function, a dining-philosophers implementation in MPI, and the lfs command for viewing and changing properties of the Lustre filesystem.  You will also learn some scripting basics.

Upload the code at this link to Stampede.  Next, compile messages3.c on Stampede as follows:

mpicc messages3.c -mmic -O3 -o micMsg.exe

The command above compiles the file to run natively on the Intel Xeon Phi accelerator (the -mmic flag targets the MIC architecture).

mpicc messages3.c -O3 -o hostMsg.exe

The command above compiles the file for the host CPUs (Intel Sandy Bridge) on a Stampede node.
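
If you would like a quick sanity check of the two build targets, you can compile a minimal MPI program of your own the same two ways.  The sketch below uses a hypothetical file name, mpi_probe.c (it is not the provided messages3.c); each rank simply reports where it is running, which makes host and MIC ranks easy to tell apart in later exercises.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    /* Each rank reports where it runs; on a MIC the hostname
       typically contains "mic0", which makes symmetric runs
       easy to read. */
    printf("Rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}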

Now run the following command:

idev -A TG-SEE120004

1. What happened?

Hint: idev requests an interactive environment for you on Stampede.  After running it, you will have a compute node to yourself for a total of 30 minutes.

Every standard node on Stampede includes a Xeon Phi accelerator.  Currently, code that is to run on these accelerators must be compiled with the Intel compiler and must use the Intel MPI library.

On high-performance computing systems, a modules system allows users to choose among multiple libraries that provide the same functionality.  For example, OpenMPI, MPICH2, MVAPICH2, and IMPI are all different implementations of the Message Passing Interface: they do almost the same thing, but each may offer slightly different features.  Stampede uses the MVAPICH2 library (an MPI variant developed at Ohio State University) by default.  You must swap it for Intel MPI.  To do this, type the following:

module swap mvapich2 impi

To check that this change happened type:

module list

2. What happened?

This lists the modules (pieces of software) that are currently loaded.  You should see Intel MPI (impi) among your loaded modules.
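
If you are curious what else is installed (not just what is loaded), the modules system can list everything available for loading:

module avail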

Before working with the Xeon Phi, you will need to set up several environment variables.  Run the following commands:

source ./setup_mic.sh

export MIC_PPN=60

These commands set several environment variables (MIC_PPN, for example, controls how many MPI processes are started per MIC card).  Look at the setup_mic.sh file and then run the command:

env
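
Since env prints every variable in your environment, it may be easier to filter for the MIC-related ones:

env | grep MIC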

3. What happened?  Were the environment variables set properly?

Run the code three times using ibrun.symm from the node under idev, as follows (the -c option names the host executable and the -m option names the MIC executable):

ibrun.symm -c hostMsg.exe

ibrun.symm -m micMsg.exe

ibrun.symm -c hostMsg.exe -m micMsg.exe


It is possible to use a variant of the mpiexec command as well.  (Recall that mpiexec and mpirun can also be used on LittleFe.)

mpiexec.hydra -env LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -env I_MPI_PIN_MODE mpd -env KMP_AFFINITY balanced -n 60 -host mic0 ./micMsg.exe

mpiexec.hydra -n 16 -host localhost ./hostMsg.exe

mpiexec.hydra -n 16 -host localhost ./hostMsg.exe : -env LD_LIBRARY_PATH $MIC_LD_LIBRARY_PATH -env I_MPI_PIN_MODE mpd -env KMP_AFFINITY balanced -n 60 -host mic0 ./micMsg.exe

Now exit idev by typing

exit

and pressing enter.

4. What happened after running each of the six commands above?

Included at the following link is a zip file containing a variant of the dining philosophers code.

5. Explain what this code does.
    a) What does the philosopher function do?
    b) How is deadlock prevented?
    c) What does the server function do?
    d) What are the parts of the MPI_Status struct?
    e) What is the purpose of each of these parts?
    f) Why are these parts needed?  Explain with respect to the MPI_Waitsome function.  (A usage sketch follows this list.)
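
For parts (d) through (f), it may help to see the MPI_Status fields and MPI_Waitsome together in a working example.  The following is a minimal, self-contained sketch, not taken from the provided philosopher code: rank 0 posts one nonblocking receive per client rank and uses MPI_Waitsome to service whichever messages arrive first, reading the sender and tag out of the returned statuses.

#include <mpi.h>
#include <stdio.h>

#define MAXCLIENTS 64   /* arbitrary cap for this sketch */

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        MPI_Request reqs[MAXCLIENTS];
        MPI_Status  stats[MAXCLIENTS];
        int indices[MAXCLIENTS], bufs[MAXCLIENTS];
        int nclients = size - 1, outcount, done = 0, i;

        if (nclients > MAXCLIENTS)
            nclients = MAXCLIENTS;

        /* Post one nonblocking receive per client (ranks 1..nclients). */
        for (i = 0; i < nclients; i++)
            MPI_Irecv(&bufs[i], 1, MPI_INT, i + 1, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &reqs[i]);

        /* Service completions until every client has been heard from.
         * MPI_Waitsome blocks until at least one request finishes. */
        while (done < nclients) {
            MPI_Waitsome(nclients, reqs, &outcount, indices, stats);
            for (i = 0; i < outcount; i++) {
                /* stats[i].MPI_SOURCE -- rank that sent the message
                 * stats[i].MPI_TAG    -- tag (e.g., a request type)
                 * stats[i].MPI_ERROR  -- per-request error code     */
                printf("server: message from rank %d, tag %d\n",
                       stats[i].MPI_SOURCE, stats[i].MPI_TAG);
            }
            done += outcount;
        }
    } else if (rank <= MAXCLIENTS) {
        int msg = rank;
        MPI_Send(&msg, 1, MPI_INT, 0, 7 /* tag */, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Notice that without MPI_SOURCE and MPI_TAG, a server receiving with MPI_ANY_SOURCE and MPI_ANY_TAG could not tell who asked for what, which is exactly the information a dining-philosophers server needs.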

Notice that you must provide two command-line arguments to run the philosopher code: the number of philosophers (which should be one less than the number of processes) and the number of meals for each philosopher to eat.  A sketch of how such argument handling might look appears below.
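
This is a minimal sketch, assuming the provided code parses its arguments in the usual way; the real file may differ, so check it:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, philosophers, meals;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (argc < 3) {
        if (rank == 0)
            fprintf(stderr, "usage: %s <philosophers> <meals>\n", argv[0]);
        MPI_Finalize();
        return 1;
    }

    philosophers = atoi(argv[1]);   /* first argument  */
    meals        = atoi(argv[2]);   /* second argument */

    /* The lab states the philosopher count should be one less than
     * the number of MPI processes: the extra rank is the server. */
    if (rank == 0 && philosophers != size - 1)
        fprintf(stderr, "warning: %d philosophers but %d processes\n",
                philosophers, size);

    if (rank == 0)
        printf("%d philosophers, %d meals each\n", philosophers, meals);

    /* ... philosopher/server logic would go here ... */

    MPI_Finalize();
    return 0;
}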

Run the dining philosophers code using mpiexec.hydra under idev in the same three ways as shown in the commands above.  Don't forget to include the command line arguments.  First, run the host version on the Sandy Bridge cores of the node; second, run on the Xeon Phi card (called a MIC and pronounced "mike"); and third, run on both the MIC and the Sandy Bridge cores at the same time.

Note that running on the main CPU sockets and the MIC at the same time is called symmetric computing.

6. Include your results in your submission.  Which runs take the longest?

7. Review the code for the dining philosophers.  Why do you think the program that took the longest required so much time?

Consider the following in your answer:

The Sandy Bridge Processors on each Stampede node have the following specifications:

Memory: 32GB
Clock Speed: 2.7GHz
Memory Bandwidth: 51.2 GB/s per socket * 2 sockets
Vector length: 4 double-precision (DP) words
Core count: 8 per socket * 2 sockets = 16

The Xeon Phi cards have the following specifications:

Memory: 8GB
Core count: 61
Hardware threads: 244
Clock speed: 1.1GHz
Memory bandwidth: 352 GB/s (on card)
Vector length: 8 DP words

Read about Lustre at http://en.wikipedia.org/wiki/Lustre_(file_system)

8. What is a stripe? an OST? an OSS?

Finally, you will make use of the lfs command on Stampede.

9. First, run man lfs.  What happens?  What is the purpose of lfs?

10. Next, run lfs check mds.  What happens?

11. Then, run lfs getstripe ~.  What happens?  What is the stripe size of your files?  Do any of your files use more than one stripe?
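
The overview at the start of this lab mentions changing Lustre properties as well as viewing them.  If you want to experiment, lfs setstripe can set the stripe count on a new file or directory.  For example (run this in your own scratch space; the option letters can vary between Lustre versions):

lfs setstripe -c 4 $SCRATCH/striped_dir

Files created inside that directory would then be striped across four OSTs by default.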