cs599
Homework 5
10 bonus points
Due Apr. 29, 2015 at midnight
Each problem is worth 2 points.
You must fully complete each problem to receive extra credit.

1. Using the FFT programs provided on the course website, generate at least 10,000,000 elements of input data (preferably 100,000,000). Test both the recursive and the iterative versions of the FFT, then compute and graph the speedup obtained with 4, 8, 16, and 32 CPU cores on Stampede.
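
For the speedup graph, a small helper along these lines may be useful. It is only a sketch: the handout does not define speedup or name the baseline, so this assumes the usual definition S(p) = T_serial / T_parallel, and the function names `speedup` and `efficiency` are hypothetical. Efficiency is not requested by the problem and is included only as a sanity check on the measured times.

```c
#include <assert.h>

/* Speedup S(p) = T_serial / T_parallel for a run on p cores.
   Assumption: both times are wall-clock seconds for the same input size. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency E(p) = S(p) / p; a value of 1.0 means ideal scaling. */
double efficiency(double t_serial, double t_parallel, int cores)
{
    assert(cores > 0);
    return speedup(t_serial, t_parallel) / cores;
}
```

Calling `speedup` once per core count (4, 8, 16, 32) with the measured times gives the points to plot.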

2. Parallelize Floyd’s algorithm using MPI. The serial version is provided on the course website.
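
The problem does not prescribe how to divide the work. One common approach, assumed here, is a row-wise block decomposition of the distance matrix: in iteration k, the process that owns row k sends it to all the others (for example with MPI_Bcast) before every process updates its own rows. The helpers below are a hypothetical sketch of the index arithmetic only and contain no MPI calls.

```c
#include <assert.h>

/* First row owned by process `rank` when n rows are split over p processes. */
int block_low(int rank, int p, int n)
{
    assert(p > 0 && rank >= 0 && rank <= p);
    return (int)((long long)rank * n / p);
}

/* Last row owned by process `rank`. */
int block_high(int rank, int p, int n)
{
    return block_low(rank + 1, p, n) - 1;
}

/* Number of rows owned by process `rank`. */
int block_size(int rank, int p, int n)
{
    return block_high(rank, p, n) - block_low(rank, p, n) + 1;
}

/* Rank of the process that owns row `row`; in iteration k this would be
   the root of the broadcast of row k. */
int block_owner(int row, int p, int n)
{
    assert(p > 0 && row >= 0 && row < n);
    return (int)(((long long)p * (row + 1) - 1) / n);
}
```

This split keeps the row counts of any two processes within one of each other, even when p does not divide n.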

3. Write a CUDA program that performs tiled matrix multiplication on randomly generated data. Run it on a GPU node on Stampede using two 4096 by 4096 matrices and a tile size of 64 by 64.
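
To check the GPU result, a CPU reference that walks the matrices in the same tile order can help. The sketch below is plain C, not CUDA, and the name `matmul_tiled` is hypothetical; it assumes square, row-major matrices stored as 1D float arrays. Note that a 64 by 64 tile holds 4096 elements, which is more than the 1024 threads per block allowed on CUDA devices of compute capability 2.0 and later, so a kernel would likely need each thread to compute more than one output element.

```c
#include <assert.h>
#include <stddef.h>

/* c = a * b for n x n row-major matrices, processed tile by tile.
   `tile` need not divide n; edge tiles are clipped. */
void matmul_tiled(const float *a, const float *b, float *c, int n, int tile)
{
    assert(n > 0 && tile > 0);
    for (size_t i = 0; i < (size_t)n * n; i++)
        c[i] = 0.0f;

    for (int i0 = 0; i0 < n; i0 += tile)
        for (int j0 = 0; j0 < n; j0 += tile)
            for (int k0 = 0; k0 < n; k0 += tile) {
                int i_end = i0 + tile < n ? i0 + tile : n;
                int j_end = j0 + tile < n ? j0 + tile : n;
                int k_end = k0 + tile < n ? k0 + tile : n;
                /* Accumulate the contribution of one pair of tiles. */
                for (int i = i0; i < i_end; i++)
                    for (int k = k0; k < k_end; k++) {
                        float a_ik = a[(size_t)i * n + k];
                        for (int j = j0; j < j_end; j++)
                            c[(size_t)i * n + j] += a_ik * b[(size_t)k * n + j];
                    }
            }
}
```

Because floating-point sums depend on the order of accumulation, comparing against the GPU output with a small tolerance is safer than testing for exact equality.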

4. Write a CUDA program that transposes a matrix stored in a 1D array and filled with randomly generated data. Run it on Stampede using a 4096 by 4096 matrix.
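
The index arithmetic for a matrix kept in a 1D array can be sketched on the CPU as follows. This assumes row-major storage, which the problem does not state, and the name `transpose` is hypothetical. In a CUDA kernel the two loops would typically be replaced by one thread per element, with the same index mapping.

```c
#include <assert.h>
#include <stddef.h>

/* out (cols x rows) receives the transpose of in (rows x cols).
   Both are row-major 1D arrays and must not overlap. */
void transpose(const float *in, float *out, int rows, int cols)
{
    assert(in != out && rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            /* Element (r, c) of the input becomes element (c, r) of the output. */
            out[(size_t)c * rows + r] = in[(size_t)r * cols + c];
}
```

Transposing the GPU output a second time and comparing it with the original input is a cheap correctness check.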

5. Write a CUDA program that takes an RGB image as input and outputs a grayscale image. Hint: the formula to convert from RGB to grayscale is in the slides.
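
The formula from the slides is not reproduced in this handout. As a stand-in, the CPU sketch below assumes the widely used ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B) in integer form; substitute the coefficients from the slides if they differ. The names `rgb_to_gray` and `grayscale`, and the interleaved 3-bytes-per-pixel layout, are also assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Gray value of one pixel using integer BT.601 weights (assumed, see above).
   Adding 500 before dividing by 1000 rounds to the nearest integer. */
unsigned char rgb_to_gray(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned char)((299 * r + 587 * g + 114 * b + 500) / 1000);
}

/* Convert n interleaved RGB pixels (3 bytes each) into n gray bytes. */
void grayscale(const unsigned char *rgb, unsigned char *gray, size_t n)
{
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < n; i++)
        gray[i] = rgb_to_gray(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
}
```

Each output pixel depends only on its own input pixel, so a CUDA version can assign one thread per pixel and reuse the per-pixel function body unchanged.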