Homework 2
cs599 Parallel Programming and Algorithms
5 points
Due: Monday, Feb. 9, 2015 at 11:59:59pm

1. Assume that you are given a fairly large array of 3D points, that you must sum together into one 3D point. Write a parallel program to perform this summation two different ways, once using reduction operators and once without them. Notice that this operation (3D point summation) is a reduction, but there is not a direct reduction operation that you may use to sum a list/array of 3D points.

2. Write a program that performs both horizontal and vertical edge detection on a matrix, parallelizing the application of the edge detector by rows (or groups of rows). You may fill the matrix with data of your choice. Generally, images are represented as matrices or 2D arrays. Write a program that performs edge detection on a matrix. There are Sobel two edge detection operators. These follow below:

+---+---+---+
| 1 | 0 |-1 |
+---+---+---+
| 2 | 0 |-2 |
+---+---+---+
| 1 | 0 |-1 |
+---+---+---+

and

+---+---+---+
| 1 | 2 | 1 |
+---+---+---+
| 0 | 0 | 0 |
+---+---+---+
| -1| -2| -1|
+---+---+---+

To find directional edges in an image, these operators are applied to generate two images, displaying horizontal and vertical edges, respectively. The top operator generates an image that displays horizontal edges and the bottom operator produces an image displaying vertical edges. See http://en.wikipedia.org/wiki/Sobel_operator for an example.

Each pixel for each of the vertical and horizontal edge images is computed as the sum of the point to point multiplication of the operator with each pixel in the original image and that pixel’s eight nearest neighbors.

The top operator generates an image that displays horizontal edges and the bottom operator produces an image displaying vertical edges. See http://en.wikipedia.org/wiki/Sobel_operator and http://pages.geo.wvu.edu/~elmes/GEOG250/idrtutor/s_tools3.htm for examples.

3. Assume you have created a new civilization on a two-dimensional grid. At the outset each grid location is occupied by one of three life forms: Rock, Paper, Scissors, Lizard, or Spock. Each day, differing life forms occupying horizontally or vertically adjacent grid locations wage war. In each war, Paper covers Rock, Rock crushes Lizard, Lizard poisons Spock, Spock smashes Scissors, Scissors cut Paper, Paper disproves Spock, Rock crushes Scissors, Lizard eats Paper, Spock vaporizes Rock, and Scissors decapitates Lizard. At the end of the day, the victor expands its territory to include the loser's grid position. The loser vacates the position.

Your job is to write a parallel program that will determine the territory occupied by each life form after n days. The first line of input contains t, the number of test cases. Each test case begins with three integers not greater than 100: r and c, the number of rows and columns in the grid, and n, the number of days. The r lines that follow, each with c characters, represent the grid. Each character in the grid is R, P, S, L, K indicating that Rocks, Paper, Scissors, Lizard, or Spock (represented by K) occupy the position respectively.

For each test case, print the grid as it appears at the end of the nth day. Leave an empty line between the outputs for successive test cases.

Sample Input
2
3 3 1
RRR
RSR
RRR

3 4 2
RSPR
SPRS
PRSP

Output for Sample Input
RRR
RRR
RRR

RRRS
RRSP
RSPR

This problem is based on a programming contest problem available at http://uva.onlinejudge.org/index.php?option=onlinejudge&page=show_problem&problem=1384 and you can find an additional description at http://www.algorithmist.com/index.php/UVa_10443

4. Write a program that uses two threads (in addition to the main thread) to compute the sum of all elements of a double array. Do NOT use an OpenMP parallel for statement in this program. The main thread should acquire the data as input and start two worker threads. The worker threads should sum the first and last halves of the array elements. After the worker threads complete their tasks, the main thread should sum the two results and provide the sum as output.

5. Previously, we have studied how to perform matrix multiplication in class. A more efficient version of matrix multiplication performs an operation called tiling. Tiling allows you to break a matrix into small square size chunks. For this problem, you are to perform matrix multiplication using tiling assuming a tile size of 32 by 32. While it is recommended that you write your own code for this problem, you can find a sample algorithm within the slides (slide numbers 77 and 79) at the following link:

http://www.oscer.ou.edu/Workshops/StorageHierarchy/sipe_storage_20150127.pdf

An example of tiled matrix multiplication is provided below given matrices A, B, and C, which are subdivided as follows:

+---+---+   +---+---+   +---+---+
|A11|A12|   |B11|B12|   |C11|C12|
+---+---+ * +---+---+ = +---+---+
|A21|A22|   |B21|B22|   |C21|C22|
+---+---+   +---+---+   +---+---+

C11 = A11*B11 + A12*B21
C12 = A11*B12 + A12*B22
C21 = A21*B11 + A22*B21
C22 = A21*B12 + A22*B22

After, writing a program to perform tiled matrix multiplication, parallelize your results using at least one OpenMP parallel for directive, and time your code using at minimum 2 and 4 threads.

How do your results compare to those that you achieved in past worksheets?