Worksheet 3
CS599, Parallel Programming and Algorithms
Introduction to OpenMP Directive Programming

This worksheet is based upon examples from the LLNL OpenMP site and examples from the 44-550, Operating Systems course. The purpose of this lesson is to continue learning about command line compilers, multi-threaded programming, OpenMP directives, and the Linux Operating System.

You should use the LittleFe system to complete this worksheet. Follow the instructions on the class website to login to either littlefe.nwmissouri.edu or littlefe2.nwmissouri.edu using the secure shell protocol (ssh). 

This worksheet includes both reading material and exercises. Statements such as "Enter the following command..." or "Type..." denote that you should type an appropriate response at the command line to get a response from the operating system. UNIX commands will be in bold font. Write answers in the spaces provided.

1. Copy and paste the following code into a file called matMult.c and complete it. Instructions will be provided on how to perform matrix multiplication.

Once complete, compile this code using gcc matMult.c -O3 -o matMult.exe

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define n 2000

int main(int argc, char ** argv)
{
  int i, j, k, x;

  int ** a;
  int ** b;
  int ** c;

  a = (int **) malloc(sizeof(int *)* n);
  b = (int **) malloc(sizeof(int *)* n);
  c = (int **) malloc(sizeof(int *)* n);

  for(i=0; i<n; i++)
  {
    a[i] = malloc(sizeof(int)*n);
    b[i] = malloc(sizeof(int)*n);
    c[i] = malloc(sizeof(int)*n);
  }

  for(i =0; i < n; i++)
    for(j=0; j < n; j++)
    {
      c[i][j] = 0;
      a[i][j] = b[i][j] = i;
    }

  time_t start1, end1;
  clock_t start2, end2;

  start1 = time(NULL);
  start2 = clock();

/**************************
* Perform matrix *
* multiplication here *
**************************/

  end1 = time(NULL);
  end2 = clock();

  printf("Wall time taken was %lf\nCPU time taken was %lf\n",
  difftime(end1, start1), ((double)(end2-start2))/CLOCKS_PER_SEC);

  for(i=0; i<n; i++)
  {
    free((void *) a[i]);
    free((void *) b[i]);
    free((void *) c[i]);
    }

  free((void *) a);
  free((void *) b);
  free((void *) c);

  return 0;
}

2. After completing the code above, copy it into a file called matMult2.c. Use the cp matMult.c matMult2.c command to make a copy of your code. Modify this code so that matrix operations are performed in row major order instead of column major.

Once complete, compile this code using gcc matMult2.c -O3 -o matMult2.exe

3. Make copies of matMult.c and matMult2.c into files called matMultOMP.c and matMult2OMP.c. Modify these new files to make use of OpenMP parallel for directives. Ensure that your code makes use of the appropriate shared, private, and num_threads directives. Compile each of these files using the -fopenmp option in addition to the options used in previous examples.

4. Run each of the serial versions and write down the run times of each. Next, compile and run each of the parallel versions using 2 threads. Write down the run times of each. Next, compile and run each of the parallel versions using 4 threads. Write down the run times of each.

5. How do the run times differ? Why are they different? How much speedup is achieved using two threads? four threads? How much speedup does the row major version offer over the column major version?

6. Given the following code, convert the for loop to use an OpenMP Reduction. Compile and run this code. If you have time, try timing the code with an increased vector size. What happens as you increase the vector size and the number of threads?

#include <stdio.h>
#include <stdlib.h>

#define N 1000

int main(int argc, char ** argv)
{
  int A[N], B[N];
  int i;
  int dp;

  for(i = 0; i < N; i++)
  {
    A[i] = rand() % 100;
    B[i] = rand() % 100;
  }

  dp = 0;
  for(i = 0; i < N; i++)
    dp += A[i]*B[i];

  printf("The dot product is %d\n", dp);
  return 0;
}