Below is the sequential pseudo-code for multiplication and addition of two matrices, where the result is stored in the matrix C. The pseudo-code for multiplication calculates the dot product of two matrices A and B and stores the result into the output matrix C. If the following programs were executed sequentially, the time taken to calculate the result would be O(n^3) (assuming row lengths and column lengths of both matrices are n) for multiplication and O(n) for addition.

```c
// Matrix multiplication
for (int i = 0; i < row_length_A; i++)
{
    for (int k = 0; k < column_length_B; k++)
    {
        sum = 0;
        for (int j = 0; j < column_length_A; j++)
        {
            sum += A[i][j] * B[j][k];
        }
        C[i][k] = sum;
    }
}
```

```c
// Array addition
for (int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}
```

We can exploit data parallelism in the preceding code to execute it faster, as the arithmetic is loop-independent. Parallelization of the matrix multiplication code is achieved by using
OpenMP. An OpenMP directive, "omp parallel for", instructs the compiler to execute the code in the for loop in parallel. For multiplication, we can divide matrices A and B into blocks along rows and columns respectively. This allows us to calculate every element in matrix C individually, thereby making the task parallel. For example:
A[m x n] dot B[n x k] can be finished in O(n) instead of O(m*n*k) when executed in parallel using m*k processors.

```c
// Matrix multiplication in parallel
#pragma omp parallel for schedule(dynamic, 1) collapse(2)
for (int i = 0; i < row_length_A; i++)
{
    for (int k = 0; k < column_length_B; k++)
    {
        sum = 0;
        for (int j = 0; j < column_length_A; j++)
        {
            sum += A[i][j] * B[j][k];
        }
        C[i][k] = sum;
    }
}
```

It can be observed from the example that many processors are required as the matrix sizes keep increasing. Keeping the execution time low is the priority, but as the matrix size grows we face other constraints, such as the complexity of such a system and its associated costs. Even with a constrained number of processors in the system, we can still apply the same principle and divide the data into bigger chunks to calculate the product of two matrices. For the addition of arrays in a data-parallel implementation, let's assume a more modest system with two
central processing units (CPUs), A and B. CPU A could add all elements from the top half of the arrays, while CPU B could add all elements from the bottom half of the arrays. Since the two processors work in parallel, the job of performing array addition would take one half the time of performing the same operation in serial using one CPU alone. The program expressed in pseudocode below, which applies some arbitrary operation foo on every element in the array d, illustrates data parallelism:
if CPU = "a" then
    lower_limit := 1
    upper_limit := round(d.length / 2)
else if CPU = "b" then
    lower_limit := round(d.length / 2) + 1
    upper_limit := d.length

for i from lower_limit to upper_limit by 1 do
    foo(d[i])

In an
SPMD system executed on a 2-processor system, both CPUs will execute the code. Data parallelism emphasizes the distributed (parallel) nature of the data, as opposed to the processing (task parallelism). Most real programs fall somewhere on a continuum between task parallelism and data parallelism.

== Steps to parallelization ==