Automatic parallelization (also known as auto parallelization or Autoparallelization), refers to the use of a modern optimizing parallelizing compiler to convert sequential code into multi-threaded or vectorized (or even both) code in order to utilize multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. The goal of automatic parallelization is to relieve programers from the tedious and error-prone manual parallelization process.
The programming control structures on which auto parallelization places the most focus are loops, because, in general, most of the execution time of a program takes place inside some form of loop. An auto parallelization compiler tries to split up a loop so that its iterations can be executed on separate processors concurrently.
Compiler Parallelization Analysis
The compiler usually conducts two passes of analysis before actual parallelization in order to determine the following:
- Is it safe to parallelize the loop?
- Is it worthwile to parallize it?
The first pass of the compiler performs a data dependence analysis of the loop to determine whether each iteration of the loop can be executed independently of the others. Data dependence can sometimes be dealt with, but it may incur additional overhead in the form of message passing, synchronization of shared memory, or some other method of processor communication.
The second pass attempts to justify the parallization effort by comparing the theoretical execution time of the code after parallelization to the code's sequential execution time. Somewhat counterintuitively, code does not always benefit from parallel execution. The extra overhead that can be associated with using multiple processors can eat into the potential speedup of parallized code.
A brief example of Auto-Parallelization
Code 1 below can be auto-parallelized by a compiler because each iteration is independent of the others, and the final result of array z is will be correct regardless of the execution order of the other iterations.
!code 1 do i=1,n z(i) = x(i) + y(i) enddo
On the other hand, code 2 below cannot be auto-parallelized, because the value of z(i) depends on the result of the previous iteration z(i-1).
!code 2 do i=1,n z(i) = z(i -1)*2 enddo
This does not mean that the code cannot be parallelized. However, current parallelizing compilers are not capable of bringing out these parallelisms automatically, and it is very questionable as to whether this code would benefit from parallelization in the first place.