Algorithms Design for the Parallelization of Nested Loops
Ciorba, F. M. (2008). Algorithms Design for the Parallelization of Nested Loops. NTUA Library - Doctoral Dissertations Section. Athens, Greece.
The need for parallel processing arises from the existence of time consuming applications in different areas, such as weather forecasting, nuclear fusion simulations, DNA and protein analysis, computational fluid dynamics, etc. Parallel processing comprises algorithms, computer architecture, parallel programming and performance analysis. In optimizing the performance of scientific and engineering sequential programs, the most gain comes from optimizing nested loops or recursive procedures, where major chunks of computation are performed repeatedly. Nested loops without dependencies are called DOALL, while those with dependencies are called DOACROSS loops. Parallelizing DOACROSS loops is much more challenging than parallelizing DOALL loops, because the existing dependencies between iterations of the loop nest much be satisfied. The challenges that must be addressed for the parallelization of time consuming applications are: minimizing the total execution time, minimizing the communication time between the processors (especially in the case of DOACROSS loops), load balancing the computational load among the processors, dealing with and recovering from failures that may occur either in the program or the system, meeting deadlines, or a combination of these. This doctoral dissertation focuses on parallelizing applications that contain nested DOACROSS loops, while trying to address some of the aforementioned challenges. In particular, it proposes and presents four static methods and three dynamic methods for scheduling nested DOACROSS loops on various architectures. The static scheduling methods were devised for homogeneous systems, while the dynamic scheduling methods were devised for heterogeneous systems or systems with rapidly vary- ing loads. One of the dynamic approaches was bibliographically the first attempt towards the parallelization of nested DOACROSS loops using a coarse grain approach and dynamic scheduling, on heterogeneous systems. The proposed algorithms were implemented, verified and evaluated through extensive experiments on various computer systems architectures.