Report ID
1996-01
Report Authors
Cong Fu and Tao Yang
Report Date
Abstract
Run-time compilation techniques have been shown effective for automating the parallelization of loops with unstructured, indirect data access patterns. However, it is still an open problem to efficiently parallelize sparse matrix factorizations commonly used in iterative numerical problems. The difficulty is that a factorization process contains irregularly-interleaved communication and computation with varying granularities, and it is hard to obtain scalable performance on distributed memory machines. In this paper, we present an inspector/executor approach for parallelizing such applications by embodying automatic graph scheduling techniques to optimize interleaved communication and computation. We describe a run-time system called RAPID that provides a set of library functions for specifying irregular data objects and tasks that access these objects. The system extracts a task dependence graph from data access patterns, and executes tasks efficiently on a distributed memory machine. We discuss a set of optimization strategies used in this system and demonstrate the application of this system in parallelizing sparse Cholesky and LU factorizations.
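The inspector/executor pattern described above can be illustrated with a minimal sketch: an inspector pass derives a task dependence graph from each task's declared read/write sets, and an executor pass runs tasks in a dependence-respecting order. This is a schematic illustration only, assuming tasks declare their data accesses up front; the function names `inspect_tasks` and `execute_tasks` are hypothetical and do not reflect RAPID's actual library interface.

```python
from collections import defaultdict, deque

def inspect_tasks(tasks):
    """Inspector: build a task dependence graph from read/write sets.

    tasks: list of (name, reads, writes), where reads/writes are sets of
    data-object ids. Task i depends on every earlier task j whose writes
    intersect i's reads or writes, or whose reads intersect i's writes
    (RAW, WAW, and WAR orderings).
    """
    deps = defaultdict(set)
    for i, (_, reads_i, writes_i) in enumerate(tasks):
        for j in range(i):
            _, reads_j, writes_j = tasks[j]
            if writes_j & (reads_i | writes_i) or writes_i & reads_j:
                deps[i].add(j)
    return deps

def execute_tasks(tasks, deps, run):
    """Executor: run tasks in topological order of the dependence graph."""
    indeg = {i: len(deps[i]) for i in range(len(tasks))}
    succ = defaultdict(list)
    for i, ds in deps.items():
        for j in ds:
            succ[j].append(i)
    ready = deque(i for i in range(len(tasks)) if indeg[i] == 0)
    order = []
    while ready:
        i = ready.popleft()
        run(tasks[i][0])        # placeholder for the task's computation
        order.append(i)
        for k in succ[i]:       # release tasks whose deps are satisfied
            indeg[k] -= 1
            if indeg[k] == 0:
                ready.append(k)
    return order
```

In a sparse factorization, the data objects would be matrix blocks or column panels, and the inspection cost is amortized because the same dependence graph is reused across runs; a real system would additionally schedule the graph across processors to overlap communication with computation.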
Document
1996-01.ps (420.71 KB)