Report ID
1998-08
Report Authors
Kai Shen, Xiangmin Jiao, and Tao Yang
Report Date
Abstract
Sparse LU factorization with partial pivoting is important for many scientificapplications and delivering high performance for this problem is difficult ondistributed memory machines. Our previous work has developed an approachcalled $S^*$ that incorporates static symbolic factorization, supernodepartitioning and graph scheduling. This paper studies the properties ofelimination forests and uses them to guide supernode partitioning/amalgamationand execution scheduling. The new design with 2D mapping effectivelyidentifies dense structures without introducing too many zeros in the BLAScomputation and exploits asynchronous parallelism with low buffer space cost.The implementation of this code, called $S^+$, uses supernodal matrixmultiplication which retains the BLAS-3 level efficiency and avoids unnecessaryarithmetic operations. The experiments show that $S^+$ improves our previouscode substantially and can achieve up to 11.04GFLOPS on 128 Cray T3E 450MHznodes, which is the highest performance reported in the literature.
Document
1998-08.ps357.18 KB