TY - GEN
T1 - The cache-oblivious gaussian elimination paradigm
T2 - SPAA'07: 19th Annual Symposium on Parallelism in Algorithms and Architectures
AU - Chowdhury, Rezaul Alam
AU - Ramachandran, Vijaya
PY - 2007
Y1 - 2007
N2 - The Gaussian Elimination Paradigm (GEP) was introduced by the authors in [6] to represent the triply-nested loop computation that occurs in several important algorithms including Gaussian elimination without pivoting and Floyd-Warshall's all-pairs shortest paths algorithm. An efficient cache-oblivious algorithm for these instances of GEP was presented in [6]. In this paper we establish several important properties of this cache-oblivious framework, and extend the framework to solve GEP in its full generality within the same time and I/O bounds. We then analyze a parallel implementation of the framework and its caching performance for both shared and distributed caches. We present extensive experimental results for both in-core and out-of-core performance of our algorithms. We consider both sequential and parallel implementations of our algorithms, and compare them with finely-tuned cache-aware BLAS code for matrix multiplication and Gaussian elimination without pivoting. Our results indicate that cache-oblivious GEP offers an attractive tradeoff between efficiency and portability.
AB - The Gaussian Elimination Paradigm (GEP) was introduced by the authors in [6] to represent the triply-nested loop computation that occurs in several important algorithms including Gaussian elimination without pivoting and Floyd-Warshall's all-pairs shortest paths algorithm. An efficient cache-oblivious algorithm for these instances of GEP was presented in [6]. In this paper we establish several important properties of this cache-oblivious framework, and extend the framework to solve GEP in its full generality within the same time and I/O bounds. We then analyze a parallel implementation of the framework and its caching performance for both shared and distributed caches. We present extensive experimental results for both in-core and out-of-core performance of our algorithms. We consider both sequential and parallel implementations of our algorithms, and compare them with finely-tuned cache-aware BLAS code for matrix multiplication and Gaussian elimination without pivoting. Our results indicate that cache-oblivious GEP offers an attractive tradeoff between efficiency and portability.
KW - All-pairs shortest path
KW - Cache-oblivious algorithm
KW - Gaussian elimination
KW - Matrix multiplication
KW - Tiling
UR - https://www.scopus.com/pages/publications/35248831668
U2 - 10.1145/1248377.1248392
DO - 10.1145/1248377.1248392
M3 - Conference contribution
SN - 159593667X
SN - 9781595936677
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 71
EP - 80
BT - SPAA'07
Y2 - 9 June 2007 through 11 June 2007
ER -