TY - GEN
T1 - High-performance recursive dynamic programming for bioinformatics using MM-like flexible kernels
AU - Tithi, Jesmin Jahan
AU - Ganapathi, Pramod
AU - Talati, Aakrati
AU - Chowdhury, Rezaul
N1 - Publisher Copyright: Copyright © 2014 ACM.
PY - 2014/9/20
Y1 - 2014/9/20
N2 - Dynamic Programming (DP) provides optimal solutions to a problem by combining optimal solutions to many over- lapping subproblems. DP algorithms exploit this overlap- ping property to explore otherwise exponential-sized prob-lem spaces in polynomial time, making them central to many important applications spanning from logistics to computa- Tional biology. In this paper, we present a general strategy of obtaining highly efficient parallel DP implementations us- ing recursive cache-oblivious divide and conquer technique which turns in flexible kernels into flexible ones (kernels that read from and write to disjoint sub-matrices). We solve four non-trivial DP problems widely used in Bioinformatics, namely the parenthesis problem, Floyd-Warshall's all-pairs shortest paths, gap problem and protein accordion folding using recursive cache-oblivious technique that decompose the original in flexible looping kernel to highly optimizable flexible kernels. To the best of our knowledge no such re- cursive parallel DP algorithms were known for the protein folding and gap problems. The algorithms are hybrid in the same way as most high-performance matrix multiplication algorithms are recursive with iterative base cases. We show that the base cases of these recursive divide-and-conquer al- gorithms are predominantly matrix-multiplication-like (MM- like) flexible that expose many optimization opportunities not offered by the traditional looping DP codes. Moreover, the most costly/dominating kernel for these problems are of- Ten flexible. As a result, one can obtain highly efficient DP implementations by simply optimizing these kernels. We present a few generic optimization steps that suffices to optimize these DP implementations. Our implementations achieve 5 - 100 speedup over their standard loop based DP counterparts on modern multicore machines. We also present results on manycores (Xeon Phi) and clusters of mul- Ticores obtained by simple extensions for SIMD and shared- distributed-shared-memory architectures, respectively.
AB - Dynamic Programming (DP) provides optimal solutions to a problem by combining optimal solutions to many over- lapping subproblems. DP algorithms exploit this overlap- ping property to explore otherwise exponential-sized prob-lem spaces in polynomial time, making them central to many important applications spanning from logistics to computa- Tional biology. In this paper, we present a general strategy of obtaining highly efficient parallel DP implementations us- ing recursive cache-oblivious divide and conquer technique which turns in flexible kernels into flexible ones (kernels that read from and write to disjoint sub-matrices). We solve four non-trivial DP problems widely used in Bioinformatics, namely the parenthesis problem, Floyd-Warshall's all-pairs shortest paths, gap problem and protein accordion folding using recursive cache-oblivious technique that decompose the original in flexible looping kernel to highly optimizable flexible kernels. To the best of our knowledge no such re- cursive parallel DP algorithms were known for the protein folding and gap problems. The algorithms are hybrid in the same way as most high-performance matrix multiplication algorithms are recursive with iterative base cases. We show that the base cases of these recursive divide-and-conquer al- gorithms are predominantly matrix-multiplication-like (MM- like) flexible that expose many optimization opportunities not offered by the traditional looping DP codes. Moreover, the most costly/dominating kernel for these problems are of- Ten flexible. As a result, one can obtain highly efficient DP implementations by simply optimizing these kernels. We present a few generic optimization steps that suffices to optimize these DP implementations. Our implementations achieve 5 - 100 speedup over their standard loop based DP counterparts on modern multicore machines. We also present results on manycores (Xeon Phi) and clusters of mul- Ticores obtained by simple extensions for SIMD and shared- distributed-shared-memory architectures, respectively.
UR - https://www.scopus.com/pages/publications/84920749785
U2 - 10.1145/2649387.2660796
DO - 10.1145/2649387.2660796
M3 - Conference contribution
T3 - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 600
EP - 601
BT - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Y2 - 20 September 2014 through 23 September 2014
ER -