TY - GEN
T1 - Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™
AU - Gomes, Jeremias M.
AU - Teodoro, George
AU - Melo, Alba De
AU - Kong, Jun
AU - Kurc, Tahsin
AU - Saltz, Joel H.
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/1/12
Y1 - 2016/1/12
N2 - We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations.
AB - We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations.
UR - https://www.scopus.com/pages/publications/84964895183
U2 - 10.1109/SBAC-PAD.2015.13
DO - 10.1109/SBAC-PAD.2015.13
M3 - Conference contribution
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 25
EP - 32
BT - Proceedings - IEEE 27th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2015
PB - IEEE Computer Society
T2 - 27th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2015
Y2 - 18 October 2015 through 21 October 2015
ER -