TY - GEN
T1 - Enabling Low-Overhead Communication in Multi-Threaded OpenSHMEM Applications using Contexts
AU - Lu, Wenbin
AU - Curtis, Tony
AU - Chapman, Barbara
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - As the number of shared-memory cores per node in modern High Performance Computing (HPC) machines continues to grow, hybrid programming models like MPI+threads are becoming a preferred choice for scientific applications. While being able to utilize computation resources efficiently, threads in hybrid applications often compete with each other for communication resources, resulting in a negative impact on performance. The OpenSHMEM distributed programming model provides communication context objects that can be used to provide threads with isolated access to the network, thus reducing contention. In this work, we discuss a design for OpenSHMEM contexts and an implementation of the context construct to support hybrid multi-threaded applications and evaluate the performance of the implementation. In all our micro-benchmarks, threads show nearly identical communication performance compared to single-threaded OpenSHMEM processes. By using contexts in hybrid benchmarks, we have achieved up to 43.1% performance improvement for 3D halo exchange, 339% improvement for all-to-all communication, and 35.4% improvement for inter-node load balancing.
AB - As the number of shared-memory cores per node in modern High Performance Computing (HPC) machines continues to grow, hybrid programming models like MPI+threads are becoming a preferred choice for scientific applications. While being able to utilize computation resources efficiently, threads in hybrid applications often compete with each other for communication resources, resulting in a negative impact on performance. The OpenSHMEM distributed programming model provides communication context objects that can be used to provide threads with isolated access to the network, thus reducing contention. In this work, we discuss a design for OpenSHMEM contexts and an implementation of the context construct to support hybrid multi-threaded applications and evaluate the performance of the implementation. In all our micro-benchmarks, threads show nearly identical communication performance compared to single-threaded OpenSHMEM processes. By using contexts in hybrid benchmarks, we have achieved up to 43.1% performance improvement for 3D halo exchange, 339% improvement for all-to-all communication, and 35.4% improvement for inter-node load balancing.
KW - Hybrid programming
KW - OpenSHMEM
KW - Performance
KW - PGAS
UR - https://www.scopus.com/pages/publications/85084115195
U2 - 10.1109/PAW-ATM49560.2019.00010
DO - 10.1109/PAW-ATM49560.2019.00010
M3 - Conference contribution
T3 - Proceedings of PAW-ATM 2019: Parallel Applications Workshop, Alternatives to MPI+X, Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 47
EP - 57
BT - Proceedings of PAW-ATM 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE/ACM Parallel Applications Workshop, Alternatives to MPI+X, PAW-ATM 2019
Y2 - 17 November 2019
ER -