TY - CPAPER
T1 - Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators
AU - Emani, Murali
AU - Foreman, Sam
AU - Sastry, Varuni
AU - Xie, Zhen
AU - Raskar, Siddhisanket
AU - Arnold, William
AU - Thakur, Rajeev
AU - Vishwanath, Venkatram
AU - Papka, Michael E.
AU - Shanmugavelu, Sanjif
AU - Gandhi, Darshan
AU - Zhao, Hengyu
AU - Ma, Dun
AU - Ranganath, Kiran
AU - Weisner, Rick
AU - Chen, Jiunn Yeu
AU - Yang, Yuting
AU - Vassilieva, Natalia
AU - Zhang, Bin C.
AU - Howland, Sylvia
AU - Tsyplikhin, Alexander
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are considered a promising approach to addressing some challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT-2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, and sensitivity to gradient accumulation steps.
UR - https://www.scopus.com/pages/publications/85200760947
DO - 10.1109/IPDPSW63119.2024.00016
M3 - Conference contribution
T3 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
SP - 48
EP - 57
BT - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
Y2 - 27 May 2024 through 31 May 2024
ER -