TY - GEN
T1 - Energy-Efficient Dataflow Design for Monolithic 3D Systolic Arrays with Resistive RAM
AU - Shukla, Prachi
AU - Hajikhodaverdian, Mohammadamin
AU - Pavlidis, Vasilis F.
AU - Salman, Emre
AU - Coskun, Ayse K.
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Systolic arrays are commonly used for running deep neural networks (DNNs) at the edge, where latency and energy efficiency requirements are stringent. Monolithic 3D (Mono3D) is an emerging 3D integration technology that offers ultra-high vertical interconnect density among processing and memory layers. The bandwidth benefits provided by Mono3D can help meet the growing latency and energy efficiency demands for DNNs. This paper presents a novel implementation for weight stationary (WS) dataflow in Mono3D systolic arrays, called WS-Mono3D. WS-Mono3D utilizes multiple resistive RAM layers and SRAM with high-density vertical interconnects to multicast inputs and performs high-bandwidth weight pre-loading while maintaining the same order of multiply-and-accumulate operations as in native WS dataflow. Consequently, WS-Mono3D eliminates input and weight forwarding cycles, and, thus, provides up to a 40% reduction in energy-delay-product (EDP) over the native WS implementation in 2D with iso-configuration. The paper also demonstrates the impact of temperature on energy efficiency benefits in WS-Mono3D.
KW - Monolithic 3D
KW - dataflow
KW - deep neural networks
KW - energy efficiency
KW - systolic arrays
KW - temperature
UR - https://www.scopus.com/pages/publications/85213392354
U2 - 10.1109/IGSC64514.2024.00022
DO - 10.1109/IGSC64514.2024.00022
M3 - Conference contribution
T3 - Proceedings - 15th International Green and Sustainable Computing Conference, IGSC 2024
SP - 67
EP - 73
BT - Proceedings - 15th International Green and Sustainable Computing Conference, IGSC 2024
A2 - Zhou, Peipei
A2 - Chen, Fan
A2 - Yang, Xiaoxuan
A2 - Hester, Josiah
A2 - Qiu, Qinru
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Green and Sustainable Computing Conference, IGSC 2024
Y2 - 2 November 2024 through 3 November 2024
ER -