
Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

  • Michael A. Bender
  • Jonathan Berry
  • Simon D. Hammond
  • K. Scott Hemmert
  • Samuel McCauley
  • Branden Moore
  • Benjamin Moseley
  • Cynthia A. Phillips
  • David Resnick
  • Arun Rodrigues

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

12 Scopus citations

Abstract

A fundamental challenge for supercomputer architecture is that processors cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. As the number of cores per chip increases and traditional DDR DRAM speeds stagnate, the problem is only getting worse. A variety of non-DDR 3D memory technologies (Wide I/O 2, HBM) offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems likely that future systems will require at least two levels of main memory: high-bandwidth, low-power memory near the processor and low-bandwidth, high-capacity memory further away. This near memory will probably not have significantly lower latency than the far memory. This, combined with the large size of the near memory (multiple GB) and power constraints, may make it difficult to treat it as a standard cache. In this paper, we explore some of the design space for a user-controlled multi-level main memory. We present algorithms designed for the heterogeneous bandwidth, using streaming to exploit data locality. We consider algorithms for the fundamental application of sorting. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories' SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance. This co-design effort suggests that implementing two-level main memory systems may improve memory performance in fundamental applications.
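
As a rough illustration of the kind of streaming, user-controlled data movement the abstract describes, the sketch below sorts an array held in "far" memory by staging fixed-size chunks through a smaller "near" memory and then merging the sorted runs. This is a generic chunk-and-merge sort, not the algorithm from the paper. The function name `two_level_sort`, the parameters `near_capacity` and `block_size`, and the simple block-transfer accounting are all assumptions made for this example, and the single merge pass assumes one block per run fits in near memory at once.

```python
import heapq
from typing import List, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def two_level_sort(
    data: List[int], near_capacity: int, block_size: int
) -> Tuple[List[int], int]:
    """Sort `data` (standing in for far memory) via a small near memory.

    Hypothetical model, not the paper's algorithm:
      near_capacity -- number of elements the near memory can hold
      block_size    -- number of elements moved per far-memory transfer
    Returns the sorted list and the number of far-memory block transfers
    counted under this simplified model.
    """
    if near_capacity < 1 or block_size < 1:
        raise ValueError("near_capacity and block_size must be positive")

    transfers = 0
    runs: List[List[int]] = []

    # Phase 1: stream each chunk into near memory, sort it there,
    # and write it back to far memory as a sorted run.
    for start in range(0, len(data), near_capacity):
        chunk = sorted(data[start:start + near_capacity])
        blocks = ceil_div(len(chunk), block_size)
        transfers += 2 * blocks  # one read and one write per block
        runs.append(chunk)

    # Phase 2: one k-way merge that streams every run exactly once.
    merged = list(heapq.merge(*runs))
    transfers += sum(ceil_div(len(run), block_size) for run in runs)  # reads
    transfers += ceil_div(len(merged), block_size)  # writes of the output

    return merged, transfers


if __name__ == "__main__":
    values = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
    result, moved = two_level_sort(values, near_capacity=4, block_size=2)
    print(result, moved)
```

Under this toy model, each element crosses the far-memory boundary a constant number of times, and the transfer count scales with the input size divided by the block size. The paper's own analysis and its dependence on architectural parameters are not reproduced here.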

Original language: English
Title of host publication: Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 835-846
Number of pages: 12
ISBN (Electronic): 0769555101, 9780769555102
State: Published - Sep 29 2015
Event: 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015 - Hyderabad, India
Duration: May 25 2015 to May 29 2015

Publication series

Name: Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015
Volume: 2015-January

Conference

Conference: 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015
Country/Territory: India
City: Hyderabad
Period: 05/25/15 to 05/29/15
