TY - GEN
T1 - Practical off-chip meta-data for temporal memory streaming
AU - Wenisch, Thomas F.
AU - Ferdman, Michael
AU - Ailamaki, Anastasia
AU - Falsafi, Babak
AU - Moshovos, Andreas
PY - 2009
Y1 - 2009
AB - Prior research demonstrates that temporal memory streaming and related address-correlating prefetchers improve the performance of commercial server workloads through increased memory-level parallelism. Unfortunately, these prefetchers require large on-chip meta-data storage, making previously proposed designs impractical. Hence, to improve practicality, researchers have sought ways to enable timely prefetch while locating meta-data entirely off-chip. Unfortunately, current solutions for off-chip meta-data increase memory traffic by over a factor of three. We observe three requirements for storing meta-data off chip: minimal off-chip lookup latency, bandwidth-efficient meta-data updates, and off-chip lookup amortized over many prefetches. In this work, we show: (1) minimal off-chip meta-data lookup latency can be achieved through a hardware-managed main memory hash table, (2) bandwidth-efficient updates can be performed through probabilistic sampling of meta-data updates, and (3) off-chip lookup costs can be amortized by organizing meta-data to allow a single lookup to yield long prefetch sequences. Using these techniques, we develop Sampled Temporal Memory Streaming (STMS), a practical address-correlating prefetcher that keeps predictor meta-data in main memory while achieving 90% of the performance potential of idealized on-chip meta-data storage.
UR - https://www.scopus.com/pages/publications/64949123191
U2 - 10.1109/HPCA.2009.4798239
DO - 10.1109/HPCA.2009.4798239
M3 - Conference contribution
SN - 9781424429325
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 79
EP - 90
BT - Proceedings - 15th International Symposium on High-Performance Computer Architecture, HPCA-15 2009
PB - IEEE Computer Society
T2 - IEEE 15th International Symposium on High Performance Computer Architecture, HPCA 2009
Y2 - 14 February 2009 through 18 February 2009
ER -