TY - GEN
T1 - TurboTag
T2 - 16th ACM/IEEE International Symposium on Low-Power Electronics and Design, ISLPED'10
AU - Lotfi-Kamran, Pejman
AU - Ferdman, Michael
AU - Crisan, Daniel
AU - Falsafi, Babak
PY - 2010
Y1 - 2010
AB - On-chip coherence directories of today's multi-core systems are not energy efficient. Coherence directories dissipate a significant fraction of their power on unnecessary lookups when running commercial server and scientific workloads. These workloads have large working sets that are beyond the reach of on-chip caches of modern processors. Limited to capturing a small part of the working set, private caches retain cache blocks only for a short period of time before replacing them with new blocks. Moreover, coherence enforcement is a known performance bottleneck of multi-threaded software, hence data-sharing in optimized high-performance software is minimal. Consequently, the majority of the accesses to the coherence directory find no sharers in the directory because the data are not available in the on-chip private caches, effectively wasting power on the coherence checks. To improve energy-efficiency for future many-core systems, we propose TurboTag, a filtering mechanism to eliminate needless directory lookups. We analyze full-system traces of server and scientific workloads and find that over 69% of accesses to the directory find no sharers and can be entirely avoided. Taking advantage of this behavior, TurboTag achieves a 58% reduction in the directory's dynamic power consumption.
KW - Bloom
KW - Coherence
KW - Directory
KW - Filter
KW - Low power
UR - https://www.scopus.com/pages/publications/77957957651
U2 - 10.1145/1840845.1840929
DO - 10.1145/1840845.1840929
M3 - Conference contribution
SN - 9781450301466
T3 - Proceedings of the International Symposium on Low Power Electronics and Design
SP - 377
EP - 382
BT - ISLPED'10 - Proceedings of the 16th ACM/IEEE International Symposium on Low-Power Electronics and Design
Y2 - 18 August 2010 through 20 August 2010
ER -