Predicting inter-thread cache contention on a chip multi-processor architecture
This paper studies the impact of L2 cache sharing on threads that simultaneously share the
cache, on a chip multi-processor (CMP) architecture. Cache sharing impacts threads …
The ZCache: Decoupling ways and associativity
D Sanchez, C Kozyrakis - 2010 43rd Annual IEEE/ACM …, 2010 - ieeexplore.ieee.org
The ever-increasing importance of main memory latency and bandwidth is pushing CMPs
towards caches with higher capacity and associativity. Associativity is typically improved by …
The V-Way cache: demand-based associativity via global replacement
MK Qureshi, D Thompson… - … Symposium on Computer …, 2005 - ieeexplore.ieee.org
As processor speeds increase and memory latency becomes more critical, intelligent design
and management of secondary caches becomes increasingly important. The efficiency of …
Talus: A simple way to remove cliffs in cache performance
N Beckmann, D Sanchez - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org
Caches often suffer from performance cliffs: minor changes in program behavior or available
cache space cause large changes in miss rate. Cliffs hurt performance and complicate …
The bunker cache for spatio-value approximation
The cost of moving and storing data is still a fundamental concern for computer architects.
Inefficient handling of data can be attributed to conventional architectures being oblivious to …
Futility scaling: High-associativity cache partitioning
As shared last level caches are widely used in many-core CMPs to boost system
performance, partitioning a large shared cache among multiple concurrently running …
Modeling cache performance beyond LRU
N Beckmann, D Sanchez - 2016 IEEE International Symposium …, 2016 - ieeexplore.ieee.org
Modern processors use high-performance cache replacement policies that outperform
traditional alternatives like least-recently used (LRU). Unfortunately, current cache models …
traditional alternatives like least-recently used (LRU). Unfortunately, current cache models …
TLB tag parity checking without CAM read
MA Luttrell, PJ Jordan - US Patent 7,366,829, 2008 - Google Patents
… access operations is described in connection with a multithreaded multiprocessor chip. This parity …
Adaptive line placement with the set balancing cache
Efficient memory hierarchy design is critical due to the increasing gap between the speed of
the processors and the memory. One of the sources of inefficiency in current caches is the …
XOR-based hash functions
H Vandierendonck… - IEEE Transactions on …, 2005 - ieeexplore.ieee.org
Bank conflicts can severely reduce the bandwidth of an interleaved multibank memory and
conflict misses increase the miss rate of a cache or a predictor. Both occurrences are …
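The idea behind the last entry can be sketched with a minimal XOR-folding index function. This is an illustrative example, not the paper's exact construction: it assumes a cache with 64 sets (6 index bits) and 64-byte lines (6 offset bits), and folds the remaining address bits together with XOR so that strided addresses that collide under plain modulo indexing spread across sets.

```python
INDEX_BITS = 6    # assumed: 64 sets
OFFSET_BITS = 6   # assumed: 64-byte cache lines

def xor_set_index(addr: int) -> int:
    """Fold the tag+index bits of addr into a set index by repeated XOR."""
    bits = addr >> OFFSET_BITS              # drop the line-offset bits
    index = 0
    while bits:
        index ^= bits & ((1 << INDEX_BITS) - 1)   # take the next 6-bit chunk
        bits >>= INDEX_BITS
    return index

# Addresses strided by one "way" of a modulo-indexed cache (64 sets * 64 B)
# would all map to set 0 under (addr >> 6) % 64; XOR folding separates them.
stride = 64 * 64
sets = {xor_set_index(base) for base in range(0, 8 * stride, stride)}
print(sorted(sets))   # eight distinct sets instead of one
```

The design choice illustrated here is that XOR mixing is essentially free in hardware (one level of XOR gates per index bit) yet breaks the power-of-two stride patterns that cause conflict misses and bank conflicts.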