Survey of scheduling techniques for addressing shared resources in multicore processors
S Zhuravlev, JC Saez, S Blagodurov… - ACM Computing …, 2012 - dl.acm.org
Chip multicore processors (CMPs) have emerged as the dominant architecture choice for
modern computing platforms and will most likely continue to be dominant well into the …
Simba: Scaling deep-learning inference with multi-chip-module-based architecture
Package-level integration using multi-chip-modules (MCMs) is a promising approach for
building large-scale systems. Compared to a large monolithic die, an MCM combines many …
Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
MK Qureshi, YN Patt - 2006 39th Annual IEEE/ACM …, 2006 - ieeexplore.ieee.org
This paper investigates the problem of partitioning a shared cache between multiple
concurrently executing applications. The commonly used LRU policy implicitly partitions a …
Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0
N Muralimanohar, R Balasubramonian… - 40th Annual IEEE …, 2007 - ieeexplore.ieee.org
A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches.
These on-chip caches will heavily impact processor performance, power dissipation, and …
A novel architecture of the 3D stacked MRAM L2 cache for CMPs
Magnetic random access memory (MRAM) is a promising memory technology, which has
fast read access, high density, and non-volatility. Using 3D heterogeneous integrations, it …
Reactive NUCA: near-optimal block placement and replication in distributed caches
Increases in on-chip communication delay and the large working sets of server and scientific
workloads complicate the design of the on-chip last-level cache for multicore processors …
Hybrid cache architecture with disparate memory technologies
Caching techniques have been an efficient mechanism for mitigating the effects of the
processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies …
Affinity-based thread and data mapping in shared memory systems
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …
Design and management of 3D chip multiprocessors using network-in-memory
F Li, C Nicopoulos, T Richardson, Y Xie… - ACM SIGARCH …, 2006 - dl.acm.org
Long interconnects are becoming an increasingly important problem from both power and
performance perspectives. This motivates designers to adopt on-chip network-based …
Scale-out processors
Scale-out datacenters mandate high per-server throughput to get the maximum benefit from
the large TCO investment. Emerging applications (e.g., data serving and web search) that run …