Survey of scheduling techniques for addressing shared resources in multicore processors

S Zhuravlev, JC Saez, S Blagodurov… - ACM Computing …, 2012 - dl.acm.org
Chip multicore processors (CMPs) have emerged as the dominant architecture choice for
modern computing platforms and will most likely continue to be dominant well into the …

Simba: Scaling deep-learning inference with multi-chip-module-based architecture

YS Shao, J Clemons, R Venkatesan, B Zimmer… - Proceedings of the …, 2019 - dl.acm.org
Package-level integration using multi-chip-modules (MCMs) is a promising approach for
building large-scale systems. Compared to a large monolithic die, an MCM combines many …

Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches

MK Qureshi, YN Patt - 2006 39th Annual IEEE/ACM …, 2006 - ieeexplore.ieee.org
This paper investigates the problem of partitioning a shared cache between multiple
concurrently executing applications. The commonly used LRU policy implicitly partitions a …

Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0

N Muralimanohar, R Balasubramonian… - 40th Annual IEEE …, 2007 - ieeexplore.ieee.org
A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches.
These on-chip caches will heavily impact processor performance, power dissipation, and …

A novel architecture of the 3D stacked MRAM L2 cache for CMPs

G Sun, X Dong, Y Xie, J Li… - 2009 IEEE 15th …, 2009 - ieeexplore.ieee.org
Magnetic random access memory (MRAM) is a promising memory technology, which has
fast read access, high density, and non-volatility. Using 3D heterogeneous integrations, it …

Reactive NUCA: near-optimal block placement and replication in distributed caches

N Hardavellas, M Ferdman, B Falsafi… - Proceedings of the 36th …, 2009 - dl.acm.org
Increases in on-chip communication delay and the large working sets of server and scientific
workloads complicate the design of the on-chip last-level cache for multicore processors …

Hybrid cache architecture with disparate memory technologies

X Wu, J Li, L Zhang, E Speight, R Rajamony… - ACM SIGARCH computer …, 2009 - dl.acm.org
Caching techniques have been an efficient mechanism for mitigating the effects of the
processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies …

Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Design and management of 3D chip multiprocessors using network-in-memory

F Li, C Nicopoulos, T Richardson, Y Xie… - ACM SIGARCH …, 2006 - dl.acm.org
Long interconnects are becoming an increasingly important problem from both power and
performance perspectives. This motivates designers to adopt on-chip network-based …

Scale-out processors

P Lotfi-Kamran, B Grot, M Ferdman, S Volos… - ACM SIGARCH …, 2012 - dl.acm.org
Scale-out datacenters mandate high per-server throughput to get the maximum benefit from
the large TCO investment. Emerging applications (e.g., data serving and web search) that run …