Optimizing replication, communication, and capacity allocation in CMPs

S Zhuravlev, JC Saez, S Blagodurov… - ACM Computing …, 2012 - dl.acm.org

Chip multicore processors (CMPs) have emerged as the dominant architecture choice for
modern computing platforms and will most likely continue to be dominant well into the …

Cited by 234 Related articles All 9 versions

Simba: Scaling deep-learning inference with multi-chip-module-based architecture

YS Shao, J Clemons, R Venkatesan, B Zimmer… - Proceedings of the …, 2019 - dl.acm.org

Package-level integration using multi-chip-modules (MCMs) is a promising approach for
building large-scale systems. Compared to a large monolithic die, an MCM combines many …

Cited by 458 Related articles All 3 versions

Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches

MK Qureshi, YN Patt - 2006 39th Annual IEEE/ACM …, 2006 - ieeexplore.ieee.org

This paper investigates the problem of partitioning a shared cache between multiple
concurrently executing applications. The commonly used LRU policy implicitly partitions a …

Cited by 1483 Related articles All 15 versions

Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0

N Muralimanohar, R Balasubramonian… - 40th Annual IEEE …, 2007 - ieeexplore.ieee.org

A significant part of future microprocessor real estate will be dedicated to L2 or L3 caches.
These on-chip caches will heavily impact processor performance, power dissipation, and …

Cited by 836 Related articles All 18 versions

A novel architecture of the 3D stacked MRAM L2 cache for CMPs

G Sun, X Dong, Y Xie, J Li… - 2009 IEEE 15th …, 2009 - ieeexplore.ieee.org

Magnetic random access memory (MRAM) is a promising memory technology, which has
fast read access, high density, and non-volatility. Using 3D heterogeneous integrations, it …

Cited by 541 Related articles All 8 versions

Reactive NUCA: near-optimal block placement and replication in distributed caches

N Hardavellas, M Ferdman, B Falsafi… - Proceedings of the 36th …, 2009 - dl.acm.org

Increases in on-chip communication delay and the large working sets of server and scientific
workloads complicate the design of the on-chip last-level cache for multicore processors …

Cited by 562 Related articles All 30 versions

Hybrid cache architecture with disparate memory technologies

X Wu, J Li, L Zhang, E Speight, R Rajamony… - ACM SIGARCH computer …, 2009 - dl.acm.org

Caching techniques have been an efficient mechanism for mitigating the effects of the
processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies …

Cited by 487 Related articles All 8 versions

Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org

Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Cited by 54 Related articles All 6 versions

Design and management of 3D chip multiprocessors using network-in-memory

F Li, C Nicopoulos, T Richardson, Y Xie… - ACM SIGARCH …, 2006 - dl.acm.org

Long interconnects are becoming an increasingly important problem from both power and
performance perspectives. This motivates designers to adopt on-chip network-based …

Cited by 552 Related articles All 16 versions

Scale-out processors

P Lotfi-Kamran, B Grot, M Ferdman, S Volos… - ACM SIGARCH …, 2012 - dl.acm.org

Scale-out datacenters mandate high per-server throughput to get the maximum benefit from
the large TCO investment. Emerging applications (e.g., data serving and web search) that run …

Cited by 279 Related articles All 25 versions