DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Exploiting locality in graph analytics through hardware-accelerated traversal scheduling

A Mukkara, N Beckmann, M Abeydeera… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches
are of little help because the irregular structure of graphs causes seemingly random memory …

Augury: Using data memory-dependent prefetchers to leak data at rest

JRS Vicarte, M Flanders, R Paccagnella… - … IEEE Symposium on …, 2022 - ieeexplore.ieee.org
Microarchitectural side-channel attacks are enjoying a time of explosive growth, mostly
fueled by novel transient execution vulnerabilities. These attacks are capable of leaking …

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

N Talati, K May, A Behroozi, Y Yang… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …
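The "data-indirect" access pattern named in this title can be illustrated with a small CSR traversal. The sketch below is a minimal, hypothetical illustration, not Prodigy's mechanism. It assumes a directed graph stored as an `offsets` array and a `neighbors` array, plus a per-vertex `values` array. The helper names `build_csr` and `neighbor_sums` are invented for this example. The loads from `offsets` and `neighbors` are sequential, but the load from `values[u]` uses an address that depends on a previously loaded value, which is what makes such workloads hard for conventional caches and prefetchers.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR (compressed sparse row) adjacency structure from (src, dst) edges."""
    # offsets[v]..offsets[v+1] will delimit vertex v's slice of `neighbors`.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1  # count out-degree of each source
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]  # prefix sum turns counts into start offsets
    neighbors = [0] * len(edges)
    cursor = offsets[:-1]  # slicing makes a copy: next free slot per vertex
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def neighbor_sums(offsets: List[int], neighbors: List[int], values: List[float]) -> List[float]:
    """For every vertex, sum the values of its out-neighbors."""
    sums = [0.0] * (len(offsets) - 1)
    for v in range(len(offsets) - 1):
        for e in range(offsets[v], offsets[v + 1]):  # sequential walk over the edge slice
            u = neighbors[e]                         # sequential load
            sums[v] += values[u]                     # data-indirect load: address depends on u
    return sums
```

For example, `build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])` yields `offsets = [0, 2, 3, 4]` and `neighbors = [1, 2, 2, 0]`. With `values = [10.0, 20.0, 30.0]`, `neighbor_sums` gives `[50.0, 30.0, 10.0]`.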

Opening Pandora's box: A systematic study of new ways microarchitecture can leak private data

JRS Vicarte, P Shome, N Nayak… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Microarchitectural attacks have plunged Computer Architecture into a security crisis. Yet, as
the slowing of Moore's law justifies the use of ever more exotic microarchitecture, it is likely …

Analysis and optimization of the memory hierarchy for graph processing workloads

A Basak, S Li, X Hu, SM Oh, X Xie… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Graph processing is an important analysis technique for a wide range of big data
applications. The ability to explicitly represent relationships between entities gives graph …

PHI: Architectural support for synchronization- and bandwidth-efficient commutative scatter updates

A Mukkara, N Beckmann, D Sanchez - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org
Many applications perform frequent scatter update operations to large data structures. For
example, in push-style graph algorithms, processing each vertex requires updating the data …
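A push-style scatter update, as described in this snippet, can be sketched in a few lines. This is a minimal, hypothetical illustration of the software access pattern only. It does not implement PHI's hardware support. It assumes a PageRank-like iteration over a CSR graph (`offsets`, `neighbors`), where each source vertex adds `rank[u] / out_degree` into its out-neighbors' accumulators. The function name `push_scatter` is invented for this example. The update is an addition, which is commutative, so the final result does not depend on the order in which the updates arrive. The destination `v` is arbitrary, however, so each update is a read-modify-write to a potentially distant memory location.

```python
from typing import List


def push_scatter(offsets: List[int], neighbors: List[int], rank: List[float]) -> List[float]:
    """One push-style step: every source scatters rank/out_degree to its out-neighbors."""
    n = len(offsets) - 1
    incoming = [0.0] * n
    for u in range(n):
        degree = offsets[u + 1] - offsets[u]
        if degree == 0:
            continue  # dangling vertex: this sketch simply pushes nothing
        share = rank[u] / degree
        for e in range(offsets[u], offsets[u + 1]):
            v = neighbors[e]
            incoming[v] += share  # commutative scatter update to an arbitrary vertex
    return incoming
```

With `offsets = [0, 2, 3, 4]`, `neighbors = [1, 2, 2, 0]` (edges 0→1, 0→2, 1→2, 2→0), and `rank = [1.0, 1.0, 1.0]`, one step gives `[1.0, 0.5, 1.5]`. A pull-style variant would instead have each destination read its in-neighbors, which avoids the scattered writes but requires the transposed (CSC) graph.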

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs

M Orenes-Vera, A Manocha, J Balkind, F Gao… - Proceedings of the 49th …, 2022 - dl.acm.org
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …

APT-GET: profile-guided timely software prefetching

S Jamilan, TA Khan, G Ayers, B Kasikci… - Proceedings of the …, 2022 - dl.acm.org
Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely adopted technique to overcome the processor-memory performance gap …