DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …
Exploiting locality in graph analytics through hardware-accelerated traversal scheduling
Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches
are of little help because the irregular structure of graphs causes seemingly random memory …
Augury: Using data memory-dependent prefetchers to leak data at rest
JRS Vicarte, M Flanders, R Paccagnella… - … IEEE Symposium on …, 2022 - ieeexplore.ieee.org
Microarchitectural side-channel attacks are enjoying a time of explosive growth, mostly
fueled by novel transient execution vulnerabilities. These attacks are capable of leaking …
Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …
Opening Pandora's box: A systematic study of new ways microarchitecture can leak private data
JRS Vicarte, P Shome, N Nayak… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Microarchitectural attacks have plunged Computer Architecture into a security crisis. Yet, as
the slowing of Moore's law justifies the use of ever more exotic microarchitecture, it is likely …
Analysis and optimization of the memory hierarchy for graph processing workloads
Graph processing is an important analysis technique for a wide range of big data
applications. The ability to explicitly represent relationships between entities gives graph …
PHI: Architectural support for synchronization- and bandwidth-efficient commutative scatter updates
Many applications perform frequent scatter update operations to large data structures. For
example, in push-style graph algorithms, processing each vertex requires updating the data …
Decoupled vector runahead
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …
Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …
APT-GET: profile-guided timely software prefetching
Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely adopted technique to overcome the processor-memory performance gap …