Pythia: A customizable hardware prefetching framework using online reinforcement learning
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …
A hierarchical neural model of data prefetching
This paper presents Voyager, a novel neural network for data prefetching. Unlike previous
neural models for prefetching, which are limited to learning delta correlations, our model can …
The championship simulator: Architectural simulation for education and competition
Recent years have seen a dramatic increase in the microarchitectural complexity of
processors. This increase in complexity presents a twofold challenge for the field of …
Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …
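To make the CSR representation and its "data-indirect" access pattern concrete, here is a minimal sketch of CSR sparse matrix-vector multiplication. It is a generic textbook illustration, not Prodigy's technique; the function names `dense_to_csr` and `csr_spmv` are hypothetical. The load `x[col_idx[k]]` is indirect: its address depends on a value first loaded from `col_idx`, so it does not follow a fixed stride.

```python
from typing import List, Tuple


def dense_to_csr(dense: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense row-major matrix to CSR arrays (row_ptr, col_idx, values)."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)   # column of each nonzero
                values.append(v)    # value of each nonzero
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i+1]]
    return row_ptr, col_idx, values


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x with A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Data-indirect load: the index into x comes from col_idx.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    A = [[1.0, 0.0, 2.0],
         [0.0, 0.0, 3.0],
         [4.0, 5.0, 0.0]]
    rp, ci, vals = dense_to_csr(A)
    print(rp, ci, vals)
    print(csr_spmv(rp, ci, vals, [1.0, 1.0, 1.0]))
```

For the 3×3 example, `row_ptr` is `[0, 2, 3, 5]` and `col_idx` is `[0, 2, 2, 0, 1]`. The sequential walk over `values` and `col_idx` is regular, but the accesses into `x` are only as regular as the sparsity pattern.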
Evaluation of hardware data prefetchers on server processors
M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …
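As a concrete instance of "predicting an application's future memory accesses," here is a toy PC-indexed stride prefetcher. This is a common generic design, not the method of the evaluation paper above or of any other paper listed here. The class name, the `degree` and `threshold` parameters, and the confidence scheme are illustrative assumptions. The sketch also omits the cache check, so it does not filter out addresses already present on chip.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrideEntry:
    last_addr: int       # last address seen for this load PC
    stride: int = 0      # most recently observed stride
    confidence: int = 0  # consecutive repeats of that stride


class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (hypothetical, for illustration only)."""

    def __init__(self, degree: int = 2, threshold: int = 2) -> None:
        self.table: Dict[int, StrideEntry] = {}
        self.degree = degree        # how many addresses ahead to prefetch
        self.threshold = threshold  # confidence needed before issuing prefetches

    def access(self, pc: int, addr: int) -> List[int]:
        """Record a demand access and return the addresses to prefetch (may be empty)."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence += 1   # same stride again: grow confidence
        else:
            entry.stride = stride   # new stride: retrain
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + entry.stride * d for d in range(1, self.degree + 1)]
        return []


if __name__ == "__main__":
    pf = StridePrefetcher()
    for a in [0, 64, 128, 192, 256]:
        print(a, pf.access(pc=0x400, addr=a))
```

With the defaults, the trace `0, 64, 128, 192, 256` from one PC issues no prefetches for the first three accesses. It then predicts `[256, 320]` at address 192 and `[320, 384]` at address 256. An irregular pattern, such as the indirect accesses in CSR workloads, would keep resetting the confidence counter.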
Decoupled vector runahead
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …
A survey on PCM lifetime enhancement schemes
Phase Change Memory (PCM) is an emerging memory technology that has the capability to
address the growing demand for memory capacity and bridge the gap between the main …
Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction
R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …
APT-GET: profile-guided timely software prefetching
Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely adopted technique to overcome the processor-memory performance gap …
Spaghetti: Streaming accelerators for highly sparse GEMM on FPGAs
Generalized Sparse Matrix-Matrix Multiplication (Sparse GEMM) is widely used across
multiple domains, but the computation's regularity is dependent on the input sparsity pattern …