Pythia: A customizable hardware prefetching framework using online reinforcement learning

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

A hierarchical neural model of data prefetching

Z Shi, A Jain, K Swersky, M Hashemi… - Proceedings of the 26th …, 2021 - dl.acm.org
This paper presents Voyager, a novel neural network for data prefetching. Unlike previous
neural models for prefetching, which are limited to learning delta correlations, our model can …

The championship simulator: Architectural simulation for education and competition

N Gober, G Chacon, L Wang, PV Gratz… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent years have seen a dramatic increase in the microarchitectural complexity of
processors. This increase in complexity presents a twofold challenge for the field of …

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

N Talati, K May, A Behroozi, Y Yang… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …

Evaluation of hardware data prefetchers on server processors

M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

A survey on PCM lifetime enhancement schemes

S Rashidi, M Jalili, H Sarbazi-Azad - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Phase Change Memory (PCM) is an emerging memory technology that has the capability to
address the growing demand for memory capacity and bridge the gap between the main …

Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction

R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …

APT-GET: profile-guided timely software prefetching

S Jamilan, TA Khan, G Ayers, B Kasikci… - Proceedings of the …, 2022 - dl.acm.org
Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely-adopted technique to overcome the processor-memory performance gap …

Spaghetti: Streaming accelerators for highly sparse GEMM on FPGAs

R Hojabr, A Sedaghati, A Sharifian… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Generalized Sparse Matrix-Matrix Multiplication (Sparse GEMM) is widely used across
multiple domains, but the computation's regularity is dependent on the input sparsity pattern …