Pythia: A customizable hardware prefetching framework using online reinforcement learning

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

Crescent: taming memory irregularities for accelerating deep point cloud analytics

Y Feng, G Hammonds, Y Gan, Y Zhu - Proceedings of the 49th Annual …, 2022 - dl.acm.org
3D perception in point clouds is transforming the perception ability of future intelligent
machines. Point cloud algorithms, however, are plagued by irregular memory accesses …

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction

R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …

Tartan: Microarchitecting a Robotic Processor

M Bakhshalipour, PB Gibbons - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
This paper presents Tartan, a CPU architecture designed for a wide range of robotic
applications. Tartan provides architectural support for common robotic kernels, ensuring its …

Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution

R Bera, A Ranganathan, J Rakshit… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Load instructions often limit instruction-level parallelism (ILP) in modern processors due to
data and resource dependences they cause. Prior techniques like Load Value Prediction …

Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses

F Xue, C Han, X Li, J Wu, T Zhang, T Liu… - ACM Transactions on …, 2024 - dl.acm.org
Indirect memory accesses (IMAs, i.e., A[f(B[i])]) are typical memory access patterns in
applications such as graph analysis, machine learning, and database. IMAs are composed …

Differential-Matching Prefetcher for Indirect Memory Access

G Fu, T Xia, Z Luo, R Chen, W Zhao… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Indirect memory access is a critical bottleneck for modern CPUs, especially for graph
analysis and sparse linear algebra applications, where the values of one data array are …

Reliability-aware runahead

A Naithani, L Eeckhout - 2022 IEEE International Symposium …, 2022 - ieeexplore.ieee.org
Decreasing voltage levels and continued transistor scaling have drastically increased the
chance of a processor bit encountering a soft error. We find that the microarchitecture state …

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

K Lakshminarasimhan, A Naithani, J Feliu… - ACM Transactions on …, 2022 - dl.acm.org
Superscalar out-of-order cores deliver high performance at the cost of increased complexity
and power budget. In-order cores, in contrast, are less complex and have a smaller power …