Pythia: A customizable hardware prefetching framework using online reinforcement learning
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …
Crescent: taming memory irregularities for accelerating deep point cloud analytics
3D perception in point clouds is transforming the perception ability of future intelligent
machines. Point cloud algorithms, however, are plagued by irregular memory accesses …
Decoupled vector runahead
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …
Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction
R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …
Tartan: Microarchitecting a Robotic Processor
M Bakhshalipour, PB Gibbons - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
This paper presents Tartan, a CPU architecture designed for a wide range of robotic
applications. Tartan provides architectural support for common robotic kernels, ensuring its …
Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution
Load instructions often limit instruction-level parallelism (ILP) in modern processors due to
data and resource dependences they cause. Prior techniques like Load Value Prediction …
Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses
F Xue, C Han, X Li, J Wu, T Zhang, T Liu… - ACM Transactions on …, 2024 - dl.acm.org
Indirect memory accesses (IMAs, i.e., A[f(B[i])]) are typical memory access patterns in
applications such as graph analysis, machine learning, and database. IMAs are composed …
Differential-Matching Prefetcher for Indirect Memory Access
Indirect memory access is a critical bottleneck for modern CPUs, especially for graph
analysis and sparse linear algebra applications, where the values of one data array are …
Reliability-aware runahead
A Naithani, L Eeckhout - 2022 IEEE International Symposium …, 2022 - ieeexplore.ieee.org
Decreasing voltage levels and continued transistor scaling have drastically increased the
chance of a processor bit encountering a soft error. We find that the microarchitecture state …
The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture
Superscalar out-of-order cores deliver high performance at the cost of increased complexity
and power budget. In-order cores, in contrast, are less complex and have a smaller power …