Vector runahead

Pythia: A customizable hardware prefetching framework using online reinforcement learning

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org

Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

Cited by 73 · Related articles · All 7 versions

Crescent: taming memory irregularities for accelerating deep point cloud analytics

Y Feng, G Hammonds, Y Gan, Y Zhu - Proceedings of the 49th Annual …, 2022 - dl.acm.org

3D perception in point clouds is transforming the perception ability of future intelligent
machines. Point cloud algorithms, however, are plagued by irregular memory accesses …

Cited by 32 · Related articles · All 7 versions

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org

We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Cited by 8 · Related articles · All 9 versions

Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction

R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …

Cited by 21 · Related articles · All 7 versions

Cited by 5 · Related articles · All 5 versions

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

K Lakshminarasimhan, A Naithani, J Feliu… - ACM Transactions on …, 2022 - dl.acm.org

Superscalar out-of-order cores deliver high performance at the cost of increased complexity
and power budget. In-order cores, in contrast, are less complex and have a smaller power …

Cited by 2 · Related articles · All 8 versions

Pythia: A customizable hardware prefetching framework using online reinforcement learning

Tartan: Microarchitecting a Robotic Processor

Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution

Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses

Differential-Matching Prefetcher for Indirect Memory Access

Reliability-aware runahead
