A survey of recent prefetching techniques for processor caches

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

N Talati, K May, A Behroozi, Y Yang… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …

Towards high performance paged memory for GPUs

T Zheng, D Nellans, A Zulfiqar… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Despite industrial investment in both on-die GPUs and next generation interconnects, the
highest performing parallel accelerators shipping today continue to be discrete GPUs …

Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

CK Luk - Proceedings of the 28th annual international …, 2001 - dl.acm.org
Hardly predictable data addresses in many irregular applications have rendered prefetching
ineffective. In many cases, the only accurate way to predict these addresses is to directly …

Accelerating dependent cache misses with an enhanced memory controller

M Hashemi, Khubaib, E Ebrahimi, O Mutlu… - ACM SIGARCH …, 2016 - dl.acm.org
On-chip contention increases memory access latency for multicore processors. We identify
that this additional latency has a substantial effect on performance for an important class of …

Domino temporal data prefetcher

M Bakhshalipour, P Lotfi-Kamran… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Big-data server applications frequently encounter data misses, and hence, lose significant
performance potential. One way to reduce the number of data misses or their effect is data …

Continuous runahead: Transparent hardware acceleration for memory intensive workloads

M Hashemi, O Mutlu, YN Patt - 2016 49th Annual IEEE/ACM …, 2016 - ieeexplore.ieee.org
Runahead execution pre-executes the application's own code to generate new cache
misses. This pre-execution results in prefetch requests that are overwhelmingly accurate …

Analysis and optimization of the memory hierarchy for graph processing workloads

A Basak, S Li, X Hu, SM Oh, X Xie… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Graph processing is an important analysis technique for a wide range of big data
applications. The ability to explicitly represent relationships between entities gives graph …

Dynamic speculative precomputation

JD Collins, DM Tullsen, H Wang… - Proceedings. 34th ACM …, 2001 - ieeexplore.ieee.org
A large number of memory accesses in memory-bound applications are irregular, such as
pointer dereferences, and can be effectively targeted by thread-based prefetching …