Data prefetch mechanisms

SP Vanderwiel, DJ Lilja - ACM Computing Surveys (CSUR), 2000 - dl.acm.org
The expanding gap between microprocessor and DRAM performance has necessitated the
use of increasingly aggressive techniques designed to reduce or hide the latency of main …

Canvas: Isolated and adaptive swapping for Multi-Applications on remote memory

C Wang, Y Qiao, H Ma, S Liu, W Chen… - … USENIX Symposium on …, 2023 - usenix.org
Remote memory techniques for datacenter applications have recently gained a great deal of
popularity. Existing remote memory techniques focus on the efficiency of a single application …

[BOOK][B] Data access and storage management for embedded programmable processors

F Catthoor, K Danckaert - 2002 - books.google.com
Data Access and Storage Management for Embedded Programmable Processors gives an
overview of the state-of-the-art in system-level data access and storage management for …

Improving Docker registry design based on production workload analysis

A Anwar, M Mohamed, V Tarasov, M Littley… - … USENIX Conference on …, 2018 - usenix.org
Containers offer an efficient way to run workloads as independent microservices that can be
developed, tested and deployed in an agile manner. To facilitate this process, container …

Fine-grained address segmentation for attention-based variable-degree prefetching

P Zhang, A Srivastava, AV Nori, R Kannan… - Proceedings of the 19th …, 2022 - dl.acm.org
Machine learning algorithms have shown potential to improve prefetching performance by
accurately predicting future memory accesses. Existing approaches are based on the …

Scope-aware data cache analysis for WCET estimation

BK Huynh, L Ju, A Roychoudhury - 2011 17th IEEE Real-Time …, 2011 - ieeexplore.ieee.org
Caches are widely used in modern computer systems to bridge the increasing gap between
processor speed and memory access time. On the other hand, presence of caches …

Snake: A variable-length chain-based prefetching for GPUs

S Mostofi, H Falahati, N Mahani… - Proceedings of the 56th …, 2023 - dl.acm.org
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …

ReSemble: reinforced ensemble framework for data prefetching

P Zhang, R Kannan, A Srivastava… - … Conference for High …, 2022 - ieeexplore.ieee.org
Data prefetching hides memory latency by predicting and loading necessary data into cache
beforehand. Most prefetchers in the literature are efficient for specific memory address …

RAOP: Recurrent neural network augmented offset prefetcher

P Zhang, A Srivastava, B Brooks, R Kannan… - Proceedings of the …, 2020 - dl.acm.org
The rapid development of Big Data coupled with slowing down of Moore's law has made the
memory performance a bottleneck in the von Neumann architecture. Machine learning has …

SHARP: Software hint-assisted memory access prediction for graph analytics

P Zhang, R Kannan, X Tong, AV Nori… - 2022 IEEE High …, 2022 - ieeexplore.ieee.org
Memory system performance is a major bottleneck in large-scale graph analytics. Data
prefetching can hide memory latency; this relies on accurate prediction of memory accesses …