Resemble: reinforced ensemble framework for data prefetching

P Zhang, R Kannan, A Srivastava… - … Conference for High …, 2022 - ieeexplore.ieee.org
Data prefetching hides memory latency by predicting and loading necessary data into cache
beforehand. Most prefetchers in the literature are efficient for specific memory address …

Divide and conquer frontend bottleneck

A Ansari, P Lotfi-Kamran… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
The frontend stalls caused by instruction and BTB misses are a significant source of
performance degradation in server processors. Prefetchers are commonly employed to …

Q-zilla: A scheduling framework and core microarchitecture for tail-tolerant microservices

A Mirhosseini, BL West, GW Blake… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Managing tail latency is a primary challenge in designing large-scale Internet services.
Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are …

Multi-lookahead offset prefetching

M Shakerinava… - The Third Data …, 2019 - dpc3.compas.cs.stonybrook.edu
Offset prefetching has been recently proposed as a low-overhead yet high-performance
approach to eliminate data cache misses or reduce their negative effect. In offset …

BOW: Breathing operand windows to exploit bypassing in GPUs

HA Esfeden, A Abdolrashidi, S Rahman… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …

Nesta: Hamming weight compression-based neural proc. engine

A Mirzaeian, H Homayoun… - 2020 25th Asia and South …, 2020 - ieeexplore.ieee.org
In this paper, we present NESTA, a specialized Neural engine that significantly accelerates
the computation of convolution layers in a deep convolutional neural network, while …

SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems

Q Wu, A Ekanayake, R Li, J Beard, L John - Proceedings of the 51st …, 2022 - dl.acm.org
With increasing core counts and multiple levels of cache memories, scaling multi-threaded
and task-level parallel workloads is continuously becoming a challenge. A key challenge to …

Highly concurrent latency-tolerant register files for GPUs

M Sadrosadati, A Mirhosseini, A Hajiabadi… - ACM Transactions on …, 2021 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

LSTM-CRP: Algorithm-Hardware Co-Design and Implementation of Cache Replacement Policy Using Long Short-Term Memory

Y Wang, Y Meng, J Wang, C Yang - Big Data and Cognitive Computing, 2024 - mdpi.com
As deep learning has produced dramatic breakthroughs in many areas, it has motivated
emerging studies on the combination between neural networks and cache replacement …

Harnessing pairwise-correlating data prefetching with runahead metadata

F Golshan, M Bakhshalipour… - IEEE Computer …, 2020 - ieeexplore.ieee.org
Recent research revisits pairwise-correlating data prefetching due to its extremely low
overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where …