Resemble: reinforced ensemble framework for data prefetching
Data prefetching hides memory latency by predicting and loading necessary data into cache
beforehand. Most prefetchers in the literature are efficient for specific memory address …
Divide and conquer frontend bottleneck
A Ansari, P Lotfi-Kamran… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
The frontend stalls caused by instruction and BTB misses are a significant source of
performance degradation in server processors. Prefetchers are commonly employed to …
Q-zilla: A scheduling framework and core microarchitecture for tail-tolerant microservices
A Mirhosseini, BL West, GW Blake… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Managing tail latency is a primary challenge in designing large-scale Internet services.
Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are …
Multi-lookahead offset prefetching
M Shakerinava… - The Third Data …, 2019 - dpc3.compas.cs.stonybrook.edu
Offset prefetching has been recently proposed as a low-overhead yet high-performance
approach to eliminate data cache misses or reduce their negative effect. In offset …
BOW: Breathing operand windows to exploit bypassing in GPUs
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …
NESTA: Hamming weight compression-based neural proc. engine
A Mirzaeian, H Homayoun… - 2020 25th Asia and South …, 2020 - ieeexplore.ieee.org
In this paper, we present NESTA, a specialized Neural engine that significantly accelerates
the computation of convolution layers in a deep convolutional neural network, while …
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems
With increasing core counts and multiple levels of cache memories, scaling multi-threaded
and task-level parallel workloads is continuously becoming a challenge. A key challenge to …
Highly concurrent latency-tolerant register files for GPUs
M Sadrosadati, A Mirhosseini, A Hajiabadi… - ACM Transactions on …, 2021 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …
LSTM-CRP: Algorithm-Hardware Co-Design and Implementation of Cache Replacement Policy Using Long Short-Term Memory
Y Wang, Y Meng, J Wang, C Yang - Big Data and Cognitive Computing, 2024 - mdpi.com
As deep learning has produced dramatic breakthroughs in many areas, it has motivated
emerging studies on the combination between neural networks and cache replacement …
Harnessing pairwise-correlating data prefetching with runahead metadata
F Golshan, M Bakhshalipour… - IEEE Computer …, 2020 - ieeexplore.ieee.org
Recent research revisits pairwise-correlating data prefetching due to its extremely low
overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where …