Resemble: reinforced ensemble framework for data prefetching
Data prefetching hides memory latency by predicting and loading necessary data into cache
beforehand. Most prefetchers in the literature are efficient for specific memory address …
Divide and conquer frontend bottleneck
A Ansari, P Lotfi-Kamran… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
The frontend stalls caused by instruction and BTB misses are a significant source of
performance degradation in server processors. Prefetchers are commonly employed to …
Q-zilla: A scheduling framework and core microarchitecture for tail-tolerant microservices
A Mirhosseini, BL West, GW Blake… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Managing tail latency is a primary challenge in designing large-scale Internet services.
Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are …
Multi-lookahead offset prefetching
M Shakerinava… - The Third Data …, 2019 - dpc3.compas.cs.stonybrook.edu
Offset prefetching has been recently proposed as a low-overhead yet high-performance
approach to eliminate data cache misses or reduce their negative effect. In offset …
BOW: Breathing operand windows to exploit bypassing in GPUs
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …
NESTA: Hamming weight compression-based neural proc. engine
A Mirzaeian, H Homayoun… - 2020 25th Asia and South …, 2020 - ieeexplore.ieee.org
In this paper, we present NESTA, a specialized Neural engine that significantly accelerates
the computation of convolution layers in a deep convolutional neural network, while …
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems
With increasing core counts and multiple levels of cache memories, scaling multi-threaded
and task-level parallel workloads is continuously becoming a challenge. A key challenge to …
Highly concurrent latency-tolerant register files for GPUs
M Sadrosadati, A Mirhosseini, A Hajiabadi… - ACM Transactions on …, 2021 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …
LSTM-CRP: Algorithm-Hardware Co-Design and Implementation of Cache Replacement Policy Using Long Short-Term Memory
Y Wang, Y Meng, J Wang, C Yang - Big Data and Cognitive Computing, 2024 - mdpi.com
As deep learning has produced dramatic breakthroughs in many areas, it has motivated
emerging studies on the combination between neural networks and cache replacement …
Harnessing pairwise-correlating data prefetching with runahead metadata
F Golshan, M Bakhshalipour… - IEEE Computer …, 2020 - ieeexplore.ieee.org
Recent research revisits pairwise-correlating data prefetching due to its extremely low
overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where …