Snake: A variable-length chain-based prefetching for GPUs

S Mostofi, H Falahati, N Mahani… - Proceedings of the 56th …, 2023 - dl.acm.org
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

CoMeT: An integrated interval thermal simulation toolchain for 2D, 2.5D, and 3D processor-memory systems

L Siddhu, R Kedia, S Pandey, M Rapp… - ACM Transactions on …, 2022 - dl.acm.org
Processing cores and the accompanying main memory working in tandem enable modern
processors. Dissipating heat produced from computation remains a significant problem for …

CORF: Coalescing operand register file for GPUs

H Asghari Esfeden, F Khorasani, H Jeon… - Proceedings of the …, 2019 - dl.acm.org
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of
threads that support the GPU processing model. The RF organization substantially affects …

TAMA: turn-aware mapping and architecture – a power-efficient network-on-chip approach

R Aligholipour, M Baharloo, B Farzaneh… - ACM Transactions on …, 2021 - dl.acm.org
Nowadays, static power consumption in chip multiprocessor (CMP) is the most crucial
concern of chip designers. Power-gating is an effective approach to mitigate static power …

BOW: Breathing operand windows to exploit bypassing in GPUs

HA Esfeden, A Abdolrashidi, S Rahman… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …

OSM: Off-chip shared memory for GPUs

S Darabi, E Yousefzadeh-Asl-Miandoab… - … on Parallel and …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) employ a shared memory, a software-managed cache for
programmers, in each streaming multiprocessor to accelerate data sharing among the …

PTTS: Power-aware tensor cores using two-sided sparsity

E Atoofian - Journal of Parallel and Distributed Computing, 2023 - Elsevier
Deep neural networks (DNNs) have become the compelling solution for a broad
range of applications such as automatic translation, advertisement recommendation, and …

QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU

Q Sun, L Yi, H Yang, M Li, Z Luan, D Qian - Parallel Computing, 2022 - Elsevier
Although GPUs have been indispensable in data centers, meeting the Quality of Service
(QoS) under task consolidation on GPU is extremely challenging. Previous works mostly rely …

TREFU: An Online Error Detecting and Correcting Fault Tolerant GPGPU Architecture

KK Raghunandana, V BKSVL… - 2023 IEEE 29th …, 2023 - ieeexplore.ieee.org
General Purpose Graphics Processing Units (GPGPUs) are extensively used in high-
performance applications/systems, whose execution times may vary from a few days to …