Snake: A variable-length chain-based prefetching for GPUs
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …
Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …
CoMeT: An integrated interval thermal simulation toolchain for 2D, 2.5D, and 3D processor-memory systems
Processing cores and the accompanying main memory working in tandem enable modern
processors. Dissipating heat produced from computation remains a significant problem for …
CORF: Coalescing operand register file for GPUs
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of
threads that support the GPU processing model. The RF organization substantially affects …
TAMA: turn-aware mapping and architecture–a power-efficient network-on-chip approach
Nowadays, static power consumption in chip multiprocessor (CMP) is the most crucial
concern of chip designers. Power-gating is an effective approach to mitigate static power …
BOW: Breathing operand windows to exploit bypassing in GPUs
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …
OSM: Off-chip shared memory for GPUs
S Darabi, E Yousefzadeh-Asl-Miandoab… - … on Parallel and …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) employ a shared memory, a software-managed cache for
programmers, in each streaming multiprocessor to accelerate data sharing among the …
PTTS: Power-aware tensor cores using two-sided sparsity
E Atoofian - Journal of Parallel and Distributed Computing, 2023 - Elsevier
Deep neural networks (DNNs) have become the compelling solution for a broad
range of applications such as automatic translation, advertisement recommendation, and …
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU
Although GPUs have been indispensable in data centers, meeting the Quality of Service
(QoS) under task consolidation on GPU is extremely challenging. Previous works mostly rely …
TREFU: An Online Error Detecting and Correcting Fault Tolerant GPGPU Architecture
KK Raghunandana, V BKSVL… - 2023 IEEE 29th …, 2023 - ieeexplore.ieee.org
General Purpose Graphics Processing Units (GPGPUs) are extensively used in high-
performance applications/systems, whose execution times may vary from a few days to …