Toward sustainable HPC: Carbon footprint estimation and environmental implications of HPC systems
The rapid growth in demand for HPC systems has led to a rise in carbon footprint, which
requires urgent intervention. In this work, we present a comprehensive analysis of the …
Paver: Locality graph-based thread block scheduling for GPUs
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …
SAC: Sharing-aware caching in multi-chip GPUs
S Zhang, M Naderan-Tahan, M Jahre… - Proceedings of the 50th …, 2023 - dl.acm.org
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its last-
level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …
Network-on-chip microarchitecture-based covert channel in GPUs
As GPUs are becoming widely deployed in the cloud infrastructure to support different
application domains, the security concerns of GPUs are becoming increasingly important. In …
IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations
Multi-GPU systems have emerged as a desirable platform to deliver high computing
capabilities and large memory capacity to accommodate large dataset sizes. However …
Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs
With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …
LAS: Locality-aware scheduling for GEMM-accelerated convolutions in GPUs
This article presents a graphics processing unit (GPU) scheduling scheme that maximizes
the exploitation of data locality in deep neural networks (DNNs). Convolution is one of the …
LocalityGuru: A PTX analyzer for extracting thread block-level locality in GPGPUs
Exploiting data locality in GPGPUs is critical for efficiently using the smaller data caches and
handling the memory bottleneck problem. This paper proposes a thread block-centric …
GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement
Multi-GPU systems have become popular to cater to the growing demands for high
parallelism and large memory capacity. However, the delivered performance is constrained …
AIO: An abstraction for performance analysis across diverse accelerator architectures
Specialization is the key approach for continued performance growth beyond the end of
Dennard scaling. Academics and industry are hence continuously proposing new …