Toward sustainable HPC: Carbon footprint estimation and environmental implications of HPC systems

B Li, R Basu Roy, D Wang, S Samsi… - Proceedings of the …, 2023 - dl.acm.org
The rapid growth in demand for HPC systems has led to a rise in carbon footprint, which
requires urgent intervention. In this work, we present a comprehensive analysis of the …

PAVER: Locality graph-based thread block scheduling for GPUs

D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on …, 2021 - dl.acm.org
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …

SAC: Sharing-aware caching in multi-chip GPUs

S Zhang, M Naderan-Tahan, M Jahre… - Proceedings of the 50th …, 2023 - dl.acm.org
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its last-
level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …

Network-on-chip microarchitecture-based covert channel in GPUs

J Ahn, J Kim, H Kasan, L Delshadtehrani… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
As GPUs are becoming widely deployed in the cloud infrastructure to support different
application domains, the security concerns of GPUs are becoming increasingly important. In …

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations

B Li, Y Guo, Y Wang, A Jaleel, J Yang… - Proceedings of the 56th …, 2023 - dl.acm.org
Multi-GPU systems have emerged as a desirable platform to deliver high computing
capabilities and large memory capacity to accommodate large dataset sizes. However …

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs

Y Feng, S Na, H Kim, H Jeon - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …

LAS: Locality-aware scheduling for GEMM-accelerated convolutions in GPUs

H Kim, WJ Song - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
This article presents a graphics processing unit (GPU) scheduling scheme that maximizes
the exploitation of data locality in deep neural networks (DNNs). Convolution is one of the …

LocalityGuru: A PTX analyzer for extracting thread block-level locality in GPGPUs

D Tripathy, A Abdolrashidi, Q Fan… - … and Storage (NAS), 2021 - ieeexplore.ieee.org
Exploiting data locality in GPGPUs is critical for efficiently using the smaller data caches and
handling the memory bottleneck problem. This paper proposes a thread block-centric …

GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement

Y Wang, B Li, A Jaleel, J Yang… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Multi-GPU systems have become popular to cater to the growing demands for high
parallelism and large memory capacity. However, the delivered performance is constrained …

AIO: An abstraction for performance analysis across diverse accelerator architectures

J Rogers, T Soliman, M Jahre - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
Specialization is the key approach for continued performance growth beyond the end of
Dennard scaling. Academics and industry are hence continuously proposing new …