Locality-centric data and threadblock management for massive GPUs

B Li, R Basu Roy, D Wang, S Samsi… - Proceedings of the …, 2023 - dl.acm.org

The rapid growth in demand for HPC systems has led to a rise in carbon footprint, which
requires urgent intervention. In this work, we present a comprehensive analysis of the …

Cited by 34 Related articles All 5 versions

PAVER: Locality graph-based thread block scheduling for GPUs

D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on …, 2021 - dl.acm.org

The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache
sizes per thread, leading to serious cache contention problems such as thrashing. Hence …

Cited by 34 Related articles All 6 versions

SAC: Sharing-aware caching in multi-chip GPUs

S Zhang, M Naderan-Tahan, M Jahre… - Proceedings of the 50th …, 2023 - dl.acm.org

Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its
last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …

Cited by 10 Related articles All 4 versions

Network-on-chip microarchitecture-based covert channel in GPUs

J Ahn, J Kim, H Kasan, L Delshadtehrani… - MICRO-54: 54th Annual …, 2021 - dl.acm.org

As GPUs are becoming widely deployed in the cloud infrastructure to support different
application domains, the security concerns of GPUs are becoming increasingly important. In …

Cited by 29 Related articles All 3 versions

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations

B Li, Y Guo, Y Wang, A Jaleel, J Yang… - Proceedings of the 56th …, 2023 - dl.acm.org

Multi-GPU systems have emerged as a desirable platform to deliver high computing
capabilities and large memory capacity to accommodate large dataset sizes. However …

Cited by 7 Related articles All 8 versions

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs

Y Feng, S Na, H Kim, H Jeon - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org

With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …

Cited by 2 Related articles All 4 versions

LAS: Locality-aware scheduling for GEMM-accelerated convolutions in GPUs

H Kim, WJ Song - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org

This article presents a graphics processing unit (GPU) scheduling scheme that maximizes
the exploitation of data locality in deep neural networks (DNNs). Convolution is one of the …

Cited by 8 Related articles All 4 versions

LocalityGuru: A PTX analyzer for extracting thread block-level locality in GPGPUs

D Tripathy, A Abdolrashidi, Q Fan… - … and Storage (NAS), 2021 - ieeexplore.ieee.org

Exploiting data locality in GPGPUs is critical for efficiently using the smaller data caches and
handling the memory bottleneck problem. This paper proposes a thread block-centric …

Cited by 15 Related articles All 5 versions

GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement

Y Wang, B Li, A Jaleel, J Yang… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Multi-GPU systems have become popular to cater to the growing demands for high
parallelism and large memory capacity. However, the delivered performance is constrained …

Cited by 5 Related articles All 5 versions

AIO: An abstraction for performance analysis across diverse accelerator architectures

J Rogers, T Soliman, M Jahre - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org

Specialization is the key approach for continued performance growth beyond the end of
Dennard scaling. Academics and industry are hence continuously proposing new …

Cited by 1 Related articles All 3 versions