Coda: Enabling co-location of computation and data for multiple GPU systems

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org

Data movement between the CPU and main memory is a first-order obstacle against
improving performance, scalability, and energy efficiency in modern systems. Computer systems …

Cited by 106 · Related articles · All 10 versions

Griffin: Hardware-software support for efficient page migration in multi-GPU systems

T Baruah, Y Sun, AT Dinçer… - … Symposium on High …, 2020 - ieeexplore.ieee.org

As transistor scaling becomes increasingly more difficult to achieve, scaling the core count
on a single GPU chip has also become extremely challenging. As the volume of data to …

Cited by 50 · Related articles · All 4 versions

Locality-centric data and threadblock management for massive GPUs

M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org

Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …

Cited by 34 · Related articles · All 8 versions

Spy in the GPU-box: Covert and side channel attacks on multi-GPU systems

SB Dutta, H Naghibijouybari, A Gupta… - Proceedings of the 50th …, 2023 - dl.acm.org

The deep learning revolution has been enabled in large part by GPUs, and more recently
accelerators, which make it possible to carry out computationally demanding training and …

Cited by 25 · Related articles · All 6 versions

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs

Y Feng, S Na, H Kim, H Jeon - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org

With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …

Cited by 1 · Related articles · All 4 versions

Charon: Specialized near-memory processing architecture for clearing dead objects in memory

J Jang, J Heo, Y Lee, J Won, S Kim, SJ Jung… - Proceedings of the …, 2019 - dl.acm.org

Garbage collection (GC) is a standard feature for high productivity programming, saving a
programmer from many nasty memory-related bugs. However, these productivity benefits …

Cited by 26 · Related articles · All 4 versions

Localityguru: A PTX analyzer for extracting thread block-level locality in GPGPUs

D Tripathy, A Abdolrashidi, Q Fan… - … and Storage (NAS), 2021 - ieeexplore.ieee.org

Exploiting data locality in GPGPUs is critical for efficiently using the smaller data caches and
handling the memory bottleneck problem. This paper proposes a thread block-centric …

Cited by 15 · Related articles · All 5 versions

CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org

Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

Cited by 1 · Related articles · All 7 versions

Designing virtual memory system of MCM GPUs

B Pratheek, N Jawalkar, A Basu - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Multi-Chip Module (MCM) designs have emerged as a key technique to scale up a GPU's
compute capabilities in the face of slowing transistor technology. However, the …

Cited by 10 · Related articles · All 4 versions

Salus: Efficient Security Support for CXL-Expanded GPU Memory

R Abdullah, H Lee, H Zhou… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

GPUs have become indispensable accelerators for many data-intensive applications such
as scientific workloads, deep learning models, and graph analytics; these applications share …

Cited by 2 · Related articles · All 4 versions