A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Tacker: Tensor-CUDA core kernel fusion for improving the GPU utilization while ensuring QoS

H Zhao, W Cui, Q Chen, Y Zhang, Y Lu… - … Symposium on High …, 2022 - ieeexplore.ieee.org
The proliferation of machine learning applications has promoted both CUDA Cores and
Tensor Cores' integration to meet their acceleration demands. While studies have shown …

HSM: A hybrid slowdown model for multitasking GPUs

X Zhao, M Jahre, L Eeckhout - … of the twenty-fifth international conference …, 2020 - dl.acm.org
Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate
compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in …

Snake: A variable-length chain-based prefetching for GPUs

S Mostofi, H Falahati, N Mahani… - Proceedings of the 56th …, 2023 - dl.acm.org
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

NaviSim: A highly accurate GPU simulator for AMD RDNA GPUs

Y Bao, Y Sun, Z Feric, MT Shen, M Weston… - Proceedings of the …, 2022 - dl.acm.org
As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …

A survey of GPU multitasking methods supported by hardware architecture

C Zhao, W Gao, F Nie, H Zhou - IEEE Transactions on Parallel …, 2021 - ieeexplore.ieee.org
The ability to support multitasking becomes more and more important in the development of
graphics processing units (GPUs). GPU multitasking methods are classified into three types …

Themis: Predicting and reining in application-level slowdown on spatial multitasking GPUs

W Zhao, Q Chen, H Lin, J Zhang, J Leng… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Predicting performance degradation of a GPU application when it is co-located with other
applications on a spatial multitasking GPU without prior application knowledge is essential …

Generic system calls for GPUs

J Veselý, A Basu, A Bhattacharjee… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
GPUs are becoming first-class compute citizens and increasingly support programmability-
enhancing features such as shared virtual memory and hardware cache coherence. This …

Analyzing and leveraging decoupled L1 caches in GPUs

MA Ibrahim, O Kayiran, Y Eckert… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) use caches to provide on-chip bandwidth as a way to
address the memory wall. However, they are not always efficiently utilized for optimal GPU …