Efficient and fair multi-programming in GPUs via effective bandwidth management

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org

Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Cited by 99 · Related articles · All 10 versions

Tacker: Tensor-CUDA core kernel fusion for improving the GPU utilization while ensuring QoS

H Zhao, W Cui, Q Chen, Y Zhang, Y Lu… - … Symposium on High …, 2022 - ieeexplore.ieee.org

The proliferation of machine learning applications has promoted both CUDA Cores and
Tensor Cores' integration to meet their acceleration demands. While studies have shown …

Cited by 25 · Related articles · All 3 versions

HSM: A hybrid slowdown model for multitasking GPUs

X Zhao, M Jahre, L Eeckhout - … of the twenty-fifth international conference …, 2020 - dl.acm.org

Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate
compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in …

Cited by 40 · Related articles · All 8 versions

Snake: A variable-length chain-based prefetching for GPUs

S Mostofi, H Falahati, N Mahani… - Proceedings of the 56th …, 2023 - dl.acm.org

Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …

Cited by 5 · Related articles · All 4 versions

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

Cited by 15 · Related articles · All 6 versions

NaviSim: A highly accurate GPU simulator for AMD RDNA GPUs

Y Bao, Y Sun, Z Feric, MT Shen, M Weston… - Proceedings of the …, 2022 - dl.acm.org

As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …

Cited by 14 · Related articles · All 6 versions

A survey of GPU multitasking methods supported by hardware architecture

C Zhao, W Gao, F Nie, H Zhou - IEEE Transactions on Parallel …, 2021 - ieeexplore.ieee.org

The ability to support multitasking becomes more and more important in the development of
graphic processing unit (GPU). GPU multitasking methods are classified into three types …

Cited by 18 · Related articles · All 2 versions

Themis: Predicting and reining in application-level slowdown on spatial multitasking GPUs

W Zhao, Q Chen, H Lin, J Zhang, J Leng… - 2019 IEEE …, 2019 - ieeexplore.ieee.org

Predicting performance degradation of a GPU application when it is co-located with other
applications on a spatial multitasking GPU without prior application knowledge is essential …

Cited by 32 · Related articles · All 6 versions

Generic system calls for GPUs

J Veselý, A Basu, A Bhattacharjee… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org

GPUs are becoming first-class compute citizens and increasingly support programmability-
enhancing features such as shared virtual memory and hardware cache coherence. This …

Cited by 39 · Related articles · All 13 versions

Analyzing and leveraging decoupled L1 caches in GPUs

MA Ibrahim, O Kayiran, Y Eckert… - … Symposium on High …, 2021 - ieeexplore.ieee.org

Graphics Processing Units (GPUs) use caches to provide on-chip bandwidth as a way to
address the memory wall. However, they are not always efficiently utilized for optimal GPU …

Cited by 20 · Related articles · All 10 versions