A framework for memory oversubscription management in graphics processing units
C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …
Tacker: Tensor-CUDA Core kernel fusion for improving the GPU utilization while ensuring QoS
The proliferation of machine learning applications has promoted both CUDA Cores and
Tensor Cores' integration to meet their acceleration demands. While studies have shown …
HSM: A hybrid slowdown model for multitasking GPUs
Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate
compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in …
Snake: A variable-length chain-based prefetching for gpus
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …
Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …
Navisim: A highly accurate GPU simulator for AMD RDNA GPUs
As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …
A survey of GPU multitasking methods supported by hardware architecture
The ability to support multitasking is becoming increasingly important in the development of
graphics processing units (GPUs). GPU multitasking methods are classified into three types …
Themis: Predicting and reining in application-level slowdown on spatial multitasking GPUs
Predicting performance degradation of a GPU application when it is co-located with other
applications on a spatial multitasking GPU without prior application knowledge is essential …
Generic system calls for GPUs
GPUs are becoming first-class compute citizens and increasingly support programmability-
enhancing features such as shared virtual memory and hardware cache coherence. This …
Analyzing and leveraging decoupled L1 caches in GPUs
MA Ibrahim, O Kayiran, Y Eckert… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) use caches to provide on-chip bandwidth as a way to
address the memory wall. However, they are not always efficiently utilized for optimal GPU …