Many-BSP: an analytical performance model for CUDA kernels
A Riahi, A Savadi, M Naghibzadeh - Computing, 2024 - Springer
The unknown behavior of GPUs and the differing characteristics among their generations
present a serious challenge in the analysis and optimization of programs in these …
WSMP: a warp scheduling strategy based on MFQ and PPF
J Fang, L Zhao, M Cai, H Yang - The Journal of Supercomputing, 2023 - Springer
Normally, threads in a warp do not severely interfere with each other. However, the
scheduler must wait until all the threads within complete before scheduling the next warp …
A survey of GPGPU parallel processing architecture performance optimization
S Jia, Z Tian, Y Ma, C Sun, Y Zhang… - 2021 IEEE/ACIS 20th …, 2021 - ieeexplore.ieee.org
General purpose graphic processor unit (GPGPU) supports various applications' execution
in different fields with high-performance computing capability due to its powerful parallel …
Warp-Aware Adaptive Energy Efficiency Calibration for Multi-GPU Systems
Z Wang, X Song, L Cheng, H Wan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Massive GPU acceleration processors have been used in high-performance computing
systems. The Dennard scaling has led to power and thermal constraints limiting the …