Analyzing and leveraging remote-core bandwidth for enhanced performance in GPUs
2019 28th International Conference on Parallel Architectures and …, 2019•ieeexplore.ieee.org
Bandwidth achieved from local/shared caches and memory is a major performance determinant in Graphics Processing Units (GPUs). These existing sources of bandwidth are often not enough for optimal GPU performance. Therefore, to enhance performance further, we focus on efficiently unlocking an additional potential source of bandwidth, which we call remote-core bandwidth. This bandwidth stems from the observation that a fraction of the data (i.e., L1 read misses) required by one GPU core can also be found in the local (L1) caches of other GPU cores. In this paper, we propose to efficiently coordinate data movement across cores in GPUs to exploit this remote-core bandwidth. However, we find that its efficient detection and utilization present several challenges. To this end, we specifically address: a) which data is shared across cores, b) which cores have the shared data, and c) how we can get the data as soon as possible. Our extensive evaluation across a wide set of GPGPU applications shows that significant performance improvement can be achieved at a modest hardware cost on account of the additional bandwidth received from the remote cores.
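To make the underlying observation concrete, the following is a minimal Python sketch that counts how many L1 read misses in a toy access trace could, in principle, have been served from another core's L1 cache. It is not the mechanism proposed in the paper: the `L1Cache` class, the `remote_hit_counts` function, the fully associative LRU policy, and the trace format are all hypothetical choices made for illustration only.

```python
from collections import OrderedDict


class L1Cache:
    """Toy fully associative LRU cache of line addresses (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def contains(self, addr):
        # Pure lookup: does not update LRU state.
        return addr in self.lines

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line (evicting LRU)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[addr] = None
        return False


def remote_hit_counts(trace, num_cores, capacity):
    """Return (l1_misses, misses_present_in_a_remote_l1) for a read trace.

    `trace` is a sequence of (core_id, line_address) pairs (hypothetical format).
    """
    caches = [L1Cache(capacity) for _ in range(num_cores)]
    misses = 0
    remote_hits = 0
    for core, addr in trace:
        if not caches[core].contains(addr):
            misses += 1
            # Check whether any *other* core currently holds the line.
            if any(c.contains(addr) for i, c in enumerate(caches) if i != core):
                remote_hits += 1
        caches[core].access(addr)
    return misses, remote_hits


# Two cores reading overlapping lines: 4 local misses, 2 found remotely.
example_trace = [(0, 0xA), (1, 0xA), (1, 0xB), (0, 0xB), (0, 0xA)]
result = remote_hit_counts(example_trace, num_cores=2, capacity=4)  # (4, 2)
```

The ratio of the second count to the first gives a rough upper bound on the share of L1 misses that remote cores could serve in this toy model; a real design would also have to decide which cores to probe and how quickly the data can be returned, which this sketch does not model.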