A locality-aware memory hierarchy for energy-efficient GPU architectures

M Rhu, M Sullivan, J Leng, M Erez - … of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org
As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a
bottleneck. Current GPU memory hierarchies use coarse-grained memory accesses to …

Divergence reduction in Monte Carlo neutron transport with on-GPU asynchronous scheduling

B Cuneo, M Bailey - ACM Transactions on Modeling and Computer …, 2024 - dl.acm.org
While Monte Carlo Neutron Transport (MCNT) is near-embarrassingly parallel, the effectively
unpredictable lifetime of neutrons can lead to divergence when MCNT is evaluated on …

CAWA: Coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads

SY Lee, A Arunkumar, CJ Wu - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org
The ubiquity of graphics processing unit (GPU) architectures has made them efficient
alternatives to chip-multiprocessors for parallel workloads. GPUs achieve superior …

CAWS: Criticality-aware warp scheduling for GPGPU workloads

SY Lee, CJ Wu - Proceedings of the 23rd international conference on …, 2014 - dl.acm.org
The ability to perform fast context-switching and massive multi-threading is the forte of
modern GPU architectures, which have emerged as an efficient alternative to traditional chip …

Accelerating divergent applications on SIMD architectures using neural networks

B Grigorian, G Reinman - ACM Transactions on Architecture and Code …, 2015 - dl.acm.org
The purpose of this research is to find a neural-network-based solution to the well-known
problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our …

PATS: Pattern aware scheduling and power gating for GPGPUs

Q Xu, M Annavaram - Proceedings of the 23rd international conference …, 2014 - dl.acm.org
General purpose computing using graphics processing units (GPGPUs) is an attractive
option to achieve power efficient throughput computing. But the power efficiency of GPGPUs …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

Efficient warp execution in presence of divergence with collaborative context collection

F Khorasani, R Gupta, LN Bhuyan - Proceedings of the 48th International …, 2015 - dl.acm.org
GPU's SIMD architecture is a double-edged sword confronting parallel tasks with control
flow divergence. On the one hand, it provides a high performance yet power-efficient …

Aurochs: An architecture for dataflow threads

M Vilim, A Rucker, K Olukotun - 2021 ACM/IEEE 48th Annual …, 2021 - ieeexplore.ieee.org
Data analytics pipelines increasingly rely on databases to select, filter, and pre-process
reams of data. These databases use data structures with irregular control flow like trees and …

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier
With the skyrocketing advances of process technology, the increased need to process huge
amount of data, and the pivotal need for power efficiency, the usage of Graphics Processing …