A locality-aware memory hierarchy for energy-efficient GPU architectures
As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a
bottleneck. Current GPU memory hierarchies use coarse-grained memory accesses to …
Divergence reduction in Monte Carlo neutron transport with on-GPU asynchronous scheduling
B Cuneo, M Bailey - ACM Transactions on Modeling and Computer …, 2024 - dl.acm.org
While Monte Carlo Neutron Transport (MCNT) is near-embarrassingly parallel, the effectively
unpredictable lifetime of neutrons can lead to divergence when MCNT is evaluated on …
CAWA: Coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads
SY Lee, A Arunkumar, CJ Wu - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org
The ubiquity of graphics processing unit (GPU) architectures has made them efficient
alternatives to chip-multiprocessors for parallel workloads. GPUs achieve superior …
CAWS: Criticality-aware warp scheduling for GPGPU workloads
SY Lee, CJ Wu - Proceedings of the 23rd international conference on …, 2014 - dl.acm.org
The ability to perform fast context-switching and massive multi-threading is the forte of
modern GPU architectures, which have emerged as an efficient alternative to traditional chip …
Accelerating divergent applications on SIMD architectures using neural networks
B Grigorian, G Reinman - ACM Transactions on Architecture and Code …, 2015 - dl.acm.org
The purpose of this research is to find a neural-network-based solution to the well-known
problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our …
PATS: Pattern aware scheduling and power gating for GPGPUs
Q Xu, M Annavaram - Proceedings of the 23rd international conference …, 2014 - dl.acm.org
General purpose computing using graphics processing units (GPGPUs) is an attractive
option to achieve power efficient throughput computing. But the power efficiency of GPGPUs …
ITAP: Idle-time-aware power management for GPU execution units
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …
Efficient warp execution in presence of divergence with collaborative context collection
GPU's SIMD architecture is a double-edged sword confronting parallel tasks with control
flow divergence. On the one hand, it provides a high performance yet power-efficient …
Aurochs: An architecture for dataflow threads
Data analytics pipelines increasingly rely on databases to select, filter, and pre-process
reams of data. These databases use data structures with irregular control flow like trees and …
A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity
With the skyrocketing advances of process technology, the increased need to process huge
amounts of data, and the pivotal need for power efficiency, the usage of Graphics Processing …