SIMD divergence optimization through intra-warp compaction

M Rhu, M Sullivan, J Leng, M Erez - … of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org

As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a
bottleneck. Current GPU memory hierarchies use coarse-grained memory accesses to …

Cited by 170 · Related articles · All 16 versions

Divergence reduction in Monte Carlo neutron transport with on-GPU asynchronous scheduling

B Cuneo, M Bailey - ACM Transactions on Modeling and Computer …, 2024 - dl.acm.org

While Monte Carlo Neutron Transport (MCNT) is near-embarrassingly parallel, the effectively
unpredictable lifetime of neutrons can lead to divergence when MCNT is evaluated on …

Cited by 3 · Related articles

CAWA: Coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads

SY Lee, A Arunkumar, CJ Wu - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org

The ubiquity of graphics processing unit (GPU) architectures has made them efficient
alternatives to chip-multiprocessors for parallel workloads. GPUs achieve superior …

Cited by 124 · Related articles · All 7 versions

CAWS: Criticality-aware warp scheduling for GPGPU workloads

SY Lee, CJ Wu - Proceedings of the 23rd international conference on …, 2014 - dl.acm.org

The ability to perform fast context-switching and massive multi-threading is the forte of
modern GPU architectures, which have emerged as an efficient alternative to traditional chip …

Cited by 108 · Related articles · All 7 versions

Accelerating divergent applications on SIMD architectures using neural networks

B Grigorian, G Reinman - ACM Transactions on Architecture and Code …, 2015 - dl.acm.org

The purpose of this research is to find a neural-network-based solution to the well-known
problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our …

Cited by 66 · Related articles · All 4 versions

PATS: Pattern aware scheduling and power gating for GPGPUs

Q Xu, M Annavaram - Proceedings of the 23rd international conference …, 2014 - dl.acm.org

General purpose computing using graphics processing units (GPGPUs) is an attractive
option to achieve power efficient throughput computing. But the power efficiency of GPGPUs …

Cited by 66 · Related articles · All 4 versions

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org

Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

Cited by 34 · Related articles · All 6 versions

Efficient warp execution in presence of divergence with collaborative context collection

F Khorasani, R Gupta, LN Bhuyan - Proceedings of the 48th International …, 2015 - dl.acm.org

GPU's SIMD architecture is a double-edged sword confronting parallel tasks with control
flow divergence. On the one hand, it provides a high performance yet power-efficient …

Cited by 45 · Related articles · All 8 versions

Aurochs: An architecture for dataflow threads

M Vilim, A Rucker, K Olukotun - 2021 ACM/IEEE 48th Annual …, 2021 - ieeexplore.ieee.org

Data analytics pipelines increasingly rely on databases to select, filter, and pre-process
reams of data. These databases use data structures with irregular control flow like trees and …

Cited by 17 · Related articles · All 3 versions

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier

With the skyrocketing advances of process technology, the increased need to process huge
amounts of data, and the pivotal need for power efficiency, the usage of Graphics Processing …

Cited by 32 · Related articles · All 4 versions