Divergence analysis

Register allocation for Intel processor graphics

WY Chen, GY Lueh, P Ashar, K Chen… - Proceedings of the 2018 …, 2018 - dl.acm.org

Register allocation is a well-studied problem, but surprisingly little work has been published
on assigning registers for GPU architectures. In this paper we present the register allocator …

Cited by 35

Efficient warp execution in presence of divergence with collaborative context collection

F Khorasani, R Gupta, LN Bhuyan - Proceedings of the 48th International …, 2015 - dl.acm.org

GPU's SIMD architecture is a double-edged sword confronting parallel tasks with control
flow divergence. On the one hand, it provides a high performance yet power-efficient …

Cited by 45 (8 versions)

A simple BSP-based model to predict execution time in GPU applications

M Amaris, D Cordeiro, A Goldman… - 2015 IEEE 22nd …, 2015 - ieeexplore.ieee.org

Models are useful to represent abstractions of software and hardware processes. The Bulk
Synchronous Parallel (BSP) is a bridging model for parallel computation that allows …

Cited by 38 (9 versions)

Side-channel elimination via partial control-flow linearization

L Soares, M Canesche, FMQ Pereira - ACM Transactions on …, 2023 - dl.acm.org

Partial control-flow linearization is a code transformation conceived to maximize work
performed in vectorized programs. In this article, we find a new service for it. We show that …

Cited by 8 (4 versions)

IGC: The open source Intel graphics compiler

A Chandrasekhar, G Chen, PY Chen… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org

With increasing general purpose programming capability, GPUs have become the mainstay
for a wide variety of compute intensive tasks from cloud to edge computing. Because of its …

Cited by 21 (3 versions)

Static placement of computation on heterogeneous devices

G Poesia, B Guimarães, F Ferracioli… - Proceedings of the ACM …, 2017 - dl.acm.org

Heterogeneous architectures characterize today hardware ranging from super-computers to
smartphones. However, in spite of this importance, programming such systems is still …

Cited by 23 (4 versions)

Efficient execution of graph algorithms on CPU with SIMD extensions

R Zheng, S Pai - 2021 IEEE/ACM International Symposium on …, 2021 - ieeexplore.ieee.org

Existing state-of-the-art CPU graph frameworks take advantage of multiple cores, but not the
SIMD capability within each core. In this work, we retarget an existing GPU graph algorithm …

Cited by 9 (6 versions)

An abstract interpretation for SPMD divergence on reducible control flow graphs

J Rosemann, S Moll, S Hack - Proceedings of the ACM on Programming …, 2021 - dl.acm.org

Vectorizing compilers employ divergence analysis to detect at which program point a
specific variable is uniform, i.e., has the same value on all SPMD threads that execute this …

Cited by 8 (6 versions)

Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

P Bruel, M Amaris, A Goldman - Concurrency and Computation …, 2017 - Wiley Online Library

Summary A Graphics Processing Unit (GPU) is a parallel computing coprocessor
specialized in accelerating vector operations. The enormous heterogeneity of parallel …

Cited by 21 (6 versions)

Eliminating intra-warp load imbalance in irregular nested patterns via collaborative task engagement

F Khorasani, B Rowe, R Gupta… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

Nested patterns are one of the most frequently occurring algorithmic themes in GPU
applications where coarse-grained tasks are constituted from a number of fine-grained ones …

Cited by 19 (5 versions)

Register allocation for intel processor graphics
