Register allocation for Intel processor graphics

WY Chen, GY Lueh, P Ashar, K Chen… - Proceedings of the 2018 …, 2018 - dl.acm.org
Register allocation is a well-studied problem, but surprisingly little work has been published
on assigning registers for GPU architectures. In this paper we present the register allocator …

Efficient warp execution in presence of divergence with collaborative context collection

F Khorasani, R Gupta, LN Bhuyan - Proceedings of the 48th International …, 2015 - dl.acm.org
GPU's SIMD architecture is a double-edged sword confronting parallel tasks with control
flow divergence. On the one hand, it provides a high performance yet power-efficient …

A simple BSP-based model to predict execution time in GPU applications

M Amaris, D Cordeiro, A Goldman… - 2015 IEEE 22nd …, 2015 - ieeexplore.ieee.org
Models are useful to represent abstractions of software and hardware processes. The Bulk
Synchronous Parallel (BSP) is a bridging model for parallel computation that allows …

Side-channel elimination via partial control-flow linearization

L Soares, M Canesche, FMQ Pereira - ACM Transactions on …, 2023 - dl.acm.org
Partial control-flow linearization is a code transformation conceived to maximize work
performed in vectorized programs. In this article, we find a new service for it. We show that …

IGC: The open source Intel graphics compiler

A Chandrasekhar, G Chen, PY Chen… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
With increasing general purpose programming capability, GPUs have become the mainstay
for a wide variety of compute intensive tasks from cloud to edge computing. Because of its …

Static placement of computation on heterogeneous devices

G Poesia, B Guimarães, F Ferracioli… - Proceedings of the ACM …, 2017 - dl.acm.org
Heterogeneous architectures characterize today's hardware, ranging from supercomputers to
smartphones. However, in spite of this importance, programming such systems is still …

Efficient execution of graph algorithms on CPU with SIMD extensions

R Zheng, S Pai - 2021 IEEE/ACM International Symposium on …, 2021 - ieeexplore.ieee.org
Existing state-of-the-art CPU graph frameworks take advantage of multiple cores, but not the
SIMD capability within each core. In this work, we retarget an existing GPU graph algorithm …

An abstract interpretation for SPMD divergence on reducible control flow graphs

J Rosemann, S Moll, S Hack - Proceedings of the ACM on Programming …, 2021 - dl.acm.org
Vectorizing compilers employ divergence analysis to detect at which program point a
specific variable is uniform, i.e., has the same value on all SPMD threads that execute this …

Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

P Bruel, M Amaris, A Goldman - Concurrency and Computation …, 2017 - Wiley Online Library
A Graphics Processing Unit (GPU) is a parallel computing coprocessor
specialized in accelerating vector operations. The enormous heterogeneity of parallel …

Eliminating intra-warp load imbalance in irregular nested patterns via collaborative task engagement

F Khorasani, B Rowe, R Gupta… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Nested patterns are one of the most frequently occurring algorithmic themes in GPU
applications where coarse-grained tasks are constituted from a number of fine-grained ones …