Divergence analysis

D Sampaio, RM Souza, C Collange… - ACM Transactions on …, 2014 - dl.acm.org
Growing interest in graphics processing units has brought renewed attention to the Single
Instruction Multiple Data (SIMD) execution model. SIMD machines give application …

Unified on-chip memory allocation for SIMT architecture

AB Hayes, EZ Zhang - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
The popularity of general purpose Graphic Processing Unit (GPU) is largely attributed to the
tremendous concurrency enabled by its underlying architecture--single instruction multiple …

Pointer-based divergence analysis for OpenCL 2.0 programs

SC Wang, LY Yu, LA Her, YS Hwang… - ACM Transactions on …, 2021 - dl.acm.org
A modern GPU is designed with many large thread groups to achieve a high throughput and
performance. Within these groups, the threads are grouped into fixed-size SIMD batches in …

Orion: A framework for GPU occupancy tuning

AB Hayes, L Li, D Chavarría-Miranda, SL Song… - Proceedings of the 17th …, 2016 - dl.acm.org
An important feature of modern GPU architectures is variable occupancy. Occupancy
measures the ratio between the actual number of threads actively running on a GPU and the …

RegDem: Increasing GPU performance via shared memory register spilling

P Sakdhnagool, A Sabne, R Eigenmann - arXiv preprint arXiv:1907.02894, 2019 - arxiv.org
GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage
of on-chip resources, such as registers and the programmer-managed shared memory …

A GPU binary analysis framework for memory performance and safety

AB Hayes - 2022 - search.proquest.com
Abstract General-Purpose Graphics Processing Units (GPUs) have attained popularity for
their massive concurrency. But with their relative infancy, as well as a prevalence of closed …

GPU Divergence: Analysis and Register Allocation

DN Sampaio - 2013 - inria.hal.science
The use of graphics processing units (GPUs) for accelerating Data Parallel workloads is the
new trend on the computing market. This growing interest brought renewed attention to the …

Algoritmos para escalonamento de instruções e alocação de registradores na infraestrutura LLVM

LC Silva - 2013 - repositorio.ufms.br
The objective of this work is to present an integrated proposal for Instruction
Scheduling and Register Allocation based on Subgraph Isomorphism …

Optimizing GPU programs by register demotion: poster

P Sakdhnagool, A Sabne, R Eigenmann - Proceedings of the 24th …, 2019 - dl.acm.org
GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage
of on-chip resources. If the resource demand cannot be met, GPUs will reduce the number of …

Improving Productivity of Accelerator Computing Through Programming Models and Compiler Optimizations

P Sakdhnagool - 2017 - search.proquest.com
During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis,
have seen an increasing popularity for their performance and have been employed by many …