Spill code placement for SIMD machines

D Sampaio, RM Souza, C Collange… - ACM Transactions on …, 2014 - dl.acm.org

Growing interest in graphics processing units has brought renewed attention to the Single
Instruction Multiple Data (SIMD) execution model. SIMD machines give application …

Cited by 36 · All 12 versions

Unified on-chip memory allocation for SIMT architecture

AB Hayes, EZ Zhang - Proceedings of the 28th ACM international …, 2014 - dl.acm.org

The popularity of general purpose Graphics Processing Units (GPUs) is largely attributed to the
tremendous concurrency enabled by their underlying architecture: single instruction multiple …

Cited by 37 · All 4 versions

Pointer-based divergence analysis for OpenCL 2.0 programs

SC Wang, LY Yu, LA Her, YS Hwang… - ACM Transactions on …, 2021 - dl.acm.org

A modern GPU is designed with many large thread groups to achieve a high throughput and
performance. Within these groups, the threads are grouped into fixed-size SIMD batches in …

Cited by 9
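
The snippet above describes threads grouped into fixed-size SIMD batches. The sketch below is a minimal, hypothetical illustration of what divergence means at that level. It assumes a batch width of 32 threads (the hypothetical constant WARP_SIZE) and a branch condition given as a function of the thread id. It detects divergence by evaluating the branch for every thread. The paper's pointer-based analysis is a static compiler analysis, and this code does not implement it.

```python
# Hypothetical batch width: 32 matches NVIDIA warps; other GPUs use other widths.
WARP_SIZE = 32


def group_into_warps(num_threads, warp_size=WARP_SIZE):
    """Split thread ids 0..num_threads-1 into consecutive fixed-size batches."""
    return [list(range(start, min(start + warp_size, num_threads)))
            for start in range(0, num_threads, warp_size)]


def divergent_warps(num_threads, predicate, warp_size=WARP_SIZE):
    """Return indices of batches whose threads disagree on a branch condition."""
    result = []
    for w, warp in enumerate(group_into_warps(num_threads, warp_size)):
        outcomes = {bool(predicate(tid)) for tid in warp}
        if len(outcomes) > 1:  # both sides of the branch taken in one batch
            result.append(w)
    return result


if __name__ == "__main__":
    # Branch on tid < 40: only batch 1 (threads 32..63) is split.
    print(divergent_warps(128, lambda tid: tid < 40))       # [1]
    # Branch on even/odd tid: every batch is split.
    print(divergent_warps(128, lambda tid: tid % 2 == 0))   # [0, 1, 2, 3]
```

A branch that depends only on values uniform across a batch never splits it. Divergence analyses try to prove that property at compile time.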

Orion: A framework for GPU occupancy tuning

AB Hayes, L Li, D Chavarría-Miranda, SL Song… - Proceedings of the 17th …, 2016 - dl.acm.org

An important feature of modern GPU architectures is variable occupancy. Occupancy
measures the ratio between the actual number of threads actively running on a GPU and the …

Cited by 14 · All 4 versions
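
The Orion snippet is cut off before it names the denominator of the occupancy ratio. The block below gives the commonly used CUDA definition, measured per streaming multiprocessor (SM) in warps. It is not necessarily the paper's exact wording. The example assumes an SM that holds at most 64 warps, for instance 2048 threads in 32-thread warps.

```latex
\text{occupancy} = \frac{W_{\text{active}}}{W_{\max}},
\qquad \text{e.g.}\quad \frac{48}{64} = 0.75
```

Here $W_{\text{active}}$ is the number of warps resident on an SM and $W_{\max}$ is the maximum number of warps that SM can hold.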

RegDem: Increasing GPU performance via shared memory register spilling

P Sakdhnagool, A Sabne, R Eigenmann - arXiv preprint arXiv:1907.02894, 2019 - arxiv.org

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage
of on-chip resources, such as registers and the programmer-managed shared memory …

Cited by 9 · All 2 versions
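
The RegDem snippet says that occupancy is limited by per-thread register and shared-memory usage, and its title proposes spilling registers into shared memory. The sketch below shows that trade-off numerically. Its assumptions are:

- The per-SM limits are hypothetical values, loosely modeled on recent NVIDIA GPUs.
- Register and shared-memory allocation granularity is ignored.
- The cost of the extra shared-memory accesses is not modeled.
- `demote_registers` is a hypothetical helper, not the authors' algorithm.

```python
# Hypothetical per-SM limits; real values differ by GPU architecture.
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32
REGISTERS_PER_SM = 65536          # 32-bit registers
SHARED_MEM_PER_SM = 64 * 1024     # bytes
WARP_SIZE = 32


def resident_blocks(threads_per_block, regs_per_thread, smem_per_block):
    """Blocks per SM permitted by each resource; the tightest limit wins."""
    if threads_per_block <= 0 or regs_per_thread <= 0:
        raise ValueError("threads_per_block and regs_per_thread must be positive")
    limits = [
        MAX_BLOCKS_PER_SM,
        MAX_THREADS_PER_SM // threads_per_block,
        REGISTERS_PER_SM // (regs_per_thread * threads_per_block),
    ]
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    return min(limits)


def occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Active warps per SM divided by the maximum resident warps per SM."""
    blocks = resident_blocks(threads_per_block, regs_per_thread, smem_per_block)
    active_warps = blocks * threads_per_block // WARP_SIZE
    return active_warps / (MAX_THREADS_PER_SM // WARP_SIZE)


def demote_registers(threads_per_block, regs_per_thread, smem_per_block, k):
    """Hypothetical: move k 4-byte registers per thread into shared memory."""
    return occupancy(threads_per_block,
                     regs_per_thread - k,
                     smem_per_block + 4 * k * threads_per_block)


if __name__ == "__main__":
    print(occupancy(256, 40, 0))             # 0.75: registers are the limit
    print(demote_registers(256, 40, 0, 8))   # 1.0: 8 registers/thread moved
    print(demote_registers(256, 40, 0, 16))  # 0.5: shared memory now limits
```

The last two calls show why demotion must be selective. Each demoted register costs 1 KiB of shared memory per 256-thread block. Moving too many registers therefore trades a register bottleneck for a shared-memory bottleneck.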

A GPU binary analysis framework for memory performance and safety

AB Hayes - 2022 - search.proquest.com

General-Purpose Graphics Processing Units (GPUs) have attained popularity for
their massive concurrency. But with their relative infancy, as well as a prevalence of closed …

GPU Divergence: Analysis and Register Allocation

DN Sampaio - 2013 - inria.hal.science

The use of graphics processing units (GPUs) for accelerating Data Parallel workloads is the
new trend on the computing market. This growing interest brought renewed attention to the …

Cited by 1

Algoritmos para escalonamento de instruções e alocação de registradores na infraestrutura LLVM [Algorithms for instruction scheduling and register allocation in the LLVM infrastructure]

LC Silva - 2013 - repositorio.ufms.br

The objective of this work is to present an integrated proposal for instruction scheduling
and register allocation based on subgraph isomorphism …

Cited by 1 · All 4 versions

Optimizing GPU programs by register demotion: poster

P Sakdhnagool, A Sabne, R Eigenmann - Proceedings of the 24th …, 2019 - dl.acm.org

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage
of on-chip resources. If the resource demand cannot be met, GPUs will reduce the number of …

Improving Productivity of Accelerator Computing Through Programming Models and Compiler Optimizations

P Sakdhnagool - 2017 - search.proquest.com

During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis,
have seen an increasing popularity for their performance and have been employed by many …