Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

BurstZ+: Eliminating the communication bottleneck of scientific computing accelerators via accelerated compression

G Sun, S Kang, SW Jun - ACM Transactions on Reconfigurable …, 2022 - dl.acm.org
We present BurstZ+, an accelerator platform that eliminates the communication bottleneck
between PCIe-attached scientific computing accelerators and their host servers, via …

Matrix multiplication beyond auto-tuning: rewrite-based GPU code generation

M Steuwer, T Remmelg, C Dubach - Proceedings of the International …, 2016 - dl.acm.org
Graphics Processing Units (GPUs) are used as general purpose parallel accelerators in a
wide range of applications. They are found in most computing systems, and mobile devices …

RISE & shine: Language-oriented compiler design

M Steuwer, T Koehler, B Köpcke, F Pizzuti - arXiv preprint arXiv …, 2022 - arxiv.org
The trend towards specialization of software and hardware, fuelled by the end of Moore's law
and the still accelerating interest in domain-specific computing, such as machine learning …

Towards a domain-extensible compiler: optimizing an image processing pipeline on mobile CPUs

T Koehler, M Steuwer - 2021 IEEE/ACM International …, 2021 - ieeexplore.ieee.org
Halide and many similar projects have demonstrated the great potential of domain specific
optimizing compilers. They enable programs to be expressed at a convenient high-level …

Full Version: (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms

A Rasch - arXiv preprint arXiv:2405.05118, 2024 - arxiv.org
We formally introduce a systematic (de/re)-composition approach, based on the algebraic
formalism of "Multi-Dimensional Homomorphisms (MDHs)". Our approach is designed as …

BurstZ: a bandwidth-efficient scientific computing accelerator platform for large-scale data

G Sun, S Kang, SW Jun - Proceedings of the 34th ACM International …, 2020 - dl.acm.org
We present BurstZ, a bandwidth-efficient accelerator platform for scientific computing. While
accelerators such as GPUs and FPGAs provide enormous computing capabilities, their …

Compiler-assisted test acceleration on GPUs for embedded software

V Yaneva, A Rajan, C Dubach - Proceedings of the 26th ACM SIGSOFT …, 2017 - dl.acm.org
Embedded software is found everywhere from our highly visible mobile devices to the
confines of our car in the form of smart sensors. Embedded software companies are under …

Optimization space pruning without regrets

U Beaugnon, A Pouille, M Pouzet, J Pienaar… - Proceedings of the 26th …, 2017 - dl.acm.org
Many computationally-intensive algorithms benefit from the wide parallelism offered by
Graphical Processing Units (GPUs). However, the search for a close-to-optimal …

Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift

L Stoltzfus, B Hagedorn, M Steuwer… - ACM Transactions on …, 2019 - dl.acm.org
Stencil computations are a widely used type of algorithm, found in applications from physical
simulations to machine learning. Stencils are embarrassingly parallel, therefore fit on …