Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
BurstZ+: Eliminating the communication bottleneck of scientific computing accelerators via accelerated compression
We present BurstZ+, an accelerator platform that eliminates the communication bottleneck
between PCIe-attached scientific computing accelerators and their host servers, via …
Matrix multiplication beyond auto-tuning: rewrite-based GPU code generation
Graphics Processing Units (GPUs) are used as general purpose parallel accelerators in a
wide range of applications. They are found in most computing systems, and mobile devices …
RISE & shine: Language-oriented compiler design
The trend towards specialization of software and hardware, fuelled by the end of Moore's law
and the still accelerating interest in domain-specific computing, such as machine learning …
Towards a domain-extensible compiler: optimizing an image processing pipeline on mobile CPUs
Halide and many similar projects have demonstrated the great potential of domain specific
optimizing compilers. They enable programs to be expressed at a convenient high-level …
Full Version: (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms
A Rasch - arXiv preprint arXiv:2405.05118, 2024 - arxiv.org
We formally introduce a systematic (de/re)-composition approach, based on the algebraic
formalism of "Multi-Dimensional Homomorphisms (MDHs)". Our approach is designed as …
BurstZ: a bandwidth-efficient scientific computing accelerator platform for large-scale data
We present BurstZ, a bandwidth-efficient accelerator platform for scientific computing. While
accelerators such as GPUs and FPGAs provide enormous computing capabilities, their …
Compiler-assisted test acceleration on GPUs for embedded software
Embedded software is found everywhere from our highly visible mobile devices to the
confines of our car in the form of smart sensors. Embedded software companies are under …
Optimization space pruning without regrets
U Beaugnon, A Pouille, M Pouzet, J Pienaar… - Proceedings of the 26th …, 2017 - dl.acm.org
Many computationally-intensive algorithms benefit from the wide parallelism offered by
Graphical Processing Units (GPUs). However, the search for a close-to-optimal …
Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift
Stencil computations are a widely used type of algorithm, found in applications from physical
simulations to machine learning. Stencils are embarrassingly parallel, therefore fit on …