Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

J Ragan-Kelley, C Barnes, A Adams, S Paris… - ACM SIGPLAN …, 2013 - dl.acm.org
Image processing pipelines combine the challenges of stencil computations and stream
programs. They are composed of large graphs of different stencil stages, as well as complex …

Optoelectronic device simulations based on macroscopic Maxwell–Bloch equations

C Jirauschek, M Riesch… - Advanced Theory and …, 2019 - Wiley Online Library
Due to their intuitiveness, flexibility, and relative numerical efficiency, the macroscopic
Maxwell–Bloch (MB) equations are a widely used semiclassical and semi …

PolyMage: Automatic optimization for image processing pipelines

RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org
This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …

The Pochoir stencil compiler

Y Tang, RA Chowdhury, BC Kuszmaul, CK Luk… - Proceedings of the …, 2011 - dl.acm.org
A stencil computation repeatedly updates each point of a d-dimensional grid as a function of
itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal …

High-performance code generation for stencil computations on GPU architectures

J Holewinski, LN Pouchet, P Sadayappan - Proceedings of the 26th ACM …, 2012 - dl.acm.org
Stencil computations arise in many scientific computing domains, and often represent time-
critical portions of applications. There is significant interest in offloading these computations …

A survey on parallel computing and its applications in data-parallel problems using GPU architectures

CA Navarro, N Hitschfeld-Kahler… - … in Computational Physics, 2014 - cambridge.org
Parallel computing has become an important subject in the field of computer science and
has proven to be critical when researching high performance solutions. The evolution of …

SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …

AKG: automatic kernel generation for neural processing units using polyhedral transformations

J Zhao, B Li, W Nie, Z Geng, R Zhang, X Gao… - Proceedings of the …, 2021 - dl.acm.org
Existing tensor compilers have proven their effectiveness in deploying deep neural networks
on general-purpose hardware like CPU and GPU, but optimizing for neural processing units …

Tiling stencil computations to maximize parallelism

V Bandishti, I Pananilath… - SC'12: Proceedings of …, 2012 - ieeexplore.ieee.org
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling hyperplanes such that all tiles along that face can be …