Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration

M Louboutin, M Lange, F Luporini… - Geoscientific Model …, 2019 - gmd.copernicus.org
We introduce Devito, a new domain-specific language for implementing high-performance
finite-difference partial differential equation solvers. The motivating application is exploration …

PolyMage: Automatic optimization for image processing pipelines

RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org
This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various
high-performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

SPIRAL: Extreme performance portability

F Franchetti, TM Low, DT Popovici… - Proceedings of the …, 2018 - ieeexplore.ieee.org
In this paper, we address the question of how to automatically map computational kernels to
highly efficient code for a wide range of computing platforms and establish the correctness of …

High performance stencil code generation with Lift

B Hagedorn, L Stoltzfus, M Steuwer… - Proceedings of the …, 2018 - dl.acm.org
Stencil computations are widely used from physical simulations to machine-learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …

STELLA: A domain-specific tool for structured grid methods in weather and climate models

T Gysi, C Osuna, O Fuhrer, M Bianco… - Proceedings of the …, 2015 - dl.acm.org
Many high-performance computing applications solving partial differential equations (PDEs)
can be attributed to the class of kernels using stencils on structured grids. Due to the …

AlphaSparse: Generating high performance SpMV codes directly from sparse matrices

Z Du, J Li, Y Wang, X Li, G Tan… - … Conference for High …, 2022 - ieeexplore.ieee.org
Sparse Matrix-Vector multiplication (SpMV) is an essential computational kernel in many
application scenarios. Tens of sparse matrix formats and implementations have been …

Hybrid hexagonal/classical tiling for GPUs

T Grosser, A Cohen, J Holewinski… - Proceedings of Annual …, 2014 - dl.acm.org
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …

Optimising purely functional GPU programs

TL McDonell, MMT Chakravarty, G Keller… - ACM SIGPLAN …, 2013 - dl.acm.org
Purely functional, embedded array programs are a good match for SIMD hardware, such as
GPUs. However, the naive compilation of such programs quickly leads to both code …

lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods

M Bauer, H Köstler, U Rüde - Journal of Computational Science, 2021 - Elsevier
Lattice Boltzmann methods are a popular mesoscopic alternative to classical computational
fluid dynamics based on the macroscopic equations of continuum mechanics. Many variants …