Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
We introduce Devito, a new domain-specific language for implementing high-performance
finite-difference partial differential equation solvers. The motivating application is exploration …
PolyMage: Automatic optimization for image processing pipelines
RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org
This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …
MIMD Programs Execution Support on SIMD Machines: A Holistic Survey
D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …
SPIRAL: Extreme performance portability
In this paper, we address the question of how to automatically map computational kernels to
highly efficient code for a wide range of computing platforms and establish the correctness of …
High performance stencil code generation with Lift
Stencil computations are widely used from physical simulations to machine-learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …
STELLA: A domain-specific tool for structured grid methods in weather and climate models
Many high-performance computing applications solving partial differential equations (PDEs)
can be attributed to the class of kernels using stencils on structured grids. Due to the …
AlphaSparse: Generating high performance SpMV codes directly from sparse matrices
Sparse Matrix-Vector multiplication (SpMV) is an essential computational kernel in many
application scenarios. Tens of sparse matrix formats and implementations have been …
Hybrid hexagonal/classical tiling for GPUs
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …
Optimising purely functional GPU programs
Purely functional, embedded array programs are a good match for SIMD hardware, such as
GPUs. However, the naive compilation of such programs quickly leads to both code …
lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods
M Bauer, H Köstler, U Rüde - Journal of Computational Science, 2021 - Elsevier
Lattice Boltzmann methods are a popular mesoscopic alternative to classical computational
fluid dynamics based on the macroscopic equations of continuum mechanics. Many variants …