A stencil compiler for short-vector SIMD architectures

Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration

M Louboutin, M Lange, F Luporini… - Geoscientific Model …, 2019 - gmd.copernicus.org

We introduce Devito, a new domain-specific language for implementing high-performance
finite-difference partial differential equation solvers. The motivating application is exploration …

Cited by 219 · Related articles · All 10 versions

PolyMage: Automatic optimization for image processing pipelines

RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org

This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …

Cited by 282 · Related articles · All 9 versions

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org

The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

Cited by 2 · Related articles · All 2 versions

SPIRAL: Extreme performance portability

F Franchetti, TM Low, DT Popovici… - Proceedings of the …, 2018 - ieeexplore.ieee.org

In this paper, we address the question of how to automatically map computational kernels to
highly efficient code for a wide range of computing platforms and establish the correctness of …

Cited by 133 · Related articles · All 17 versions

High performance stencil code generation with Lift

B Hagedorn, L Stoltzfus, M Steuwer… - Proceedings of the …, 2018 - dl.acm.org

Stencil computations are widely used from physical simulations to machine-learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …

Cited by 134 · Related articles · All 17 versions

STELLA: A domain-specific tool for structured grid methods in weather and climate models

T Gysi, C Osuna, O Fuhrer, M Bianco… - Proceedings of the …, 2015 - dl.acm.org

Many high-performance computing applications solving partial differential equations (PDEs)
can be attributed to the class of kernels using stencils on structured grids. Due to the …

Cited by 136 · Related articles · All 4 versions

AlphaSparse: Generating high performance SpMV codes directly from sparse matrices

Z Du, J Li, Y Wang, X Li, G Tan… - … Conference for High …, 2022 - ieeexplore.ieee.org

Sparse Matrix-Vector multiplication (SpMV) is an essential computational kernel in many
application scenarios. Tens of sparse matrix formats and implementations have been …

Cited by 26 · Related articles · All 7 versions

Hybrid hexagonal/classical tiling for GPUs

T Grosser, A Cohen, J Holewinski… - Proceedings of Annual …, 2014 - dl.acm.org

Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …

Cited by 157 · Related articles · All 13 versions

Optimising purely functional GPU programs

TL McDonell, MMT Chakravarty, G Keller… - ACM SIGPLAN …, 2013 - dl.acm.org

Purely functional, embedded array programs are a good match for SIMD hardware, such as
GPUs. However, the naive compilation of such programs quickly leads to both code …

Cited by 147 · Related articles · All 11 versions

lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods

M Bauer, H Köstler, U Rüde - Journal of Computational Science, 2021 - Elsevier

Lattice Boltzmann methods are a popular mesoscopic alternative to classical computational
fluid dynamics based on the macroscopic equations of continuum mechanics. Many variants …

Cited by 59 · Related articles · All 3 versions