Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org

This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …

Cited by 391 - Related articles - All 14 versions

[PDF] mit.edu

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

J Ragan-Kelley, C Barnes, A Adams, S Paris… - ACM SIGPLAN …, 2013 - dl.acm.org

Image processing pipelines combine the challenges of stencil computations and stream
programs. They are composed of large graphs of different stencil stages, as well as complex …

Cited by 1641 - Related articles - All 12 versions

[PDF] arxiv.org

Optoelectronic device simulations based on macroscopic Maxwell–Bloch equations

C Jirauschek, M Riesch… - Advanced Theory and …, 2019 - Wiley Online Library

Due to their intuitiveness, flexibility, and relative numerical efficiency, the macroscopic
Maxwell–Bloch (MB) equations are a widely used semiclassical and semi …

Cited by 51 - Related articles - All 4 versions

[PDF] iisc.ac.in

PolyMage: Automatic optimization for image processing pipelines

RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org

This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …

Cited by 282 - Related articles - All 9 versions

[PDF] mit.edu

The Pochoir stencil compiler

Y Tang, RA Chowdhury, BC Kuszmaul, CK Luk… - Proceedings of the …, 2011 - dl.acm.org

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of
itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal …

Cited by 458 - Related articles - All 17 versions

[PDF] colostate.edu

High-performance code generation for stencil computations on GPU architectures

J Holewinski, LN Pouchet, P Sadayappan - Proceedings of the 26th ACM …, 2012 - dl.acm.org

Stencil computations arise in many scientific computing domains, and often represent time-
critical portions of applications. There is significant interest in offloading these computations …

Cited by 374 - Related articles - All 10 versions

[PDF] psu.edu

A survey on parallel computing and its applications in data-parallel problems using GPU architectures

CA Navarro, N Hitschfeld-Kahler… - … in Computational Physics, 2014 - cambridge.org

Parallel computing has become an important subject in the field of computer science and
has proven to be critical when researching high performance solutions. The evolution of …

Cited by 286 - Related articles - All 3 versions

[PDF] nsf.gov

SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org

Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …

Cited by 132 - Related articles - All 9 versions

[PDF] ring0.me

AKG: automatic kernel generation for neural processing units using polyhedral transformations

J Zhao, B Li, W Nie, Z Geng, R Zhang, X Gao… - Proceedings of the …, 2021 - dl.acm.org

Existing tensor compilers have proven their effectiveness in deploying deep neural networks
on general-purpose hardware like CPU and GPU, but optimizing for neural processing units …

Cited by 74 - Related articles - All 8 versions

[PDF] archive.org

Tiling stencil computations to maximize parallelism

V Bandishti, I Pananilath… - SC'12: Proceedings of …, 2012 - ieeexplore.ieee.org

Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling hyperplanes such that all tiles along that face can be …

Cited by 250 - Related articles - All 11 versions