Communication lower bounds and optimal algorithms for numerical linear algebra

G Ballard, E Carson, J Demmel, M Hoemmen… - Acta Numerica, 2014 - cambridge.org
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

J Ragan-Kelley, C Barnes, A Adams, S Paris… - ACM SIGPLAN …, 2013 - dl.acm.org
Image processing pipelines combine the challenges of stencil computations and stream
programs. They are composed of large graphs of different stencil stages, as well as complex …
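As a rough illustration of a pipeline built from stencil stages, here is a two-stage separable 3×3 box blur. It is plain Python, not Halide's API, and `blur_x`, `blur_y`, and `pipeline` are hypothetical names. Each stage reads a small neighbourhood of the previous stage's output.

```python
from typing import List

Image = List[List[float]]  # row-major grayscale image (hypothetical representation)

def blur_x(img: Image) -> Image:
    """Stage 1: horizontal 3-tap average; output is 2 columns narrower (no padding)."""
    return [[(row[x - 1] + row[x] + row[x + 1]) / 3.0 for x in range(1, len(row) - 1)]
            for row in img]

def blur_y(img: Image) -> Image:
    """Stage 2: vertical 3-tap average over stage 1's output; 2 rows shorter."""
    h, w = len(img), len(img[0])
    return [[(img[y - 1][x] + img[y][x] + img[y + 1][x]) / 3.0 for x in range(w)]
            for y in range(1, h - 1)]

def pipeline(img: Image) -> Image:
    # Naive schedule: stage 1 is fully materialized before stage 2 consumes it.
    return blur_y(blur_x(img))
```

This eager version stores the whole intermediate image. Fusing the stages or recomputing values on demand are the alternative points in the parallelism/locality/recomputation trade-off that the title names.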

[BOOK] Structured parallel programming: patterns for efficient computation

M McCool, J Reinders, A Robison - 2012 - books.google.com
Structured Parallel Programming offers the simplest way for developers to learn patterns for
high-performance parallel programming. Written by parallel computing experts and industry …

Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration

M Louboutin, M Lange, F Luporini… - Geoscientific Model …, 2019 - gmd.copernicus.org
We introduce Devito, a new domain-specific language for implementing high-performance
finite-difference partial differential equation solvers. The motivating application is exploration …
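To show what a finite-difference PDE solver computes, here is a minimal hand-written sketch. It is not Devito code: Devito generates such loops from symbolic equations. The sketch assumes the 1D heat equation u_t = alpha * u_xx, solved with forward Euler in time and a central difference in space (FTCS) on a uniform grid with fixed boundary values. `heat_1d` is a hypothetical name.

```python
from typing import List

def heat_1d(u0: List[float], alpha: float, dx: float, dt: float, steps: int) -> List[float]:
    """Explicit (FTCS) finite-difference solver for u_t = alpha * u_xx in 1D.

    Boundary values are held fixed (Dirichlet). The scheme is stable only if
    r = alpha * dt / dx**2 <= 0.5, so larger r is rejected.
    """
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError(f"unstable: alpha*dt/dx^2 = {r} > 0.5")
    u = list(u0)
    for _ in range(steps):
        # Three-point stencil: each interior point reads itself and its two neighbours.
        u = [u[0]] + [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                      for i in range(1, len(u) - 1)] + [u[-1]]
    return u
```

For example, `heat_1d([0.0, 0.0, 1.0, 0.0, 0.0], 1.0, 1.0, 0.25, 1)` spreads the initial spike to `[0.0, 0.25, 0.5, 0.25, 0.0]`. A linear profile is a steady state and is returned unchanged.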

High-performance code generation for stencil computations on GPU architectures

J Holewinski, LN Pouchet, P Sadayappan - Proceedings of the 26th ACM …, 2012 - dl.acm.org
Stencil computations arise in many scientific computing domains, and often represent
time-critical portions of applications. There is significant interest in offloading these computations …

A quadtree-adaptive multigrid solver for the Serre–Green–Naghdi equations

S Popinet - Journal of Computational Physics, 2015 - Elsevier
The Serre–Green–Naghdi (SGN) equations, also known as the fully-nonlinear
Boussinesq wave equations, accurately describe the behaviour of dispersive shoaling water …

NERO: A near high-bandwidth memory stencil accelerator for weather prediction modeling

G Singh, D Diamantopoulos… - … Conference on Field …, 2020 - ieeexplore.ieee.org
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …

Darkroom: compiling high-level image processing code into hardware pipelines

J Hegarty, JS Brunhaver, Z DeVito, J Ragan-Kelley… - ACM Trans. Graph., 2014 - Citeseer
Specialized image signal processors (ISPs) exploit the structure of image processing
pipelines to minimize memory bandwidth using the architectural pattern of line-buffering …
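The line-buffering pattern can be approximated in software: pixels arrive in raster order, and only the few most recent rows a stencil needs are kept. The following is an analogy under that assumption, not Darkroom's generated hardware, and `vertical_blur_stream` is a hypothetical name. A 3-tap vertical stencil then needs storage proportional to the image width instead of the whole frame.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

def vertical_blur_stream(rows: Iterable[List[float]]) -> Iterator[List[float]]:
    """Stream a 3-tap vertical average over rows arriving in raster order.

    Only the three most recent rows are buffered, so memory use is about
    3 * width values regardless of image height.
    """
    window: Deque[List[float]] = deque(maxlen=3)  # oldest row is evicted automatically
    for row in rows:
        window.append(row)
        if len(window) == 3:
            top, mid, bot = window
            yield [(a + b + c) / 3.0 for a, b, c in zip(top, mid, bot)]
```

Because it is a generator, each output row is produced as soon as its three input rows have arrived. This mirrors, loosely, how a line-buffered pipeline emits results without a round trip to external memory.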

Autotuning in high-performance computing applications

P Balaprakash, J Dongarra, T Gamblin… - Proceedings of the …, 2018 - ieeexplore.ieee.org
Autotuning refers to the automatic generation of a search space of possible implementations
of a computation that are evaluated through models and/or empirical measurement to …
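Here is a minimal sketch of that definition, assuming an exhaustive grid search over integer tuning parameters. Practical autotuners usually search more cleverly. This is not code from the paper, and `search_space`, `measure`, and `autotune` are hypothetical names.

```python
import itertools
import time
from typing import Callable, Dict, Iterator, Optional, Sequence, Tuple

# A configuration maps tuning-parameter names (e.g. a hypothetical "tile") to values.
Config = Dict[str, int]

def search_space(params: Dict[str, Sequence[int]]) -> Iterator[Config]:
    """Generate the search space: every combination of candidate values."""
    names = list(params)
    for values in itertools.product(*(params[n] for n in names)):
        yield dict(zip(names, values))

def measure(kernel: Callable[[Config], object], cfg: Config, repeats: int = 3) -> float:
    """Empirical evaluation: best wall-clock time of `kernel(cfg)` over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(cfg)
        best = min(best, time.perf_counter() - start)
    return best

def autotune(params: Dict[str, Sequence[int]],
             cost: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Evaluate every configuration with `cost` and keep the cheapest one.

    `cost` may be a performance model or a wrapper around `measure`.
    """
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for cfg in search_space(params):
        c = cost(cfg)
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost
```

Passing `cost=lambda cfg: measure(kernel, cfg)` gives empirical tuning. Passing an analytic function of `cfg` gives model-based tuning with the same `autotune` loop, matching the "models and/or empirical measurement" phrasing above.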

SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …