Communication lower bounds and optimal algorithms for numerical linear algebra
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
Image processing pipelines combine the challenges of stencil computations and stream
programs. They are composed of large graphs of different stencil stages, as well as complex …
[BOOK][B] Structured parallel programming: patterns for efficient computation
M McCool, J Reinders, A Robison - 2012 - books.google.com
Structured Parallel Programming offers the simplest way for developers to learn patterns for
high-performance parallel programming. Written by parallel computing experts and industry …
[HTML][HTML] Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
We introduce Devito, a new domain-specific language for implementing high-performance
finite-difference partial differential equation solvers. The motivating application is exploration …
High-performance code generation for stencil computations on GPU architectures
J Holewinski, LN Pouchet, P Sadayappan - Proceedings of the 26th ACM …, 2012 - dl.acm.org
Stencil computations arise in many scientific computing domains, and often represent time-
critical portions of applications. There is significant interest in offloading these computations …
A quadtree-adaptive multigrid solver for the Serre–Green–Naghdi equations
S Popinet - Journal of Computational Physics, 2015 - Elsevier
Abstract The Serre–Green–Naghdi (SGN) equations, also known as the fully-nonlinear
Boussinesq wave equations, accurately describe the behaviour of dispersive shoaling water …
NERO: A near high-bandwidth memory stencil accelerator for weather prediction modeling
G Singh, D Diamantopoulos… - … Conference on Field …, 2020 - ieeexplore.ieee.org
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …
[PDF][PDF] Darkroom: compiling high-level image processing code into hardware pipelines.
Specialized image signal processors (ISPs) exploit the structure of image processing
pipelines to minimize memory bandwidth using the architectural pattern of line-buffering …
Autotuning in high-performance computing applications
Autotuning refers to the automatic generation of a search space of possible implementations
of a computation that are evaluated through models and/or empirical measurement to …
SODA: Stencil with optimized dataflow architecture
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …