Split-CNN: Splitting window-based operations in convolutional neural networks for memory system optimization
We present an interdisciplinary study to tackle the memory bottleneck of training deep
convolutional neural networks (CNN). Firstly, we introduce Split Convolutional Neural …
Automatic code generation and optimization of large-scale stencil computation on many-core processors
Stencil computation is an indispensable building block of many scientific applications and is
widely used by the numerical solvers of partial differential equations (PDEs). Due to the …
[HTML] Program transformations as the fundamental basis for creating optimizing parallelizing compilers
BYa Steinberg, OB Steinberg - Program Systems: Theory …, 2021 - cyberleninka.ru
The paper considers program transformations that lead to speedup.
It cites publications on various parallel computing architectures and …
Tessellating stencils
L Yuan, Y Zhang, P Guo, S Huang - Proceedings of the International …, 2017 - dl.acm.org
Stencil computations represent a very common class of nested loops in scientific and
engineering applications. The exhaustively studied tiling is one of the most powerful …
Architectural adaptation and performance-energy optimization for CFD application on AMD EPYC Rome
L Szustak, R Wyrzykowski… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The advantages of the second-generation AMD EPYC Rome processors can be
successfully used in the race to Exascale. However, the novel architecture's complexity …
Tensor comprehensions in SaC
SB Scholz, A Šinkarovs - Proceedings of the 31st Symposium on …, 2019 - dl.acm.org
We propose a new notation for data parallel operators on multi-dimensional arrays named
tensor comprehensions. This notation combines the basic principle of array …
Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation
K Abdelaal, M Kong - Proceedings of the ACM International Conference …, 2021 - dl.acm.org
Loop tiling is a key high-level transformation which is known to maximize locality in loop
intensive programs. It has been successfully applied to a number of applications including …
Adapting combined tiling to stencil optimizations on Sunway processor
Stencil is one of the indispensable computation patterns in scientific applications, which is a
long-standing optimization target in the field of high performance computing (HPC). The …
[BOOK] Programming models for parallel computing
P Balaji - 2015 - books.google.com
An overview of the most prominent contemporary parallel processing programming models,
written in a unique tutorial style. With the coming of the parallel computing era, computer …
Autotuning stencil computations with structural ordinal regression learning
Stencil computations expose a large and complex space of equivalent implementations.
These computations often rely on autotuning techniques, based on iterative compilation or …