Split-CNN: Splitting window-based operations in convolutional neural networks for memory system optimization

T Jin, S Hong - Proceedings of the Twenty-Fourth International …, 2019 - dl.acm.org
We present an interdisciplinary study to tackle the memory bottleneck of training deep
convolutional neural networks (CNN). Firstly, we introduce Split Convolutional Neural …

Automatic code generation and optimization of large-scale stencil computation on many-core processors

M Li, Y Liu, H Yang, Y Hu, Q Sun, B Chen… - Proceedings of the 50th …, 2021 - dl.acm.org
Stencil computation is an indispensable building block of many scientific applications and is
widely used by the numerical solvers of partial differential equations (PDEs). Due to the …

[HTML] Program transformations: the fundamental basis for creating optimizing parallelizing compilers

BYa Steinberg, OB Steinberg - Program Systems: Theory …, 2021 - cyberleninka.ru
The paper considers program transformations that lead to speedup.
It surveys publications on various parallel computing architectures and …

Tessellating stencils

L Yuan, Y Zhang, P Guo, S Huang - Proceedings of the International …, 2017 - dl.acm.org
Stencil computations represent a very common class of nested loops in scientific and
engineering applications. The exhaustively studied tiling is one of the most powerful …

Architectural adaptation and performance-energy optimization for CFD application on AMD EPYC Rome

L Szustak, R Wyrzykowski… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The advantages of the second-generation AMD EPYC Rome processors can be
successfully used in the race to Exascale. However, the novel architecture's complexity …

Tensor comprehensions in SaC

SB Scholz, A Šinkarovs - Proceedings of the 31st Symposium on …, 2019 - dl.acm.org
We propose a new notation for data parallel operators on multi-dimensional arrays named
tensor comprehensions. This notation combines the basic principle of array …

Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation

K Abdelaal, M Kong - Proceedings of the ACM International Conference …, 2021 - dl.acm.org
Loop tiling is a key high-level transformation which is known to maximize locality in loop
intensive programs. It has been successfully applied to a number of applications including …

Adapting combined tiling to stencil optimizations on Sunway processor

B Sun, M Li, H Yang, J Xu, Z Luan, D Qian - CCF Transactions on High …, 2023 - Springer
Stencil is one of the indispensable computation patterns in scientific applications, which is a
long-standing optimization target in the field of high performance computing (HPC). The …

[BOOK] Programming models for parallel computing

P Balaji - 2015 - books.google.com
An overview of the most prominent contemporary parallel processing programming models,
written in a unique tutorial style. With the coming of the parallel computing era, computer …

Autotuning stencil computations with structural ordinal regression learning

B Cosenza, JJ Durillo, S Ermon… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Stencil computations expose a large and complex space of equivalent implementations.
These computations often rely on autotuning techniques, based on iterative compilation or …