Parameterized diamond tiling for stencil computations with Chapel parallel iterators

Split-CNN: Splitting window-based operations in convolutional neural networks for memory system optimization

T Jin, S Hong - Proceedings of the Twenty-Fourth International …, 2019 - dl.acm.org

We present an interdisciplinary study to tackle the memory bottleneck of training deep
convolutional neural networks (CNN). Firstly, we introduce Split Convolutional Neural …

Cited by 55 · All 2 versions

Automatic code generation and optimization of large-scale stencil computation on many-core processors

M Li, Y Liu, H Yang, Y Hu, Q Sun, B Chen… - Proceedings of the 50th …, 2021 - dl.acm.org

Stencil computation is an indispensable building block of many scientific applications and is
widely used by the numerical solvers of partial differential equations (PDEs). Due to the …

Cited by 19 · All 2 versions
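
As a minimal illustration of what a stencil computation is (not the code-generation approach of the paper above), the sketch below repeatedly applies a 3-point averaging stencil to a 1D grid. This is one explicit finite-difference step of the 1D heat equation with diffusion number 1/3. The function name `jacobi_1d` and the fixed-boundary choice are assumptions made for illustration.

```python
def jacobi_1d(u, steps):
    """Apply a 3-point averaging stencil `steps` times; boundary values stay fixed."""
    u = list(u)
    n = len(u)
    for _ in range(steps):
        new = u[:]  # copy so every update reads only values from the previous step
        for i in range(1, n - 1):
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = new
    return u


print(jacobi_1d([0.0, 0.0, 3.0, 0.0, 0.0], 1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Each output point depends only on a fixed neighbourhood of the previous time step. This regular dependence pattern is what tiling and code-generation techniques for stencils exploit.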

Program transformations: the fundamental basis for creating optimizing parallelizing compilers

BYa Steinberg, OB Steinberg - Program Systems: Theory …, 2021 - cyberleninka.ru

The paper considers program transformations that lead to speedup. It cites publications on
various parallel computing architectures and …

Cited by 12 · All 5 versions

Tessellating stencils

L Yuan, Y Zhang, P Guo, S Huang - Proceedings of the International …, 2017 - dl.acm.org

Stencil computations represent a very common class of nested loops in scientific and
engineering applications. The exhaustively studied tiling is one of the most powerful …

Cited by 23 · All 5 versions

Architectural adaptation and performance-energy optimization for CFD application on AMD EPYC Rome

L Szustak, R Wyrzykowski… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

The advantages of the second-generation AMD EPYC Rome processors can be
successfully used in the race to Exascale. However, the novel architecture's complexity …

Cited by 9 · All 2 versions

Tensor comprehensions in SaC

SB Scholz, A Šinkarovs - Proceedings of the 31st Symposium on …, 2019 - dl.acm.org

We propose a new notation for data parallel operators on multi-dimensional arrays named
tensor comprehensions. This notation combines the basic principle of array …

Cited by 16 · All 5 versions

Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation

K Abdelaal, M Kong - Proceedings of the ACM International Conference …, 2021 - dl.acm.org

Loop tiling is a key high-level transformation which is known to maximize locality in loop
intensive programs. It has been successfully applied to a number of applications including …

Cited by 10
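
The loop tiling mentioned above can be sketched as follows. This is a minimal Python example, not the paper's method. It assumes a list-of-lists matrix and uses a hypothetical function `transpose_tiled` whose `tile` parameter plays the role of the tile size. The outer loops step over `tile` x `tile` blocks and the inner loops stay inside one block, so reads and writes for a block are grouped together.

```python
def transpose_tiled(a, tile=2):
    """Transpose a list-of-lists matrix, visiting it in tile x tile blocks."""
    rows, cols = len(a), len(a[0])
    out = [[0] * rows for _ in range(cols)]
    # Outer loops: iterate over the origin (ii, jj) of each tile.
    for ii in range(0, rows, tile):
        for jj in range(0, cols, tile):
            # Inner loops: cover one tile, clipped at the matrix edges.
            for i in range(ii, min(ii + tile, rows)):
                for j in range(jj, min(jj + tile, cols)):
                    out[j][i] = a[i][j]
    return out


print(transpose_tiled([[1, 2, 3], [4, 5, 6]], tile=2))  # [[1, 4], [2, 5], [3, 6]]
```

In Python this only shows the loop structure, since interpreter overhead hides any cache effect. In a compiled or GPU setting, the choice of `tile` decides how much data each block touches. That is why tile size selection matters.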

Adapting combined tiling to stencil optimizations on Sunway processor

B Sun, M Li, H Yang, J Xu, Z Luan, D Qian - CCF Transactions on High …, 2023 - Springer

Stencil is one of the indispensable computation patterns in scientific applications, which is a
long-standing optimization target in the field of high performance computing (HPC). The …

Cited by 4

[BOOK] Programming models for parallel computing

P Balaji - 2015 - books.google.com

An overview of the most prominent contemporary parallel processing programming models,
written in a unique tutorial style. With the coming of the parallel computing era, computer …

Cited by 23 · All 8 versions

Autotuning stencil computations with structural ordinal regression learning

B Cosenza, JJ Durillo, S Ermon… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

Stencil computations expose a large and complex space of equivalent implementations.
These computations often rely on autotuning techniques, based on iterative compilation or …

Cited by 18 · All 14 versions