Analytical bounds for optimal tile size selection

H Kwon, P Chatarasi, M Pellauer, A Parashar… - Proceedings of the …, 2019 - dl.acm.org

The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse
and perform staging are known as dataflow, which directly impacts the performance and …

Cited by 321 Related articles All 10 versions

Analytical characterization and design space exploration for optimization of CNNs

R Li, Y Xu, A Sukumaran-Rajam, A Rountev… - Proceedings of the 26th …, 2021 - dl.acm.org

Moving data through the memory hierarchy is a fundamental bottleneck that can limit the
performance of core algorithms of machine learning, such as convolutional neural networks …

Cited by 63 Related articles All 5 versions

Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators

P Chatarasi, H Kwon, A Parashar, M Pellauer… - ACM Transactions on …, 2021 - dl.acm.org

A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …

Cited by 53 Related articles All 7 versions

A multi-objective auto-tuning framework for parallel codes

H Jordan, P Thoman, JJ Durillo… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org

In this paper we introduce a multi-objective autotuning framework comprising compiler and
runtime components. Focusing on individual code regions, our compiler uses a novel search …

Cited by 128 Related articles All 8 versions

Modesto: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures

T Gysi, T Grosser, T Hoefler - Proceedings of the 29th ACM on …, 2015 - dl.acm.org

Code transformations, such as loop tiling and loop fusion, are of key importance for the
efficient implementation of stencil computations. However, their direct application to a large …

Cited by 79 Related articles All 21 versions

Analytical modeling of cache behavior for affine programs

W Bao, S Krishnamoorthy, LN Pouchet… - Proceedings of the …, 2017 - dl.acm.org

Optimizing compilers implement program transformation strategies aimed at reducing data
movement to or from main memory by exploiting the data-cache hierarchy. However, instead …

Cited by 54 Related articles All 3 versions

Efficient tiled sparse matrix multiplication through matrix signatures

SE Kurt, A Sukumaran-Rajam… - … Conference for High …, 2020 - ieeexplore.ieee.org

Tiling is a key technique to reduce data movement in matrix computations. While tiling is well
understood and widely used for dense matrix/tensor computations, effective tiling of sparse …

Cited by 32 Related articles All 12 versions

Analytical cache modeling and tilesize optimization for tensor contractions

R Li, A Sukumaran-Rajam, R Veras, TM Low… - Proceedings of the …, 2019 - dl.acm.org

Data movement between processor and memory hierarchy is a fundamental bottleneck that
limits the performance of many applications on modern computer architectures. Tiling and …

Cited by 35 Related articles All 5 versions

Accurate high-level modeling and automated hardware/software co-design for effective SoC design space exploration

W Zuo, LN Pouchet, A Ayupov, T Kim, CW Lin… - Proceedings of the 54th …, 2017 - dl.acm.org

A desirable feature of a development tool for SoC design is that, given the important
applications in the domain to be targeted by the SoC, a powerful hardware-software …

Cited by 40 Related articles All 3 versions

Tile size selection revisited

S Mehta, G Beeraka, PC Yew - ACM Transactions on Architecture and …, 2013 - dl.acm.org

Loop tiling is a widely used loop transformation to enhance data locality and allow data
reuse. In the tiled code, however, tiles of different sizes can lead to significant variation in …

Cited by 50 Related articles All 7 versions