Understanding reuse, performance, and hardware cost of DNN dataflow: A data-centric approach

H Kwon, P Chatarasi, M Pellauer, A Parashar… - Proceedings of the …, 2019 - dl.acm.org
The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse
and perform staging are known as dataflow, which directly impacts the performance and …

Analytical characterization and design space exploration for optimization of CNNs

R Li, Y Xu, A Sukumaran-Rajam, A Rountev… - Proceedings of the 26th …, 2021 - dl.acm.org
Moving data through the memory hierarchy is a fundamental bottleneck that can limit the
performance of core algorithms of machine learning, such as convolutional neural networks …

Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators

P Chatarasi, H Kwon, A Parashar, M Pellauer… - ACM Transactions on …, 2021 - dl.acm.org
A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …

A multi-objective auto-tuning framework for parallel codes

H Jordan, P Thoman, JJ Durillo… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
In this paper we introduce a multi-objective autotuning framework comprising compiler and
runtime components. Focusing on individual code regions, our compiler uses a novel search …

Modesto: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures

T Gysi, T Grosser, T Hoefler - Proceedings of the 29th ACM on …, 2015 - dl.acm.org
Code transformations, such as loop tiling and loop fusion, are of key importance for the
efficient implementation of stencil computations. However, their direct application to a large …

Analytical modeling of cache behavior for affine programs

W Bao, S Krishnamoorthy, LN Pouchet… - Proceedings of the …, 2017 - dl.acm.org
Optimizing compilers implement program transformation strategies aimed at reducing data
movement to or from main memory by exploiting the data-cache hierarchy. However, instead …

Efficient tiled sparse matrix multiplication through matrix signatures

SE Kurt, A Sukumaran-Rajam… - … Conference for High …, 2020 - ieeexplore.ieee.org
Tiling is a key technique to reduce data movement in matrix computations. While tiling is well
understood and widely used for dense matrix/tensor computations, effective tiling of sparse …

Analytical cache modeling and tilesize optimization for tensor contractions

R Li, A Sukumaran-Rajam, R Veras, TM Low… - Proceedings of the …, 2019 - dl.acm.org
Data movement between processor and memory hierarchy is a fundamental bottleneck that
limits the performance of many applications on modern computer architectures. Tiling and …

Accurate high-level modeling and automated hardware/software co-design for effective SoC design space exploration

W Zuo, LN Pouchet, A Ayupov, T Kim, CW Lin… - Proceedings of the 54th …, 2017 - dl.acm.org
A desirable feature of a development tool for SoC design is that, given the important
applications in the domain to be targeted by the SoC, a powerful hardware-software …

Tile size selection revisited

S Mehta, G Beeraka, PC Yew - ACM Transactions on Architecture and …, 2013 - dl.acm.org
Loop tiling is a widely used loop transformation to enhance data locality and allow data
reuse. In the tiled code, however, tiles of different sizes can lead to significant variation in …