Understanding reuse, performance, and hardware cost of DNN dataflow: A data-centric approach
The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse
and perform staging are known as dataflow, which directly impacts the performance and …
Analytical characterization and design space exploration for optimization of CNNs
Moving data through the memory hierarchy is a fundamental bottleneck that can limit the
performance of core algorithms of machine learning, such as convolutional neural networks …
Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators
A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …
A multi-objective auto-tuning framework for parallel codes
In this paper we introduce a multi-objective autotuning framework comprising compiler and
runtime components. Focusing on individual code regions, our compiler uses a novel search …
Modesto: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures
Code transformations, such as loop tiling and loop fusion, are of key importance for the
efficient implementation of stencil computations. However, their direct application to a large …
Analytical modeling of cache behavior for affine programs
Optimizing compilers implement program transformation strategies aimed at reducing data
movement to or from main memory by exploiting the data-cache hierarchy. However, instead …
Efficient tiled sparse matrix multiplication through matrix signatures
SE Kurt, A Sukumaran-Rajam… - … Conference for High …, 2020 - ieeexplore.ieee.org
Tiling is a key technique to reduce data movement in matrix computations. While tiling is well
understood and widely used for dense matrix/tensor computations, effective tiling of sparse …
Analytical cache modeling and tilesize optimization for tensor contractions
Data movement between processor and memory hierarchy is a fundamental bottleneck that
limits the performance of many applications on modern computer architectures. Tiling and …
Accurate high-level modeling and automated hardware/software co-design for effective SoC design space exploration
W Zuo, LN Pouchet, A Ayupov, T Kim, CW Lin… - Proceedings of the 54th …, 2017 - dl.acm.org
A desirable feature of a development tool for SoC design is that, given the important
applications in the domain to be targeted by the SoC, a powerful hardware-software …
Tile size selection revisited
Loop tiling is a widely used loop transformation to enhance data locality and allow data
reuse. In the tiled code, however, tiles of different sizes can lead to significant variation in …