Analytical characterization and design space exploration for optimization of CNNs

R Li, Y Xu, A Sukumaran-Rajam, A Rountev… - Proceedings of the 26th …, 2021 - dl.acm.org
Moving data through the memory hierarchy is a fundamental bottleneck that can limit the
performance of core algorithms of machine learning, such as convolutional neural networks …

Simulated annealing with asymptotic convergence for nonlinear constrained optimization

BW Wah, Y Chen, T Wang - Journal of Global Optimization, 2007 - Springer
In this paper, we present constrained simulated annealing (CSA), an algorithm that extends
conventional simulated annealing to look for constrained local minima of nonlinear …

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

Q Lu, X Gao, S Krishnamoorthy, G Baumgartner… - Journal of Parallel and …, 2012 - Elsevier
Empirical optimizers like ATLAS have been very effective in optimizing computational
kernels in libraries. The best choice of parameters such as tile size and degree of loop …

Automatic synthesis of out-of-core algorithms

Y Klonatos, A Nötzli, A Spielmann, C Koch… - Proceedings of the 2013 …, 2013 - dl.acm.org
We present a system for the automatic synthesis of efficient algorithms specialized for a
particular memory hierarchy and a set of storage devices. The developer provides two …

Efficient search‐space pruning for integrated fusion and tiling transformations

X Gao, S Krishnamoorthy, SK Sahoo… - Concurrency and …, 2007 - Wiley Online Library
Compile‐time optimizations involve a number of transformations such as loop permutation,
fusion, tiling, array contraction etc. The selection of the appropriate transformation to …

Theory and applications of simulated annealing for nonlinear constrained optimization

BW Wah, Y Chen, T Wang - Simulated Annealing, 2008 - books.google.com
(1) where z = (x, y)^T ∈ Z; x ∈ R^v and y ∈ D^w are, respectively, bounded continuous and
discrete variables; f(z) is a lower-bounded objective function; g(z) = (g_1(z), …, g_r(z))^T is a …

Efficient search-space pruning for integrated fusion and tiling transformations

X Gao, S Krishnamoorthy, SK Sahoo, CC Lam… - … and Compilers for …, 2006 - Springer
Compile-time optimizations involve a number of transformations such as loop permutation,
fusion, tiling, array contraction, etc. Determination of the choice of these transformations that …

Layout transformation support for the disk resident arrays framework

S Krishnamoorthy, G Baumgartner, CC Lam… - The Journal of …, 2006 - Springer
The Global Arrays (GA) toolkit provides a shared-memory programming model in
which data locality is explicitly managed by the programmer. It inter-operates with MPI and …

Implementing mathematical expressiveness in Diderot

C Chiw - 2017 - search.proquest.com
This dissertation describes the implementation of the mathematical expressiveness in the
Diderot programming language. Diderot is a domain-specific language for scientific …

Compiling Diderot: From tensor calculus to C

C Chiw, GL Kindlmann, J Reppy - arXiv preprint arXiv:1802.06504, 2018 - arxiv.org
Diderot is a parallel domain-specific language for analysis and visualization of
multidimensional scientific images, such as those produced by CT and MRI scanners. In …