Analytical characterization and design space exploration for optimization of CNNs
Moving data through the memory hierarchy is a fundamental bottleneck that can limit the
performance of core algorithms of machine learning, such as convolutional neural networks …
Simulated annealing with asymptotic convergence for nonlinear constrained optimization
In this paper, we present constrained simulated annealing (CSA), an algorithm that extends
conventional simulated annealing to look for constrained local minima of nonlinear …
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions
Q Lu, X Gao, S Krishnamoorthy, G Baumgartner… - Journal of Parallel and …, 2012 - Elsevier
Empirical optimizers like ATLAS have been very effective in optimizing computational
kernels in libraries. The best choice of parameters such as tile size and degree of loop …
Automatic synthesis of out-of-core algorithms
We present a system for the automatic synthesis of efficient algorithms specialized for a
particular memory hierarchy and a set of storage devices. The developer provides two …
Efficient search‐space pruning for integrated fusion and tiling transformations
X Gao, S Krishnamoorthy, SK Sahoo… - Concurrency and …, 2007 - Wiley Online Library
Compile‐time optimizations involve a number of transformations such as loop permutation,
fusion, tiling, array contraction etc. The selection of the appropriate transformation to …
Theory and applications of simulated annealing for nonlinear constrained optimization
(1) where z = (x, y)^T ∈ Z; x ∈ R^v and y ∈ D^w are, respectively, bounded continuous and
discrete variables; f(z) is a lower-bounded objective function; g(z) = (g_1(z), …, g_r(z))^T is a …
Efficient search-space pruning for integrated fusion and tiling transformations
X Gao, S Krishnamoorthy, SK Sahoo, CC Lam… - … and Compilers for …, 2006 - Springer
Compile-time optimizations involve a number of transformations such as loop permutation,
fusion, tiling, array contraction, etc. Determination of the choice of these transformations that …
Layout transformation support for the disk resident arrays framework
S Krishnamoorthy, G Baumgartner, CC Lam… - The Journal of …, 2006 - Springer
The Global Arrays (GA) toolkit provides a shared-memory programming model in
which data locality is explicitly managed by the programmer. It inter-operates with MPI and …
Implementing mathematical expressiveness in Diderot
C Chiw - 2017 - search.proquest.com
This dissertation describes the implementation of the mathematical expressiveness in the
Diderot programming language. Diderot is a domain-specific language for scientific …
Compiling Diderot: From tensor calculus to C
Diderot is a parallel domain-specific language for analysis and visualization of
multidimensional scientific images, such as those produced by CT and MRI scanners. In …