AKG: automatic kernel generation for neural processing units using polyhedral transformations
Existing tensor compilers have proven their effectiveness in deploying deep neural networks
on general-purpose hardware like CPU and GPU, but optimizing for neural processing units …
PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler
G Consolaro, Z Zhang, H Razanajato… - 2024 IEEE/ACM …, 2024 - ieeexplore.ieee.org
Polyhedral techniques have been widely used for automatic code optimization in low-level
compilers and higher-level processes. Loop optimization is central to this technique, and …
Report of the workshop on program synthesis for scientific computing
Program synthesis is an active research field in academia, national labs, and industry. Yet,
work directly applicable to scientific computing, while having some impressive successes …
Source matching and rewriting for MLIR using string-based automata
A typical compiler flow relies on a uni-directional sequence of translation/optimization steps
that lower the program abstract representation, making it hard to preserve higher-level …
Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation
K Abdelaal, M Kong - Proceedings of the ACM International Conference …, 2021 - dl.acm.org
Loop tiling is a key high-level transformation which is known to maximize locality in loop
intensive programs. It has been successfully applied to a number of applications including …
Towards intelligent compiler optimization
The future of computation is massively parallel and heterogeneous with specialized
accelerator devices and instruction sets in both edge- and cluster-computing. However …
On the impact of affine loop transformations in qubit allocation
M Kong - ACM Transactions on Quantum Computing, 2021 - dl.acm.org
Most quantum compiler transformations and qubit allocation techniques to date are either
peep-hole focused or rely on sliding windows that depend on a number of external …
Abstractions for C++ code optimizations in parallel high-performance applications
Many computational problems consider memory throughput a performance bottleneck,
especially in the domain of parallel computing. Software needs to be attuned to hardware …
Collage: Seamless integration of deep learning backends with automatic placement
The strong demand for efficient and performant deployment of Deep Learning (DL)
applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast …
Efficiently Learning Locality Optimizations by Decomposing Transformation Domains
TR Patabandi, M Hall - Proceedings of the 32nd ACM SIGPLAN …, 2023 - dl.acm.org
Optimizing compilers for efficient machine learning are more important than ever due to the
rising ubiquity of the application domain in numerous facets of life. Predictive model-guided …