AKG: automatic kernel generation for neural processing units using polyhedral transformations

J Zhao, B Li, W Nie, Z Geng, R Zhang, X Gao… - Proceedings of the …, 2021 - dl.acm.org
Existing tensor compilers have proven their effectiveness in deploying deep neural networks
on general-purpose hardware like CPU and GPU, but optimizing for neural processing units …

PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler

G Consolaro, Z Zhang, H Razanajato… - 2024 IEEE/ACM …, 2024 - ieeexplore.ieee.org
Polyhedral techniques have been widely used for automatic code optimization in low-level
compilers and higher-level processes. Loop optimization is central to this technique, and …

Report of the workshop on program synthesis for scientific computing

H Finkel, I Laguna - arXiv preprint arXiv:2102.01687, 2021 - arxiv.org
Program synthesis is an active research field in academia, national labs, and industry. Yet,
work directly applicable to scientific computing, while having some impressive successes …

Source matching and rewriting for MLIR using string-based automata

V Espindola, L Zago, H Yviquel, G Araujo - ACM Transactions on …, 2023 - dl.acm.org
A typical compiler flow relies on a uni-directional sequence of translation/optimization steps
that lower the program abstract representation, making it hard to preserve higher-level …

Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation

K Abdelaal, M Kong - Proceedings of the ACM International Conference …, 2021 - dl.acm.org
Loop tiling is a key high-level transformation which is known to maximize locality in loop
intensive programs. It has been successfully applied to a number of applications including …

Towards intelligent compiler optimization

M Kovac, M Brcic, A Krajna… - 2022 45th Jubilee …, 2022 - ieeexplore.ieee.org
The future of computation is massively parallel and heterogeneous with specialized
accelerator devices and instruction sets in both edge- and cluster-computing. However …

On the impact of affine loop transformations in qubit allocation

M Kong - ACM Transactions on Quantum Computing, 2021 - dl.acm.org
Most quantum compiler transformations and qubit allocation techniques to date are either
peep-hole focused or rely on sliding windows that depend on a number of external …

Abstractions for C++ code optimizations in parallel high-performance applications

J Klepl, A Šmelko, L Rozsypal, M Kruliš - Parallel Computing, 2024 - Elsevier
Many computational problems consider memory throughput a performance bottleneck,
especially in the domain of parallel computing. Software needs to be attuned to hardware …

Collage: Seamless integration of deep learning backends with automatic placement

B Jeon, S Park, P Liao, S Xu, T Chen, Z Jia - Proceedings of the …, 2022 - dl.acm.org
The strong demand for efficient and performant deployment of Deep Learning (DL)
applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast …

Efficiently Learning Locality Optimizations by Decomposing Transformation Domains

TR Patabandi, M Hall - Proceedings of the 32nd ACM SIGPLAN …, 2023 - dl.acm.org
Optimizing compilers for efficient machine learning are more important than ever due to the
rising ubiquity of the application domain in numerous facets of life. Predictive model-guided …