Moses: Efficient exploitation of cross-device transferable features for tensor program optimization

Y Zhai, Y Zhang, S Liu, X Chu, J Peng, J Ji… - Proceedings of the 28th …, 2023 - dl.acm.org

Tensor program tuning is a non-convex objective optimization problem, to which
search-based approaches have proven to be effective. At the core of the search-based approaches …

Cited by 17 · Related articles · All 3 versions

[PDF] arxiv.org

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

H Hu, J Su, J Zhao, Y Peng, Y Zhu, H Lin… - Proceedings of the …, 2024 - dl.acm.org

Deep Neural Networks (DNNs) have shown excellent performance in a wide range of
machine learning applications. Knowing the latency of running a DNN model or tensor …

Cited by 1 · Related articles · All 4 versions

[PDF] mdpi.com

Cross-Feature Transfer Learning for Efficient Tensor Program Generation

G Verma, S Raskar, M Emani, B Chapman - Applied Sciences, 2024 - mdpi.com

Tuning tensor program generation involves navigating a vast search space to find optimal
program transformations and measurements for a program on the target hardware. The …

Cited by 3 · Related articles · All 5 versions

Haotuner: A hardware adaptive operator auto-tuner for dynamic shape tensor compilers

P Mu, Y Liu, R Wang, G Liu, Z Sun… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Deep learning compilers with auto-tuners have the ability to generate high-performance
programs, particularly tensor programs on accelerators. However, the performance of these …

Cited by 4 · Related articles · All 3 versions

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

M Li, H Yang, S Zhang, F Yu, R Gong, Y Liu… - Proceedings of the …, 2023 - dl.acm.org

The requirement for deploying deep learning (DL) models efficiently has boosted the
research of DL compilers. Especially, the difficulty of generating optimized tensor programs …

Cited by 1 · Related articles

[PDF] github.io

Aaron: Compile-time Kernel Adaptation for Multi-DNN Inference Acceleration on Edge GPU

Z Zhao, N Ling, N Guan, G Xing - … of the 20th ACM Conference on …, 2022 - dl.acm.org

AI applications powered by deep learning are increasingly running on edge devices.
Meanwhile, many real-world IoT applications demand multiple real-time tasks to run on the …

Cited by 5 · Related articles · All 3 versions

[PDF] arxiv.org