TLP: A deep learning-based cost model for tensor program tuning

Y Zhai, Y Zhang, S Liu, X Chu, J Peng, J Ji… - Proceedings of the 28th …, 2023 - dl.acm.org
Tensor program tuning is a non-convex objective optimization problem, to which search-based
approaches have proven to be effective. At the core of the search-based approaches …

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

H Hu, J Su, J Zhao, Y Peng, Y Zhu, H Lin… - Proceedings of the …, 2024 - dl.acm.org
Deep Neural Networks (DNNs) have shown excellent performance in a wide range of
machine learning applications. Knowing the latency of running a DNN model or tensor …

Cross-Feature Transfer Learning for Efficient Tensor Program Generation

G Verma, S Raskar, M Emani, B Chapman - Applied Sciences, 2024 - mdpi.com
Tuning tensor program generation involves navigating a vast search space to find optimal
program transformations and measurements for a program on the target hardware. The …

HAOTuner: A hardware adaptive operator auto-tuner for dynamic shape tensor compilers

P Mu, Y Liu, R Wang, G Liu, Z Sun… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Deep learning compilers with auto-tuners have the ability to generate high-performance
programs, particularly tensor programs on accelerators. However, the performance of these …

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

M Li, H Yang, S Zhang, F Yu, R Gong, Y Liu… - Proceedings of the …, 2023 - dl.acm.org
The requirement for deploying deep learning (DL) models efficiently has boosted the
research of DL compilers. Especially, the difficulty of generating optimized tensor programs …

Aaron: Compile-time Kernel Adaptation for Multi-DNN Inference Acceleration on Edge GPU

Z Zhao, N Ling, N Guan, G Xing - … of the 20th ACM Conference on …, 2022 - dl.acm.org
AI applications powered by deep learning are increasingly running on edge devices.
Meanwhile, many real-world IoT applications demand multiple real-time tasks to run on the …

FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers

P Mu, L Wei, Y Liu, R Wang - arXiv preprint arXiv:2407.21418, 2024 - arxiv.org
Many artificial intelligence models process input data of different lengths and resolutions,
making the shape of the tensors dynamic. The performance of these models depends on the …

Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness

L Qiao, J Shi, X Hao, X Fang, M Zhao, Z Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Tensor program optimization on Deep Learning Accelerators (DLAs) is critical for efficient
model deployment. Although search-based Deep Learning Compilers (DLCs) have …

Optimizing On-Device Model Adaptation with Minimal Latency for Real-Time Personalized Applications

K Mehta, M Gupta, A Nair, S Desai, P Singh, T Peng… - 2024 - researchgate.net
Real-time personalized applications require optimized on-device model adaptation to
deliver immediate responses. This paper introduces a hybrid architecture that effectively …