TLP: A deep learning-based cost model for tensor program tuning
Tensor program tuning is a non-convex objective optimization problem, to which search-
based approaches have proven to be effective. At the core of the search-based approaches …
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Deep Neural Networks (DNNs) have shown excellent performance in a wide range of
machine learning applications. Knowing the latency of running a DNN model or tensor …
Cross-Feature Transfer Learning for Efficient Tensor Program Generation
Tuning tensor program generation involves navigating a vast search space to find optimal
program transformations and measurements for a program on the target hardware. The …
HAOTuner: A hardware adaptive operator auto-tuner for dynamic shape tensor compilers
P Mu, Y Liu, R Wang, G Liu, Z Sun… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Deep learning compilers with auto-tuners have the ability to generate high-performance
programs, particularly tensor programs on accelerators. However, the performance of these …
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs
The requirement for deploying deep learning (DL) models efficiently has boosted the
research of DL compilers. Especially, the difficulty of generating optimized tensor programs …
Aaron: Compile-time Kernel Adaptation for Multi-DNN Inference Acceleration on Edge GPU
AI applications powered by deep learning are increasingly running on edge devices.
Meanwhile, many real-world IoT applications demand multiple real-time tasks to run on the …
FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers
P Mu, L Wei, Y Liu, R Wang - arXiv preprint arXiv:2407.21418, 2024 - arxiv.org
Many artificial intelligence models process input data of different lengths and resolutions,
making the shape of the tensors dynamic. The performance of these models depends on the …
Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness
Tensor program optimization on Deep Learning Accelerators (DLAs) is critical for efficient
model deployment. Although search-based Deep Learning Compilers (DLCs) have …
Optimizing On-Device Model Adaptation with Minimal Latency for Real-Time Personalized Applications
K Mehta, M Gupta, A Nair, S Desai, P Singh, T Peng… - 2024 - researchgate.net
Real-time personalized applications require optimized on-device model adaptation to
deliver immediate responses. This paper introduces a hybrid architecture that effectively …