Advancements in accelerating deep neural network inference on AIoT devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

Dynamic GPU energy optimization for machine learning training workloads

F Wang, W Zhang, S Lai, M Hao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
GPUs are widely used to accelerate the training of machine learning workloads. As modern
machine learning models become increasingly larger, they require a longer time to train …

Optimizing inference performance of transformers on CPUs

D Dice, A Kogan - arXiv preprint arXiv:2102.06621, 2021 - arxiv.org
The Transformer architecture revolutionized the field of natural language processing (NLP).
Transformer-based models (e.g., BERT) power many important Web services, such as …

Energy-Efficient Online Scheduling of Transformer Inference Services on GPU Servers

Y Wang, Q Wang, X Chu - IEEE Transactions on Green …, 2022 - ieeexplore.ieee.org
Cloud service providers are deploying Transformer-based deep learning models on GPU
servers to support many online inference-as-a-service (IAAS) applications, given the …

An Efficient Transformer Inference Engine on DSP

K Chen, H Su, C Liu, X Gong - … on Algorithms and Architectures for Parallel …, 2022 - Springer
The Transformer is one of the most important algorithms in the Natural Language Processing
(NLP) field and has recently seen wide use in computer vision. Due to the huge computation …