Advancements in accelerating deep neural network inference on AIoT devices: A survey

L Cheng, Y Gu, Q Liu, L Yang, C Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …

Dynamic GPU energy optimization for machine learning training workloads

F Wang, W Zhang, S Lai, M Hao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
GPUs are widely used to accelerate the training of machine learning workloads. As modern
machine learning models become increasingly larger, they require a longer time to train …

Optimizing inference performance of transformers on CPUs

D Dice, A Kogan - arXiv preprint arXiv:2102.06621, 2021 - arxiv.org
The Transformer architecture revolutionized the field of natural language processing (NLP).
Transformer-based models (e.g., BERT) power many important Web services, such as …

Energy-Efficient Online Scheduling of Transformer Inference Services on GPU Servers

Y Wang, Q Wang, X Chu - IEEE Transactions on Green …, 2022 - ieeexplore.ieee.org
Cloud service providers are deploying Transformer-based deep learning models on GPU
servers to support many online inference-as-a-service (IAAS) applications, given the …

An Efficient Transformer Inference Engine on DSP

K Chen, H Su, C Liu, X Gong - … on Algorithms and Architectures for Parallel …, 2022 - Springer
The Transformer is one of the most important algorithms in the Natural Language Processing
(NLP) field and has recently seen wide use in computer vision. Due to the huge computation …