Advancements in accelerating deep neural network inference on AIoT devices: A survey
The amalgamation of artificial intelligence with Internet of Things (AIoT) devices has seen a
rapid surge in growth, largely due to the effective implementation of deep neural network …
Dynamic GPU energy optimization for machine learning training workloads
GPUs are widely used to accelerate the training of machine learning workloads. As modern
machine learning models become increasingly larger, they require a longer time to train …
Optimizing inference performance of transformers on CPUs
The Transformer architecture revolutionized the field of natural language processing (NLP).
Transformer-based models (e.g., BERT) power many important Web services, such as …
Energy-Efficient Online Scheduling of Transformer Inference Services on GPU Servers
Cloud service providers are deploying Transformer-based deep learning models on GPU
servers to support many online inference-as-a-service (IAAS) applications, given the …
An Efficient Transformer Inference Engine on DSP
K Chen, H Su, C Liu, X Gong - … on Algorithms and Architectures for Parallel …, 2022 - Springer
The Transformer is one of the most important algorithms in the Natural Language Processing
(NLP) field and has recently seen wide use in computer vision. Due to the huge computation …