Deep learning on mobile and embedded devices: State-of-the-art, challenges, and future directions

Y Chen, B Zheng, Z Zhang, Q Wang, C Shen… - ACM Computing …, 2020 - dl.acm.org
Recent years have witnessed an exponential increase in the use of mobile and embedded
devices. With the great success of deep learning in many fields, there is an emerging trend …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
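
To make the pruning idea concrete, here is a minimal sketch of unstructured magnitude pruning in PyTorch, one of the simplest criteria covered by work in this area. The function name and thresholding details are illustrative assumptions, not taken from the survey.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a binary mask keeping the largest-magnitude entries of `weight`.

    A minimal illustration of unstructured magnitude pruning: the fraction
    `sparsity` of entries with the smallest absolute value is zeroed out.
    """
    num_prune = int(sparsity * weight.numel())
    if num_prune == 0:
        return torch.ones_like(weight)
    # Threshold = the num_prune-th smallest absolute value.
    threshold = weight.abs().flatten().kthvalue(num_prune).values
    return (weight.abs() > threshold).float()

# Example: prune 80% of a random weight matrix, then apply the mask.
w = torch.randn(256, 512)
mask = magnitude_prune(w, sparsity=0.8)
w_pruned = w * mask
print(f"remaining nonzeros: {mask.mean().item():.2f}")
```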

YOLOv4-5D: An effective and efficient object detector for autonomous driving

Y Cai, T Luan, H Gao, H Wang, L Chen… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
The use of object detection algorithms has become extremely important in autonomous
vehicles. Object detection with high accuracy and fast inference speed is essential for safe …

Hrank: Filter pruning using high-rank feature map

M Lin, R Ji, Y Wang, Y Zhang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Neural network pruning is a promising way to facilitate deploying deep neural
networks on resource-limited devices. However, existing methods are still challenged by the …
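
The title refers to ranking filters by the rank of the feature maps they produce. Below is a hedged PyTorch sketch of that criterion: feature-map ranks are averaged over a small batch, and the lowest-ranked filters are selected for removal. The helper name, batch size, and keep ratio are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

def average_feature_map_rank(conv: nn.Conv2d, inputs: torch.Tensor) -> torch.Tensor:
    """Average rank of each output channel's feature map over a batch.

    Sketch of a rank-based criterion: channels whose feature maps have low
    rank are assumed to carry less information and are pruned first.
    """
    with torch.no_grad():
        fmaps = conv(inputs)                      # (N, C, H, W)
        ranks = torch.linalg.matrix_rank(fmaps)   # (N, C), rank of each HxW map
    return ranks.float().mean(dim=0)              # (C,) average rank per channel

# Example: keep the half of the filters with the highest average rank.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
batch = torch.randn(8, 3, 32, 32)
avg_rank = average_feature_map_rank(conv, batch)
keep = torch.topk(avg_rank, k=8).indices          # indices of filters to retain
print(sorted(keep.tolist()))
```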

Spatten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing performance superior to convolutional and recurrent …
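
The following is a rough sketch of token pruning driven by attention scores, in the spirit of the cascade pruning named in the title: each token is scored by the total attention probability it receives, and low-scoring tokens are dropped. This is a simplified software illustration under assumed shapes; SpAtten itself is a hardware architecture, and its exact scoring and cascade schedule are not reproduced here.

```python
import torch

def token_importance(attn_probs: torch.Tensor) -> torch.Tensor:
    """Cumulative importance of each token from one layer's attention maps.

    attn_probs: (num_heads, seq_len, seq_len) softmax attention probabilities.
    A token's score is the total attention it receives, summed over heads and
    query positions -- a simplified version of attention-based token scoring.
    """
    return attn_probs.sum(dim=(0, 1))  # (seq_len,)

def prune_tokens(hidden: torch.Tensor, attn_probs: torch.Tensor, keep_ratio: float):
    """Keep only the highest-scoring tokens (original order preserved)."""
    scores = token_importance(attn_probs)
    k = max(1, int(keep_ratio * hidden.size(0)))
    keep = torch.topk(scores, k).indices.sort().values
    return hidden[keep], keep

# Example with random activations: prune half of 16 tokens.
hidden = torch.randn(16, 64)                          # (seq_len, d_model)
attn = torch.softmax(torch.randn(8, 16, 16), dim=-1)  # (heads, seq_len, seq_len)
pruned_hidden, kept = prune_tokens(hidden, attn, keep_ratio=0.5)
print(pruned_hidden.shape, kept.tolist())
```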

Similarity-preserving knowledge distillation

F Tung, G Mori - Proceedings of the IEEE/CVF international …, 2019 - openaccess.thecvf.com
Knowledge distillation is a widely applicable technique for training a student neural
network under the guidance of a trained teacher network. For example, in neural network …
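
A minimal PyTorch sketch of a similarity-preserving distillation loss in the spirit of this paper: student and teacher are compared through their batch-wise activation-similarity matrices rather than through the activations themselves. The layer choices and the weighting of this term against the task loss are omitted and would follow the paper.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(feat_student: torch.Tensor,
                               feat_teacher: torch.Tensor) -> torch.Tensor:
    """Similarity-preserving KD loss, sketched.

    Both features have shape (batch, ...). We compare the batch-wise
    pairwise-similarity matrices of student and teacher activations, so the
    student is encouraged to preserve which inputs the teacher treats as
    similar, rather than to match the activations themselves.
    """
    b = feat_student.size(0)
    g_s = feat_student.view(b, -1) @ feat_student.view(b, -1).t()   # (b, b)
    g_t = feat_teacher.view(b, -1) @ feat_teacher.view(b, -1).t()   # (b, b)
    g_s = F.normalize(g_s, p=2, dim=1)    # row-wise L2 normalization
    g_t = F.normalize(g_t, p=2, dim=1)
    return ((g_s - g_t) ** 2).sum() / (b * b)

# Example: teacher and student features of different widths still compare,
# because only the (batch x batch) similarity matrices are matched.
loss = similarity_preserving_loss(torch.randn(32, 128), torch.randn(32, 512))
print(loss.item())
```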

Towards optimal structured cnn pruning via generative adversarial learning

S Lin, R Ji, C Yan, B Zhang, L Cao… - Proceedings of the …, 2019 - openaccess.thecvf.com
Structured pruning of filters or neurons has received increased focus for compressing
convolutional neural networks. Most existing methods rely on multi-stage optimizations in a …

Comparing rewinding and fine-tuning in neural network pruning

A Renda, J Frankle, M Carbin - arXiv preprint arXiv:2003.02389, 2020 - arxiv.org
Many neural network pruning algorithms proceed in three steps: train the network to
completion, remove unwanted structure to compress the network, and retrain the remaining …
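
Below is a toy sketch contrasting the two retraining strategies the paper compares: fine-tuning continues from the final weights at a small learning rate, while weight rewinding resets the surviving weights to values from earlier in training and repeats that part of the schedule. The model, data, learning rates, and the use of the initial weights as the "earlier" snapshot are all illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def global_magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Binary masks that zero the smallest-magnitude weights across all layers."""
    all_w = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = all_w.kthvalue(k).values
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters()}

def retrain(model: nn.Module, masks: dict, lr: float, steps: int, data):
    """Retrain while keeping pruned weights at zero (shared by both variants)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        with torch.no_grad():
            for name, p in model.named_parameters():
                p.mul_(masks[name])   # re-zero pruned weights after each step

# Toy setup: a tiny regression model stands in for a real network.
torch.manual_seed(0)
model = nn.Linear(10, 1)
early_state = copy.deepcopy(model.state_dict())   # stand-in for an epoch-k snapshot
data = (torch.randn(64, 10), torch.randn(64, 1))
dense_masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
retrain(model, dense_masks, lr=0.1, steps=200, data=data)   # "train to completion"

masks = global_magnitude_mask(model, sparsity=0.5)

# Fine-tuning: keep the final weights, retrain at a small learning rate.
finetuned = copy.deepcopy(model)
retrain(finetuned, masks, lr=0.01, steps=100, data=data)

# Weight rewinding: reset surviving weights to their earlier values,
# then repeat that part of training with the original learning rate.
rewound = copy.deepcopy(model)
rewound.load_state_dict(early_state)
retrain(rewound, masks, lr=0.1, steps=200, data=data)
```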

Distilling object detectors with fine-grained feature imitation

T Wang, L Yuan, X Zhang… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
State-of-the-art CNN-based recognition models are often computationally prohibitive to
deploy on low-end devices. A promising high-level approach to tackling this limitation is …

Terngrad: Ternary gradients to reduce communication in distributed deep learning

W Wen, C Xu, F Yan, C Wu, Y Wang… - Advances in neural …, 2017 - proceedings.neurips.cc
High network communication cost for synchronizing gradients and parameters is the well-
known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary …
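
A minimal sketch of the ternary gradient quantization named in the abstract: each gradient entry is stochastically mapped to {-s, 0, +s} so that the result is an unbiased estimate of the original gradient and can be communicated with roughly two bits per entry. Refinements described in the paper, such as per-layer scaling and gradient clipping, are omitted; names and shapes here are illustrative.

```python
import torch

def ternarize_gradient(grad: torch.Tensor) -> torch.Tensor:
    """Stochastically quantize a gradient to {-s, 0, +s}, TernGrad-style sketch.

    s is the maximum absolute gradient value; each entry keeps its sign with
    probability |g_i| / s and is zeroed otherwise, so the quantized gradient
    is an unbiased estimate of the original.
    """
    s = grad.abs().max()
    if s == 0:
        return torch.zeros_like(grad)
    prob = grad.abs() / s                      # per-entry keep probability
    keep = torch.bernoulli(prob)               # stochastic {0, 1} mask
    return s * grad.sign() * keep

# Example: the ternarized gradient averages back to the original gradient.
g = torch.randn(5)
samples = torch.stack([ternarize_gradient(g) for _ in range(10_000)])
print(g)
print(samples.mean(dim=0))                     # close to g in expectation
```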