Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Stacked acoustic-and-textual encoding: Integrating the pre-trained models into speech translation encoders

C Xu, B Hu, Y Li, Y Zhang, Q Ju, T Xiao, J Zhu - arXiv preprint arXiv …, 2021 - arxiv.org
Encoder pre-training is promising in end-to-end Speech Translation (ST), given that
speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic …

Learning target-aware vision transformers for real-time UAV tracking

S Li, X Yang, X Wang, D Zeng, H Ye… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, the field of unmanned aerial vehicle (UAV) tracking has grown rapidly,
finding numerous applications across various industries. While the discriminative correlation …

Selective knowledge distillation for neural machine translation

F Wang, J Yan, F Meng, J Zhou - arXiv preprint arXiv:2105.12967, 2021 - arxiv.org
Neural Machine Translation (NMT) models achieve state-of-the-art performance on many
translation benchmarks. As an active research field in NMT, knowledge distillation is widely …
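For background on this entry: the standard word-level knowledge distillation objective for NMT mixes cross-entropy on the gold references with a KL term toward the teacher's temperature-softened token distributions. The sketch below is a generic PyTorch illustration of that baseline objective only; the temperature, mixing weight, and function name are assumptions, and it does not implement the selective strategy this paper proposes.

import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, targets,
                       temperature=2.0, alpha=0.5, pad_id=0):
    """Generic word-level KD objective for sequence models.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    targets: (batch, seq_len) gold token ids
    """
    # Cross-entropy against the gold references (pad tokens ignored).
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets,
                         ignore_index=pad_id)
    # KL divergence toward the teacher's temperature-softened distribution.
    # (Pad positions are not masked here, for brevity.)
    t = temperature
    kd = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="batchmean") * (t * t)
    return alpha * kd + (1.0 - alpha) * ce

Selective distillation, as the title suggests, further restricts which target words contribute to the KD term; that filtering step is omitted in this generic sketch.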

The low-resource double bind: An empirical study of pruning for low-resource machine translation

O Ahia, J Kreutzer, S Hooker - arXiv preprint arXiv:2110.03036, 2021 - arxiv.org
A" bigger is better" explosion in the number of parameters in deep neural networks has
made it increasingly challenging to make state-of-the-art networks accessible in compute …
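For background on this entry: a common pruning baseline in studies like this one is global magnitude pruning, which zeroes the smallest-magnitude weights until a target sparsity is reached. The NumPy sketch below illustrates that generic baseline under assumed names and defaults; it is not the paper's exact experimental protocol.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Globally zero out the smallest-magnitude weights.

    weights:  list of NumPy arrays, one per layer
    sparsity: fraction of all parameters to set to zero
    """
    # One global threshold over all parameters, as in global magnitude pruning.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_mags, sparsity)
    # Mask each layer against the shared threshold.
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

For example, magnitude_prune([w1, w2], sparsity=0.9) returns masked copies of the layers; practical pipelines typically fine-tune the model again after masking.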

SMaLL-100: Introducing shallow multilingual machine translation model for low-resource languages

A Mohammadshahi, V Nikoulina, A Berard… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, multilingual machine translation models have achieved promising
performance on low-resource language pairs by sharing information between similar …

LightSeq2: Accelerated training for transformer-based models on GPUs

X Wang, Y Wei, Y Xiong, G Huang… - … Conference for High …, 2022 - ieeexplore.ieee.org
Transformer-based neural models are used in many AI applications. Training these models
is expensive, requiring substantial GPU resources and long running times. It is challenging because …

Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing

Z Gao, L Yang, Y Dai - IEEE Internet of Things Journal, 2022 - ieeexplore.ieee.org
In vehicular fog computing, task offloading enables mobile vehicles (MVs) to offer ultralow
latency services for computation-intensive tasks. Nevertheless, the edge server (ES) may …

Lightformer: Light-weight transformer using SVD-based weight transfer and parameter sharing

X Lv, P Zhang, S Li, G Gan, Y Sun - Findings of the Association for …, 2023 - aclanthology.org
The Transformer has become an important and highly successful technique for natural language
processing tasks. However, it usually requires huge storage space and computational cost …
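For background on this entry: "SVD-based weight transfer" generally means factorizing a pretrained weight matrix with a truncated SVD so that a smaller model can inherit two low-rank factors in place of the full matrix. The NumPy sketch below shows that general idea under an assumed rank and function name; it is not the paper's exact transfer procedure.

import numpy as np

def svd_weight_transfer(weight, rank):
    """Factor a pretrained weight matrix into two low-rank matrices.

    weight: (out_dim, in_dim) pretrained matrix
    rank:   target rank of the truncated SVD
    Returns (a, b) with a @ b approximating `weight`, so a dense layer
    W @ x can be replaced by a @ (b @ x) with fewer parameters.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (out_dim, rank), singular values folded in
    b = vt[:rank, :]             # (rank, in_dim)
    return a, b

Replacing an out_dim-by-in_dim layer with the two factors cuts its parameter count from out_dim * in_dim to rank * (out_dim + in_dim), which is where the storage savings come from.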

ODE Transformer: An ordinary differential equation-inspired model for sequence generation

B Li, Q Du, T Zhou, Y Jing, S Zhou, X Zeng… - arXiv preprint arXiv …, 2022 - arxiv.org
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations
(ODEs). This paper explores a deeper relationship between the Transformer and numerical ODE …
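The opening claim is the standard correspondence behind ODE-inspired architectures: a residual block is one forward-Euler step. In generic notation, assuming a unit step size absorbed into the block,

\[
  x_{t+1} = x_t + F(x_t, \theta_t)
  \quad\Longleftrightarrow\quad
  \frac{x_{t+1} - x_t}{\Delta t} = f(x_t, t), \qquad \Delta t = 1,
\]

so a stack of residual blocks performs explicit Euler integration of \( \dot{x} = f(x, t) \), and higher-order solvers such as Runge-Kutta suggest richer block designs, which is the direction the abstract points toward.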