Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Stacked acoustic-and-textual encoding: Integrating the pre-trained models into speech translation encoders

C Xu, B Hu, Y Li, Y Zhang, Q Ju, T Xiao, J Zhu - arXiv preprint arXiv …, 2021 - arxiv.org
Encoder pre-training is promising in end-to-end Speech Translation (ST), given that
speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic …

Learning target-aware vision transformers for real-time UAV tracking

S Li, X Yang, X Wang, D Zeng, H Ye… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, the field of unmanned aerial vehicle (UAV) tracking has grown rapidly,
finding numerous applications across various industries. While the discriminative correlation …

Selective knowledge distillation for neural machine translation

F Wang, J Yan, F Meng, J Zhou - arXiv preprint arXiv:2105.12967, 2021 - arxiv.org
Neural Machine Translation (NMT) models achieve state-of-the-art performance on many
translation benchmarks. As an active research field in NMT, knowledge distillation is widely …
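For background on this entry: the standard word-level knowledge distillation objective for NMT mixes cross-entropy on the gold references with a KL term toward the teacher's temperature-softened token distributions. The sketch below is a generic PyTorch illustration of that baseline objective only; the temperature, mixing weight, and function name are assumptions, and it does not implement the selective strategy this paper proposes.

import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, targets,
                       temperature=2.0, alpha=0.5, pad_id=0):
    """Generic word-level KD objective for sequence models.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    targets: (batch, seq_len) gold token ids
    """
    # Cross-entropy against the gold references (pad tokens ignored).
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets,
                         ignore_index=pad_id)
    # KL divergence toward the teacher's temperature-softened distribution.
    # (Pad positions are not masked here, for brevity.)
    t = temperature
    kd = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="batchmean") * (t * t)
    return alpha * kd + (1.0 - alpha) * ce

Selective distillation, as the title suggests, further restricts which target words contribute to the KD term; that filtering step is omitted in this generic sketch.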

The low-resource double bind: An empirical study of pruning for low-resource machine translation

O Ahia, J Kreutzer, S Hooker - arXiv preprint arXiv:2110.03036, 2021 - arxiv.org
A" bigger is better" explosion in the number of parameters in deep neural networks has
made it increasingly challenging to make state-of-the-art networks accessible in compute …
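For background on this entry: a common pruning baseline in studies like this one is global magnitude pruning, which zeroes the smallest-magnitude weights until a target sparsity is reached. The NumPy sketch below illustrates that generic baseline under assumed names and defaults; it is not the paper's exact experimental protocol.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Globally zero out the smallest-magnitude weights.

    weights:  list of NumPy arrays, one per layer
    sparsity: fraction of all parameters to set to zero
    """
    # One global threshold over all parameters, as in global magnitude pruning.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_mags, sparsity)
    # Mask each layer against the shared threshold.
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

For example, magnitude_prune([w1, w2], sparsity=0.9) returns masked copies of the layers; practical pipelines typically fine-tune the model again after masking.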

SMaLL-100: Introducing shallow multilingual machine translation model for low-resource languages

A Mohammadshahi, V Nikoulina, A Berard… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, multilingual machine translation models have achieved promising
performance on low-resource language pairs by sharing information between similar …

LightSeq2: Accelerated training for transformer-based models on GPUs

X Wang, Y Wei, Y Xiong, G Huang… - … Conference for High …, 2022 - ieeexplore.ieee.org
Transformer-based neural models are used in many AI applications. Training these models
is expensive, requiring substantial GPU resources and long running times. It is challenging because …

Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing

Z Gao, L Yang, Y Dai - IEEE Internet of Things Journal, 2022 - ieeexplore.ieee.org
In vehicular fog computing, task offloading enables mobile vehicles (MVs) to offer ultralow
latency services for computation-intensive tasks. Nevertheless, the edge server (ES) may …

Lightformer: Light-weight transformer using SVD-based weight transfer and parameter sharing

X Lv, P Zhang, S Li, G Gan, Y Sun - Findings of the Association for …, 2023 - aclanthology.org
The Transformer has become an important and highly successful technique for natural language
processing tasks. However, it usually requires huge storage space and computational cost …
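For background on this entry: "SVD-based weight transfer" generally means factorizing a pretrained weight matrix with a truncated SVD so that a smaller model can inherit two low-rank factors in place of the full matrix. The NumPy sketch below shows that general idea under an assumed rank and function name; it is not the paper's exact transfer procedure.

import numpy as np

def svd_weight_transfer(weight, rank):
    """Factor a pretrained weight matrix into two low-rank matrices.

    weight: (out_dim, in_dim) pretrained matrix
    rank:   target rank of the truncated SVD
    Returns (a, b) with a @ b approximating `weight`, so a dense layer
    W @ x can be replaced by a @ (b @ x) with fewer parameters.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (out_dim, rank), singular values folded in
    b = vt[:rank, :]             # (rank, in_dim)
    return a, b

Replacing an out_dim-by-in_dim layer with the two factors cuts its parameter count from out_dim * in_dim to rank * (out_dim + in_dim), which is where the storage savings come from.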

ODE Transformer: An ordinary differential equation-inspired model for sequence generation

B Li, Q Du, T Zhou, Y Jing, S Zhou, X Zeng… - arXiv preprint arXiv …, 2022 - arxiv.org
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations
(ODEs). This paper explores a deeper relationship between the Transformer and numerical ODE …
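The opening claim is the standard correspondence behind ODE-inspired architectures: a residual block is one forward-Euler step. In generic notation, assuming a unit step size absorbed into the block,

\[
  x_{t+1} = x_t + F(x_t, \theta_t)
  \quad\Longleftrightarrow\quad
  \frac{x_{t+1} - x_t}{\Delta t} = f(x_t, t), \qquad \Delta t = 1,
\]

so a stack of residual blocks performs explicit Euler integration of \( \dot{x} = f(x, t) \), and higher-order solvers such as Runge-Kutta suggest richer block designs, which is the direction the abstract points toward.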