Knowledge distillation: A survey
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
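As a rough illustration of the core idea surveyed here, below is a minimal sketch of the classic logit-based distillation loss (soft targets at a temperature plus hard-label cross-entropy); the temperature and weighting values are illustrative assumptions, not taken from the survey.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened distributions,
    # rescaled by T^2 so gradients keep a comparable magnitude.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```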
Stacked acoustic-and-textual encoding: Integrating the pre-trained models into speech translation encoders
Encoder pre-training is promising in end-to-end Speech Translation (ST), given that
speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic …
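A hypothetical sketch of the stacking idea the title describes: a pre-trained ASR encoder feeds an adapter that maps acoustic states into the input space of a pre-trained MT encoder. The module names (asr_encoder, mt_encoder, the linear adapter) are placeholders for illustration, not the paper's actual implementation.

```python
import torch.nn as nn

class StackedSTEncoder(nn.Module):
    def __init__(self, asr_encoder: nn.Module, mt_encoder: nn.Module, dim: int):
        super().__init__()
        self.asr_encoder = asr_encoder      # pre-trained on speech recognition
        self.adapter = nn.Linear(dim, dim)  # bridges acoustic and textual spaces
        self.mt_encoder = mt_encoder        # pre-trained on text translation

    def forward(self, speech_features):
        acoustic = self.asr_encoder(speech_features)  # acoustic representations
        textual_in = self.adapter(acoustic)           # map toward the MT input space
        return self.mt_encoder(textual_in)            # textual/semantic encoding
```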
Learning target-aware vision transformers for real-time UAV tracking
In recent years, the field of unmanned aerial vehicle (UAV) tracking has grown rapidly,
finding numerous applications across various industries. While the discriminative correlation …
Selective knowledge distillation for neural machine translation
Neural Machine Translation (NMT) models achieve state-of-the-art performance on many
translation benchmarks. As an active research field in NMT, knowledge distillation is widely …
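One way to make distillation "selective", sketched below, is to let only tokens on which the teacher is sufficiently confident contribute to the KD term; the confidence threshold and selection rule here are assumptions for illustration and are not the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def selective_kd_loss(student_logits, teacher_logits, threshold=0.5, T=1.0):
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    # Keep only positions where the teacher's top probability exceeds the threshold.
    mask = teacher_probs.max(dim=-1).values > threshold   # (batch, seq_len)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        teacher_probs,
        reduction="none",
    ).sum(dim=-1)                                          # per-token KL
    return (kd * mask).sum() / mask.sum().clamp(min=1)
```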
The low-resource double bind: An empirical study of pruning for low-resource machine translation
A" bigger is better" explosion in the number of parameters in deep neural networks has
made it increasingly challenging to make state-of-the-art networks accessible in compute …
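For context, a minimal sketch of global magnitude pruning, the kind of technique studied in this setting; the sparsity level and the in-place zeroing strategy are illustrative assumptions.

```python
import torch

def magnitude_prune_(model: torch.nn.Module, sparsity: float = 0.5):
    # Gather all weight magnitudes, find a global threshold, and zero out the
    # smallest-magnitude parameters in place (biases and 1-D params are skipped).
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * all_weights.numel())
    threshold = all_weights.kthvalue(k).values if k > 0 else torch.tensor(0.0)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())
```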
SMaLL-100: Introducing shallow multilingual machine translation model for low-resource languages
In recent years, multilingual machine translation models have achieved promising
performance on low-resource language pairs by sharing information between similar …
LightSeq2: Accelerated training for transformer-based models on GPUs
Transformer-based neural models are used in many AI applications. Training these models
is expensive, requiring substantial GPU resources and long run times. It is challenging because …
Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing
Z Gao, L Yang, Y Dai - IEEE Internet of Things Journal, 2022 - ieeexplore.ieee.org
In vehicular fog computing, task offloading enables mobile vehicles (MVs) to offer ultralow
latency services for computation-intensive tasks. Nevertheless, the edge server (ES) may …
Lightformer: Light-weight transformer using SVD-based weight transfer and parameter sharing
The Transformer has achieved great success as a key technique for natural language processing
tasks. However, it usually requires huge storage space and computational cost …
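A hypothetical sketch of SVD-based weight transfer: a pre-trained dense weight matrix is factorized, truncated to rank r, and used to initialize two smaller linear layers; the rank and the square-root split of the singular values are illustrative choices, not necessarily the paper's.

```python
import torch
import torch.nn as nn

def low_rank_from_pretrained(weight: torch.Tensor, rank: int) -> nn.Sequential:
    # weight: (out_features, in_features) taken from the pre-trained model.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    down = nn.Linear(weight.shape[1], rank, bias=False)   # in -> rank
    up = nn.Linear(rank, weight.shape[0], bias=False)     # rank -> out
    with torch.no_grad():
        down.weight.copy_(torch.diag(S_r.sqrt()) @ Vh_r)  # (rank, in)
        up.weight.copy_(U_r @ torch.diag(S_r.sqrt()))     # (out, rank)
    # The composition up(down(x)) approximates x @ weight.T at reduced parameter count.
    return nn.Sequential(down, up)
```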
ODE transformer: An ordinary differential equation-inspired model for sequence generation
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations
(ODE). This paper explores a deeper relationship between Transformer and numerical ODE …
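To make the Euler-discretization view concrete, here is an illustrative sketch contrasting the standard residual update y_{t+1} = y_t + F(y_t) with a second-order (improved Euler / Heun) update; F stands for any Transformer sub-layer (attention or FFN) and the class names are placeholders, not the paper's architecture.

```python
import torch.nn as nn

class EulerBlock(nn.Module):
    """Standard residual connection: one explicit Euler step of dy/dt = F(y)."""
    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.F = sublayer

    def forward(self, y):
        return y + self.F(y)

class HeunBlock(nn.Module):
    """Second-order update: average the slope at y and at the Euler-predicted point."""
    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.F = sublayer

    def forward(self, y):
        k1 = self.F(y)
        k2 = self.F(y + k1)
        return y + 0.5 * (k1 + k2)
```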