Improving image captioning by leveraging intra- and inter-layer global representation in transformer network

J Ji, Y Luo, X Sun, F Chen, G Luo, Y Wu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Transformer-based architectures have shown great success in image captioning, where
object regions are encoded and then attended into the vectorial representations to guide the …

Time-series anomaly detection with stacked Transformer representations and 1D convolutional network

J Kim, H Kang, P Kang - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Time-series anomaly detection is the task of detecting data that do not follow the normal data
distribution among continuously collected data. It is used for system maintenance in various …
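The title above names the general recipe of stacking Transformer encoder representations and scoring them with a 1D convolutional network. Below is a minimal sketch of that recipe; the class name, layer counts, dimensions, and the scoring head are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class StackedTransformerConvDetector(nn.Module):
    """Illustrative sketch: collect the hidden states of several stacked
    Transformer encoder layers and score each time step with a 1D conv head."""

    def __init__(self, d_model=64, n_heads=4, n_layers=3, conv_channels=32):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # The 1D convolution slides over time; the concatenated layer
        # representations act as its input channels.
        self.head = nn.Sequential(
            nn.Conv1d(d_model * n_layers, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(conv_channels, 1, kernel_size=1),
        )

    def forward(self, x):                        # x: (batch, time, d_model)
        states, h = [], x
        for layer in self.layers:
            h = layer(h)
            states.append(h)                     # keep every layer's representation
        stacked = torch.cat(states, dim=-1)      # (batch, time, d_model * n_layers)
        scores = self.head(stacked.transpose(1, 2))
        return scores.squeeze(1)                 # one anomaly score per time step

# Usage: two synthetic series of 100 steps with 64 features each.
model = StackedTransformerConvDetector()
print(model(torch.randn(2, 100, 64)).shape)      # torch.Size([2, 100])
```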

Bridgetower: Building bridges between encoders in vision-language representation learning

X Xu, C Wu, S Rosenman, V Lal, W Che… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-
language representation learning in recent years. Current VL models either use lightweight …

Acquiring knowledge from pre-trained model to neural machine translation

R Weng, H Yu, S Huang, S Cheng, W Luo - Proceedings of the AAAI …, 2020 - ojs.aaai.org
Pre-training and fine-tuning have achieved great success in the natural language processing field.
The standard paradigm of exploiting them includes two steps: first, pre-training a model, e.g. …

Modeling recurrence for transformer

J Hao, X Wang, B Yang, L Wang, J Zhang… - arXiv preprint arXiv …, 2019 - arxiv.org
Recently, the Transformer model, which is based solely on attention mechanisms, has
advanced the state-of-the-art on various machine translation tasks. However, recent studies …

Self-attention with structural position representations

X Wang, Z Tu, L Wang, S Shi - arXiv preprint arXiv:1909.00383, 2019 - arxiv.org
Although self-attention networks (SANs) have advanced the state-of-the-art on various NLP
tasks, one criticism of SANs concerns their ability to encode the positions of input words (Shaw et al …
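The position-encoding issue this snippet raises (with reference to Shaw et al.'s relative position representations) can be illustrated with a small sketch. The single-head toy class below uses hypothetical names and shows only the relative-position idea, not the structural position representations proposed in the paper.

```python
import torch
import torch.nn as nn

class RelativePositionSelfAttention(nn.Module):
    """Sketch of single-head self-attention with relative position
    representations in the spirit of Shaw et al. (2018)."""

    def __init__(self, d_model=64, max_rel_dist=8):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.max_rel_dist = max_rel_dist
        # one embedding per clipped relative offset in [-max_rel_dist, max_rel_dist]
        self.rel_key = nn.Embedding(2 * max_rel_dist + 1, d_model)
        self.scale = d_model ** 0.5

    def forward(self, x):                                 # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # relative offsets j - i, clipped to the window and shifted to be valid indices
        pos = torch.arange(T)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        rel_k = self.rel_key(rel + self.max_rel_dist)     # (T, T, D)
        # content-content plus content-position attention logits
        scores = q @ k.transpose(-1, -2)                  # (B, T, T)
        scores = scores + torch.einsum('btd,tsd->bts', q, rel_k)
        attn = torch.softmax(scores / self.scale, dim=-1)
        return attn @ v                                   # (B, T, D)

# Usage: batch of 2 sentences, 10 tokens, 64-dim embeddings.
attn = RelativePositionSelfAttention()
print(attn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```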

Shallow-to-deep training for neural machine translation

B Li, Z Wang, H Liu, Y Jiang, Q Du, T Xiao… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep encoders have been proven to be effective in improving neural machine translation
(NMT) systems, but training an extremely deep encoder is time-consuming. Moreover, why …

Understanding and improving encoder layer fusion in sequence-to-sequence learning

X Liu, L Wang, DF Wong, L Ding, LS Chao… - arXiv preprint arXiv …, 2020 - arxiv.org
Encoder layer fusion (EncoderFusion) is a technique that fuses all of the encoder layers (instead
of only the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven …
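As a rough illustration of the fusion idea described in this snippet, the sketch below combines all encoder layer outputs with learned softmax weights rather than passing only the top layer to the decoder. The weighting scheme and names are assumptions for illustration, not necessarily the formulation analyzed in the paper.

```python
import torch
import torch.nn as nn

class EncoderLayerFusion(nn.Module):
    """Sketch of encoder layer fusion: combine all encoder layer outputs
    with learned softmax weights instead of using only the top layer."""

    def __init__(self, n_layers):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, seq_len, d_model), one per encoder layer
        stacked = torch.stack(layer_outputs, dim=0)            # (n_layers, B, T, D)
        weights = torch.softmax(self.layer_logits, dim=0)      # (n_layers,)
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # (B, T, D)
        return fused                                           # fed to decoder cross-attention

# Usage with 6 dummy encoder layer outputs.
fusion = EncoderLayerFusion(n_layers=6)
outs = [torch.randn(2, 10, 512) for _ in range(6)]
print(fusion(outs).shape)  # torch.Size([2, 10, 512])
```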

JoinER-BART: joint entity and relation extraction with constrained decoding, representation reuse and fusion

H Chang, H Xu, J van Genabith… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Joint Entity and Relation Extraction (JERE) is an important research direction in Information
Extraction (IE). Given the surprising performance achieved by fine-tuning pre-trained BERT in a …

Dynamic layer aggregation for neural machine translation with routing-by-agreement

ZY Dou, Z Tu, X Wang, L Wang, S Shi… - Proceedings of the AAAI …, 2019 - ojs.aaai.org
With the promising progress of deep neural networks, layer aggregation has been used to
fuse information across layers in various fields, such as computer vision and machine …
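Routing-by-agreement (borrowed from capsule networks) can be sketched as an iterative re-weighting of layer outputs by how much each agrees with the current aggregate. The function below is a minimal toy version under that reading; it is not the authors' full aggregation model.

```python
import torch

def aggregate_layers_by_routing(layer_outputs, n_iters=3):
    """Minimal routing-by-agreement sketch for aggregating layer outputs.

    layer_outputs: tensor of shape (n_layers, batch, seq_len, d_model).
    Coupling coefficients start uniform and are refined by the agreement
    (dot product) between each layer's output and the current aggregate.
    """
    logits = torch.zeros(layer_outputs.shape[:3])                         # (L, B, T)
    for _ in range(n_iters):
        coupling = torch.softmax(logits, dim=0)                           # normalize over layers
        aggregate = (coupling.unsqueeze(-1) * layer_outputs).sum(dim=0)   # (B, T, D)
        agreement = (layer_outputs * aggregate.unsqueeze(0)).sum(dim=-1)  # (L, B, T)
        logits = logits + agreement                                       # agreeing layers gain weight
    return aggregate

# Usage: aggregate 6 layer outputs for a batch of 2 sequences of length 10.
outs = torch.randn(6, 2, 10, 512)
print(aggregate_layers_by_routing(outs).shape)  # torch.Size([2, 10, 512])
```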