Improving image captioning by leveraging intra- and inter-layer global representation in transformer network
Transformer-based architectures have shown great success in image captioning, where
object regions are encoded and then attended into the vectorial representations to guide the …
Time-series anomaly detection with stacked Transformer representations and 1D convolutional network
Time-series anomaly detection is the task of detecting data that do not follow the normal data
distribution among continuously collected data. It is used for system maintenance in various …
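The snippet above only states the task; the architecture is named in the title. As a rough, hypothetical sketch of such a design (not the authors' implementation; all module names and hyperparameters are assumptions), one can stack the per-layer representations of a Transformer encoder and score each time step with a 1D convolutional head:

```python
import torch
import torch.nn as nn

class StackedTransformerAnomalyScorer(nn.Module):
    """Hypothetical sketch: score time steps by feeding stacked
    per-layer Transformer representations through a 1D conv head."""

    def __init__(self, d_model=64, n_layers=3, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # 1D convolution over time; channels hold every layer's representation.
        self.conv = nn.Conv1d(n_layers * d_model, d_model, kernel_size=3, padding=1)
        self.score = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (batch, time, d_model)
        reps, h = [], x
        for layer in self.layers:
            h = layer(h)
            reps.append(h)                     # keep each layer's output, not just the last
        stacked = torch.cat(reps, dim=-1)      # (batch, time, n_layers * d_model)
        feats = self.conv(stacked.transpose(1, 2)).transpose(1, 2)
        return self.score(torch.relu(feats)).squeeze(-1)   # per-step anomaly score
```

For instance, `StackedTransformerAnomalyScorer()(torch.randn(8, 100, 64))` returns one anomaly score per time step for each of the 8 series.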
Bridgetower: Building bridges between encoders in vision-language representation learning
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-
language representation learning in recent years. Current VL models either use lightweight …
Acquiring knowledge from pre-trained model to neural machine translation
Pre-training and fine-tuning have achieved great success in the natural language processing field.
The standard paradigm of exploiting them includes two steps: first, pre-training a model, e.g., …
Modeling recurrence for transformer
Recently, the Transformer model, which is based solely on attention mechanisms, has
advanced the state-of-the-art on various machine translation tasks. However, recent studies …
Self-attention with structural position representations
Although self-attention networks (SANs) have advanced the state-of-the-art on various NLP
tasks, one criticism of SANs is their ability to encode the positions of input words (Shaw et al …
Shallow-to-deep training for neural machine translation
Deep encoders have been proven to be effective in improving neural machine translation
(NMT) systems, but training an extremely deep encoder is time consuming. Moreover, why …
Understanding and improving encoder layer fusion in sequence-to-sequence learning
Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead
of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven …
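Since this snippet describes the technique concretely, a minimal sketch may help: EncoderFusion exposes all encoder layers to the decoder rather than only the top one. A common instantiation (an illustrative assumption here, not necessarily the paper's exact formulation) is a learned softmax-weighted sum of the layer outputs:

```python
import torch
import torch.nn as nn

class EncoderLayerFusion(nn.Module):
    """Illustrative EncoderFusion head: fuse all encoder layer outputs
    with learned scalar weights instead of using only the top layer."""

    def __init__(self, n_layers):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, src_len, d_model), one per encoder layer
        weights = torch.softmax(self.layer_logits, dim=0)           # (n_layers,)
        stacked = torch.stack(layer_outputs, dim=0)                 # (L, B, S, D)
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)    # (B, S, D)
        return fused  # fed to the decoder's cross-attention in place of the top layer
```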
JoinER-BART: joint entity and relation extraction with constrained decoding, representation reuse and fusion
H Chang, H Xu, J van Genabith… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Joint Entity and Relation Extraction (JERE) is an important research direction in Information
Extraction (IE). Given the surprising performance achieved by fine-tuning pre-trained BERT in a …
Dynamic layer aggregation for neural machine translation with routing-by-agreement
With the promising progress of deep neural networks, layer aggregation has been used to
fuse information across layers in various fields, such as computer vision and machine …
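As a rough illustration of routing-by-agreement applied to layer aggregation (inferred from the title; the paper's exact routing variant likely differs, and the simplified update below omits the usual squash nonlinearity), the sketch iteratively re-weights layer outputs by their agreement with the current aggregate:

```python
import torch

def aggregate_layers_by_agreement(layer_outputs, n_iters=3):
    """Hypothetical routing-by-agreement over Transformer layer outputs.

    layer_outputs: tensor of shape (n_layers, batch, seq, d_model).
    Returns an aggregated representation of shape (batch, seq, d_model).
    """
    n_layers = layer_outputs.size(0)
    # Routing logits: one per layer and position.
    logits = layer_outputs.new_zeros(n_layers, *layer_outputs.shape[1:3])   # (L, B, S)
    for _ in range(n_iters):
        weights = torch.softmax(logits, dim=0)                   # agreement weights over layers
        agg = (weights.unsqueeze(-1) * layer_outputs).sum(0)     # (B, S, D) aggregate
        # Raise the logit of layers whose output agrees (dot product) with the aggregate.
        logits = logits + (layer_outputs * agg.unsqueeze(0)).sum(-1)
    return agg
```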