Position information in transformers: An overview
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
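The permutation-invariance point raised in this overview is easy to verify directly. The following numpy sketch (not taken from any of the listed papers; the function names and toy dimensions are illustrative assumptions) shows that single-head scaled dot-product self-attention without positional information is permutation-equivariant, and that adding the standard sinusoidal encoding of Vaswani et al. (2017) breaks this property:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def sinusoidal_pe(seq_len, d_model):
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(n)

# Without positions: permuting the input rows only permutes the output rows.
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)
print(np.allclose(out[perm], out_perm))        # True

# With sinusoidal positions added: the equivariance no longer holds.
out_pe = self_attention(X + sinusoidal_pe(n, d), Wq, Wk, Wv)
out_pe_perm = self_attention(X[perm] + sinusoidal_pe(n, d), Wq, Wk, Wv)
print(np.allclose(out_pe[perm], out_pe_perm))  # False (in general)
```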
The impact of positional encoding on length generalization in transformers
A Kazemnejad, I Padhi… - Advances in …, 2024 - proceedings.neurips.cc
Length generalization, the ability to generalize from small training context sizes to larger
ones, is a critical challenge in the development of Transformer-based language models …
Theoretical limitations of self-attention in neural sequence models
M Hahn - Transactions of the Association for Computational …, 2020 - direct.mit.edu
Transformers are emerging as the new workhorse of NLP, showing great success across
tasks. Unlike LSTMs, transformers process input sequences entirely through self-attention …
On the ability and limitations of transformers to recognize formal languages
Transformers have supplanted recurrent models in a large number of NLP tasks. However,
the differences in their abilities to model different syntactic properties remain largely …
Self-attention networks can process bounded hierarchical languages
Despite their impressive performance in NLP, self-attention networks were recently proved
to be limited for processing formal languages with hierarchical structure, such as $\mathsf …
Uncertainty-aware curriculum learning for neural machine translation
Neural machine translation (NMT) has been shown to benefit from curriculum learning, which
presents examples in an easy-to-hard order at different training stages. The keys lie in the …
Self-attention with cross-lingual position representation
Position encoding (PE), an essential part of self-attention networks (SANs), is used to
preserve word-order information for natural language processing tasks, generating fixed …
How effective is BERT without word ordering? Implications for language understanding and data privacy
J Hessel, A Schofield - Proceedings of the 59th Annual Meeting of …, 2021 - aclanthology.org
Ordered word sequences contain the rich structures that define language. However, it is often
unclear whether or how modern pretrained language models utilize these structures. We show …
On the computational power of transformers and its implications in sequence modeling
Transformers are being used extensively across several sequence modeling tasks.
Significant research effort has been devoted to experimentally probing the inner workings of …
Rethinking the value of transformer components
The Transformer has become the state-of-the-art translation model, yet it is not well understood how
each intermediate component contributes to model performance, which poses significant …