AMMUS: A survey of transformer-based pretrained models in natural language processing
KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …
A systematic review of transformer-based pre-trained language models through self-supervised learning
E Kotei, R Thirunavukarasu - Information, 2023 - mdpi.com
Transfer learning is a technique utilized in deep learning applications to transmit learned
inference to a different target domain. The approach is mainly to solve the problem of a few …
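For context, a minimal sketch of the transfer-learning recipe this review surveys: take a transformer pretrained with self-supervision and fine-tune it on a small labelled target-domain dataset. The checkpoint name, example texts, and hyperparameters below are illustrative assumptions, not the review's own setup.

```python
# Hedged sketch: fine-tuning a pretrained transformer on a few target-domain examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny illustrative target-domain dataset (stand-in for "few labelled examples").
texts = ["the device overheats quickly", "battery life is excellent"]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes is often enough when transferring pretrained knowledge
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```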
Memorization without overfitting: Analyzing the training dynamics of large language models
K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …
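A minimal sketch (not the authors' exact protocol) of what measuring "exact memorization" can look like: prompt the model with a prefix drawn from the training corpus and check whether greedy decoding reproduces the true continuation verbatim. The GPT-2 checkpoint and the simplified tokenization of the continuation are assumptions for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def exactly_memorized(prefix: str, true_continuation: str) -> bool:
    # Simplified check: greedy-decode as many tokens as the true continuation has
    # and require a verbatim token-level match.
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(true_continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(prefix_ids, max_new_tokens=cont_ids.shape[1], do_sample=False)
    generated = out[0, prefix_ids.shape[1]:]
    return torch.equal(generated, cont_ids[0])

# The fraction of sampled training sequences flagged as memorized can then be
# tracked across checkpoints to study training dynamics.
```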
The multiberts: Bert reproductions for robustness analysis
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular …
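A hedged sketch of the robustness check the MultiBERTs enable: repeat an experiment over several independently pretrained BERT checkpoints instead of a single one, and report the spread. The `google/multiberts-seed_N` names are assumed to be the released Hub checkpoints, and `evaluate_probe` is a hypothetical stand-in for whatever task-specific evaluation the experiment uses.

```python
import statistics
from transformers import AutoModel, AutoTokenizer

scores = []
for seed in range(5):
    name = f"google/multiberts-seed_{seed}"   # assumed checkpoint naming
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    scores.append(evaluate_probe(model, tokenizer))  # hypothetical evaluation function

# Reporting mean and spread across pretraining seeds, rather than a single number,
# shows whether a conclusion holds beyond one particular checkpoint.
print(f"mean={statistics.mean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
```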
A closer look at how fine-tuning changes BERT
Y Zhou, V Srikumar - arXiv preprint arXiv:2106.14282, 2021 - arxiv.org
Given the prevalence of pre-trained contextualized representations in today's NLP, there
have been many efforts to understand what information they contain, and why they seem to …
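One common way to ask how fine-tuning changes BERT's representations, sketched here under stated assumptions: encode the same sentence with the pretrained and a fine-tuned checkpoint and compare hidden states layer by layer. The fine-tuned checkpoint path is a placeholder the reader must supply; this is not the paper's exact analysis.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained = AutoModel.from_pretrained("bert-base-uncased")
finetuned = AutoModel.from_pretrained("path/to/finetuned-bert")  # placeholder checkpoint

inputs = tokenizer("Fine-tuning reshapes the embedding space.", return_tensors="pt")
with torch.no_grad():
    h_pre = pretrained(**inputs, output_hidden_states=True).hidden_states
    h_ft = finetuned(**inputs, output_hidden_states=True).hidden_states

# Mean cosine similarity between corresponding token vectors at each layer;
# lower similarity in upper layers suggests fine-tuning changed them more.
for layer, (a, b) in enumerate(zip(h_pre, h_ft)):
    sim = torch.nn.functional.cosine_similarity(a, b, dim=-1).mean().item()
    print(f"layer {layer:2d}: {sim:.3f}")
```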
Analyzing how BERT performs entity matching
State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as
BERT, for generating highly contextualized embeddings of terms. The embeddings are then …
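A minimal sketch of the BERT-based entity-matching setup this line of work analyzes: two records are serialized to text, encoded as a sentence pair, and a classification head over the contextual embeddings predicts match or no-match. The serialization scheme, example records, and checkpoint below are illustrative assumptions; in practice the classifier would be fine-tuned on labelled pairs.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def serialize(record: dict) -> str:
    # Illustrative attribute-value serialization of a structured record.
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in record.items())

left = {"name": "iPhone 13 128GB", "brand": "Apple"}
right = {"name": "Apple iPhone 13 (128 GB)", "brand": "Apple"}

inputs = tokenizer(serialize(left), serialize(right), return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("match probability:", torch.softmax(logits, dim=-1)[0, 1].item())
```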
Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …
PROTAUGMENT: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning
T Dopierre, C Gravier, W Logerais - arXiv preprint arXiv:2105.12995, 2021 - arxiv.org
Recent research considers few-shot intent detection as a meta-learning problem: the model
is learning to learn from a consecutive set of small tasks named episodes. In this work, we …
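A sketch of the episodic setup the snippet refers to: each "episode" is a small N-way K-shot classification task sampled from the training intents, so the model learns to adapt from a handful of examples. The sampling code below is purely illustrative and not PROTAUGMENT's own pipeline.

```python
import random
from collections import defaultdict

def sample_episode(examples, n_way=5, k_shot=1, n_query=5):
    """examples: list of (utterance, intent_label) pairs with enough items per intent."""
    by_intent = defaultdict(list)
    for text, label in examples:
        by_intent[label].append(text)
    intents = random.sample(list(by_intent), n_way)
    support, query = [], []
    for label in intents:
        utterances = random.sample(by_intent[label], k_shot + n_query)
        support += [(u, label) for u in utterances[:k_shot]]
        query += [(u, label) for u in utterances[k_shot:]]
    # The meta-learner adapts on the support set and is scored on the query set.
    return support, query
```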
Prototypical fine-tuning: Towards robust performance under varying data sizes
In this paper, we move towards combining large parametric models with non-parametric
prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework …
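A hedged sketch of the prototypical idea the abstract combines with large parametric models: class prototypes are the mean encoder representations of each class's examples, and queries are classified by distance to the nearest prototype. The random embeddings below stand in for outputs of a transformer encoder; this is not the paper's exact framework.

```python
import torch

def prototype_logits(support_emb, support_labels, query_emb, num_classes):
    # support_emb: [S, d], support_labels: [S], query_emb: [Q, d]
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                          # [C, d] one prototype per class
    dists = torch.cdist(query_emb, prototypes)  # [Q, C] Euclidean distances
    return -dists                               # nearer prototype -> higher logit

# Example with random stand-in embeddings (replace with encoder outputs).
support = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
queries = torch.randn(4, 16)
print(prototype_logits(support, labels, queries, num_classes=4).argmax(dim=-1))
```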
T3-Vis: visual analytic for training and fine-tuning transformers in NLP
Transformers are the dominant architecture in NLP, but their training and fine-tuning are still
very challenging. In this paper, we present the design and implementation of a visual …