AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

A systematic review of transformer-based pre-trained language models through self-supervised learning

E Kotei, R Thirunavukarasu - Information, 2023 - mdpi.com
Transfer learning is a technique used in deep learning applications to transfer learned
knowledge to a different target domain. The approach is mainly to solve the problem of a few …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …

The MultiBERTs: BERT reproductions for robustness analysis

T Sellam, S Yadlowsky, J Wei, N Saphra… - arXiv preprint arXiv …, 2021 - arxiv.org
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular …

A closer look at how fine-tuning changes BERT

Y Zhou, V Srikumar - arXiv preprint arXiv:2106.14282, 2021 - arxiv.org
Given the prevalence of pre-trained contextualized representations in today's NLP, there
have been many efforts to understand what information they contain, and why they seem to …

Analyzing how BERT performs entity matching

M Paganelli, F Del Buono, A Baraldi… - Proceedings of the …, 2022 - iris.unimore.it
State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as
BERT, for generating highly contextualized embeddings of terms. The embeddings are then …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

PROTAUGMENT: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning

T Dopierre, C Gravier, W Logerais - arXiv preprint arXiv:2105.12995, 2021 - arxiv.org
Recent research considers few-shot intent detection as a meta-learning problem: the model
is learning to learn from a consecutive set of small tasks named episodes. In this work, we …

Prototypical fine-tuning: Towards robust performance under varying data sizes

Y Jin, X Wang, Y Hao, Y Sun, X Xie - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
In this paper, we move towards combining large parametric models with non-parametric
prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework …

T3-Vis: visual analytic for training and fine-tuning transformers in NLP

R Li, W Xiao, L Wang, H Jang… - Proceedings of the 2021 …, 2021 - aclanthology.org
Transformers are the dominant architecture in NLP, but their training and fine-tuning are still
very challenging. In this paper, we present the design and implementation of a visual …