AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

A systematic review of transformer-based pre-trained language models through self-supervised learning

E Kotei, R Thirunavukarasu - Information, 2023 - mdpi.com
Transfer learning is a technique used in deep learning applications to transfer learned
knowledge to a different target domain. The approach is mainly to solve the problem of a few …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …

The MultiBERTs: BERT reproductions for robustness analysis

T Sellam, S Yadlowsky, J Wei, N Saphra… - arXiv preprint arXiv …, 2021 - arxiv.org
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular …

A closer look at how fine-tuning changes BERT

Y Zhou, V Srikumar - arXiv preprint arXiv:2106.14282, 2021 - arxiv.org
Given the prevalence of pre-trained contextualized representations in today's NLP, there
have been many efforts to understand what information they contain, and why they seem to …

Analyzing how BERT performs entity matching

M Paganelli, F Del Buono, A Baraldi… - Proceedings of the …, 2022 - iris.unimore.it
State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as
BERT, for generating highly contextualized embeddings of terms. The embeddings are then …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

PROTAUGMENT: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning

T Dopierre, C Gravier, W Logerais - arXiv preprint arXiv:2105.12995, 2021 - arxiv.org
Recent research considers few-shot intent detection as a meta-learning problem: the model
is learning to learn from a consecutive set of small tasks named episodes. In this work, we …

Prototypical fine-tuning: Towards robust performance under varying data sizes

Y Jin, X Wang, Y Hao, Y Sun, X Xie - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
In this paper, we move towards combining large parametric models with non-parametric
prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework …

T3-Vis: visual analytic for training and fine-tuning transformers in NLP

R Li, W Xiao, L Wang, H Jang… - Proceedings of the 2021 …, 2021 - aclanthology.org
Transformers are the dominant architecture in NLP, but their training and fine-tuning are still
very challenging. In this paper, we present the design and implementation of a visual …