Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …

The life cycle of knowledge in big language models: A survey

B Cao, H Lin, X Han, L Sun - Machine Intelligence Research, 2024 - Springer
Knowledge plays a critical role in artificial intelligence. Recently, the extensive
success of pre-trained language models (PLMs) has raised significant attention about how …

When do you need billions of words of pretraining data?

Y Zhang, A Warstadt, HS Li, SR Bowman - arXiv preprint arXiv:2011.04946, 2020 - arxiv.org
NLP is currently dominated by general-purpose pretrained language models like RoBERTa,
which achieve strong performance on NLU tasks through pretraining on billions of words …

The MultiBERTs: BERT reproductions for robustness analysis

T Sellam, S Yadlowsky, J Wei, N Saphra… - arXiv preprint arXiv …, 2021 - arxiv.org
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular …

Probing across time: What does RoBERTa know and when?

LZ Liu, Y Wang, J Kasai, H Hajishirzi… - arXiv preprint arXiv …, 2021 - arxiv.org
Models of language trained on very large corpora have been demonstrated useful for NLP.
As fixed artifacts, they have become the object of intense study, with many researchers …

Incorporation of company-related factual knowledge into pre-trained language models for stock-related spam tweet filtering

J Park, S Cho - Expert Systems with Applications, 2023 - Elsevier
Natural language processing for finance has gained significant attention from both
academia and the industry as the continuously increasing amount of financial texts has …

IIITT@LT-EDI-EACL2021 - Hope speech detection: there is always hope in transformers

K Puranik, A Hande, R Priyadharshini… - arXiv preprint arXiv …, 2021 - arxiv.org
In a world filled with serious challenges like climate change, religious and political conflicts,
global pandemics, terrorism, and racial discrimination, an internet full of hate speech …

AraBART: a pretrained Arabic sequence-to-sequence model for abstractive summarization

MK Eddine, N Tomeh, N Habash, JL Roux… - arXiv preprint arXiv …, 2022 - arxiv.org
Like most natural language understanding and generation tasks, state-of-the-art models for
summarization are transformer-based sequence-to-sequence architectures that are …

Give me the facts! A survey on factual knowledge probing in pre-trained language models

P Youssef, OA Koraş, M Li, J Schlötterer… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world
knowledge. This fact has sparked the interest of the community in quantifying the amount of …