Language model behavior: A comprehensive survey
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …
Memorization without overfitting: Analyzing the training dynamics of large language models
K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …
The life cycle of knowledge in big language models: A survey
Abstract Knowledge plays a critical role in artificial intelligence. Recently, the extensive
success of pre-trained language models (PLMs) has attracted significant attention to how …
When do you need billions of words of pretraining data?
NLP is currently dominated by general-purpose pretrained language models like RoBERTa,
which achieve strong performance on NLU tasks through pretraining on billions of words …
The MultiBERTs: BERT reproductions for robustness analysis
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular …
Probing across time: What does RoBERTa know and when?
Models of language trained on very large corpora have been demonstrated useful for NLP.
As fixed artifacts, they have become the object of intense study, with many researchers' …
Incorporation of company-related factual knowledge into pre-trained language models for stock-related spam tweet filtering
Natural language processing for finance has gained significant attention from both
academia and the industry as the continuously increasing amount of financial texts has …
IIITT@LT-EDI-EACL2021 - Hope speech detection: there is always hope in transformers
In a world filled with serious challenges like climate change, religious and political conflicts,
global pandemics, terrorism, and racial discrimination, an internet full of hate speech …
AraBART: a pretrained Arabic sequence-to-sequence model for abstractive summarization
Like most natural language understanding and generation tasks, state-of-the-art models for
summarization are transformer-based sequence-to-sequence architectures that are …
Give me the facts! a survey on factual knowledge probing in pre-trained language models
P Youssef, OA Koraş, M Li, J Schlötterer… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world
knowledge. This fact has sparked the interest of the community in quantifying the amount of …