Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …

The life cycle of knowledge in big language models: A survey

B Cao, H Lin, X Han, L Sun - Machine Intelligence Research, 2024 - Springer
Knowledge plays a critical role in artificial intelligence. Recently, the extensive
success of pre-trained language models (PLMs) has raised significant attention about how …

When do you need billions of words of pretraining data?

Y Zhang, A Warstadt, HS Li, SR Bowman - arXiv preprint arXiv:2011.04946, 2020 - arxiv.org
NLP is currently dominated by general-purpose pretrained language models like RoBERTa,
which achieve strong performance on NLU tasks through pretraining on billions of words …

The MultiBERTs: BERT reproductions for robustness analysis

T Sellam, S Yadlowsky, J Wei, N Saphra… - arXiv preprint arXiv …, 2021 - arxiv.org
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular …

Probing across time: What does RoBERTa know and when?

LZ Liu, Y Wang, J Kasai, H Hajishirzi… - arXiv preprint arXiv …, 2021 - arxiv.org
Models of language trained on very large corpora have been demonstrated useful for NLP.
As fixed artifacts, they have become the object of intense study, with many researchers …

Incorporation of company-related factual knowledge into pre-trained language models for stock-related spam tweet filtering

J Park, S Cho - Expert Systems with Applications, 2023 - Elsevier
Natural language processing for finance has gained significant attention from both
academia and the industry as the continuously increasing amount of financial texts has …

IIITT@LT-EDI-EACL2021 - Hope speech detection: there is always hope in transformers

K Puranik, A Hande, R Priyadharshini… - arXiv preprint arXiv …, 2021 - arxiv.org
In a world filled with serious challenges like climate change, religious and political conflicts,
global pandemics, terrorism, and racial discrimination, an internet full of hate speech …

AraBART: a pretrained Arabic sequence-to-sequence model for abstractive summarization

MK Eddine, N Tomeh, N Habash, JL Roux… - arXiv preprint arXiv …, 2022 - arxiv.org
Like most natural language understanding and generation tasks, state-of-the-art models for
summarization are transformer-based sequence-to-sequence architectures that are …

Give me the facts! A survey on factual knowledge probing in pre-trained language models

P Youssef, OA Koraş, M Li, J Schlötterer… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world
knowledge. This fact has sparked the interest of the community in quantifying the amount of …