Recent advances in natural language processing via large pre-trained language models: A survey

B Min, H Ross, E Sulem, APB Veyseh… - ACM Computing …, 2023 - dl.acm.org
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …

Biases in large language models: origins, inventory, and discussion

R Navigli, S Conia, B Ross - ACM Journal of Data and Information …, 2023 - dl.acm.org
In this article, we introduce and discuss the pervasive issue of bias in the large language
models that are currently at the core of mainstream approaches to Natural Language …

Data selection for language models via importance resampling

SM Xie, S Santurkar, T Ma… - Advances in Neural …, 2023 - proceedings.neurips.cc
Selecting a suitable pretraining dataset is crucial for both general-domain (e.g., GPT-3) and
domain-specific (e.g., Codex) language models (LMs). We formalize this problem as selecting …
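
The importance-resampling idea can be illustrated with a small sketch: estimate how likely each candidate document is under a target-domain distribution versus the raw pool, weight by the ratio, and resample. The unigram density estimates and Gumbel-top-k sampling below are simplifying assumptions for illustration, not necessarily the paper's exact estimator or features.

```python
# Illustrative sketch of importance resampling for pretraining data selection.
# The add-one-smoothed unigram models and Gumbel-top-k trick are assumptions
# made for brevity, not the estimator used in the cited paper.
import numpy as np

def fit_unigram(texts, vocab):
    counts = np.ones(len(vocab))                # add-one smoothing
    index = {w: i for i, w in enumerate(vocab)}
    for t in texts:
        for w in t.split():
            if w in index:
                counts[index[w]] += 1
    return counts / counts.sum(), index

def log_prob(text, probs, index):
    return sum(np.log(probs[index[w]]) for w in text.split() if w in index)

def select(raw_pool, target_texts, vocab, k, seed=0):
    """Resample k documents from raw_pool with probability proportional to
    exp(log p_target(x) - log p_raw(x))."""
    p_tgt, idx = fit_unigram(target_texts, vocab)
    p_raw, _ = fit_unigram(raw_pool, vocab)
    log_w = np.array([log_prob(x, p_tgt, idx) - log_prob(x, p_raw, idx)
                      for x in raw_pool])
    rng = np.random.default_rng(seed)
    # Gumbel-top-k: a sample without replacement proportional to softmax(log_w).
    keys = log_w + rng.gumbel(size=len(raw_pool))
    chosen = np.argsort(-keys)[:k]
    return [raw_pool[i] for i in chosen]
```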

Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

Monarch Mixer: A simple sub-quadratic GEMM-based architecture

D Fu, S Arora, J Grogan, I Johnson… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
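
The sub-quadratic building block can be sketched as two batches of small block matmuls interleaved with a fixed reshape/transpose permutation, which is the spirit of Monarch-structured layers. The block shapes, the specific permutation, and the parameter initialization below are assumptions for illustration, not the paper's full M2 layer.

```python
# Illustrative sketch of a Monarch-style sub-quadratic linear operator built
# from block-diagonal matmuls and a fixed permutation. Details are simplified
# assumptions; see the cited paper for the actual Monarch Mixer layer.
import numpy as np

def monarch_matvec(x, B1, B2):
    """x: (n,) with n = m*m; B1, B2: (m, m, m) stacks of m blocks of size m x m.
    Cost is O(n * sqrt(n)) multiply-adds instead of O(n^2) for a dense matmul."""
    m = B1.shape[0]
    assert x.shape[0] == m * m
    z = x.reshape(m, m)                 # split the input into m groups of size m
    z = np.einsum('bij,bj->bi', B1, z)  # block-diagonal multiply (first factor)
    z = z.T                             # fixed permutation (transpose of the grid)
    z = np.einsum('bij,bj->bi', B2, z)  # block-diagonal multiply (second factor)
    return z.reshape(-1)

# Usage: a 4096-dim "linear layer" with only 2 * 64 * 64 * 64 parameters.
m = 64
rng = np.random.default_rng(0)
x = rng.standard_normal(m * m)
B1 = rng.standard_normal((m, m, m)) / np.sqrt(m)
B2 = rng.standard_normal((m, m, m)) / np.sqrt(m)
y = monarch_matvec(x, B1, B2)
```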

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
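
The flavor of the optimizer can be sketched as a clipped, diagonally preconditioned update: keep moving averages of the gradient and of a diagonal Hessian estimate, divide one by the other, and clip element-wise. The hyperparameter names and the placeholder curvature estimate below are illustrative assumptions; the paper's Hessian estimators and update schedule differ in detail.

```python
# Minimal sketch of a Sophia-style clipped second-order step for one parameter
# tensor. `hess_diag` is assumed to come from an external diagonal-Hessian
# estimator refreshed every few steps; hyperparameter names are illustrative.
import numpy as np

def sophia_step(theta, grad, hess_diag, state, lr=1e-4,
                beta1=0.9, beta2=0.99, rho=0.04, eps=1e-12):
    # Exponential moving averages of the gradient and the Hessian diagonal.
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['h'] = beta2 * state['h'] + (1 - beta2) * hess_diag
    # Preconditioned update, clipped element-wise so steps stay bounded where
    # the curvature estimate is small or unreliable.
    update = np.clip(state['m'] / np.maximum(rho * state['h'], eps), -1.0, 1.0)
    return theta - lr * update

# Usage with a dummy parameter vector and placeholder estimates.
theta = np.zeros(8)
state = {'m': np.zeros(8), 'h': np.zeros(8)}
grad = np.random.default_rng(0).standard_normal(8)
hess = np.abs(grad)                       # placeholder curvature estimate
theta = sophia_step(theta, grad, hess, state)
```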

Should you mask 15% in masked language modeling?

A Wettig, T Gao, Z Zhong, D Chen - arXiv preprint arXiv:2202.08005, 2022 - arxiv.org
Masked language models (MLMs) conventionally mask 15% of tokens due to the belief that
more masking would leave insufficient context to learn good representations; this masking …
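
The convention being questioned is easy to state concretely: select a fraction of token positions (15% in BERT) and corrupt them, conventionally with an 80/10/10 split between [MASK], a random token, and the unchanged original. The sketch below makes the masking rate an explicit parameter; it follows the standard BERT recipe rather than any specific variant studied in the paper.

```python
# Sketch of masked-language-model input corruption with a configurable masking
# rate. The 80/10/10 split among [MASK] / random token / unchanged follows the
# standard BERT recipe; the paper studies what happens as `mask_rate` varies.
import numpy as np

def mask_tokens(token_ids, vocab_size, mask_id, mask_rate=0.15, seed=0):
    rng = np.random.default_rng(seed)
    ids = np.array(token_ids)
    labels = np.full_like(ids, -100)          # -100 marks positions not predicted
    selected = rng.random(ids.shape) < mask_rate
    labels[selected] = ids[selected]          # predict original tokens here
    action = rng.random(ids.shape)
    ids[selected & (action < 0.8)] = mask_id                      # 80%: [MASK]
    rand = rng.integers(0, vocab_size, size=ids.shape)
    swap = selected & (action >= 0.8) & (action < 0.9)
    ids[swap] = rand[swap]                                        # 10%: random token
    # remaining 10%: keep the original token but still predict it
    return ids, labels

masked, labels = mask_tokens([5, 17, 42, 8, 99], vocab_size=30000,
                             mask_id=103, mask_rate=0.4)
```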

Cramming: Training a Language Model on a single GPU in one day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

M-FLAG: Medical vision-language pre-training with frozen language models and latent space geometry optimization

C Liu, S Cheng, C Chen, M Qiao, W Zhang… - … Conference on Medical …, 2023 - Springer
Medical vision-language models enable co-learning and integrating features from medical
imaging and clinical text. However, these models are not easy to train and the latent …
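
The "frozen language model" part of the recipe can be sketched directly: keep the text encoder's weights fixed and train only the image encoder and projection heads against it, e.g. with a contrastive objective. The module names, dimensions, and the simple symmetric InfoNCE-style loss below are assumptions for illustration; the paper's latent space geometry regularization is not shown.

```python
# Sketch of vision-language pretraining with a frozen text encoder: only the
# image encoder and projection heads receive gradients. Names, dimensions, and
# the contrastive loss are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenLMVisionPretrainer(nn.Module):
    def __init__(self, image_encoder, text_encoder, img_dim, txt_dim, latent_dim=256):
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        for p in self.text_encoder.parameters():   # freeze the language model
            p.requires_grad_(False)
        self.img_proj = nn.Linear(img_dim, latent_dim)
        self.txt_proj = nn.Linear(txt_dim, latent_dim)

    def forward(self, images, text_inputs):
        img_z = F.normalize(self.img_proj(self.image_encoder(images)), dim=-1)
        with torch.no_grad():                      # no gradients through the LM
            txt_feat = self.text_encoder(text_inputs)
        txt_z = F.normalize(self.txt_proj(txt_feat), dim=-1)
        logits = img_z @ txt_z.t() / 0.07          # image-text similarity matrix
        targets = torch.arange(images.size(0), device=logits.device)
        # symmetric contrastive loss over the batch (image-to-text and text-to-image)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
```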