Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

Deep reinforcement learning at the edge of the statistical precipice

R Agarwal, M Schwarzer, PS Castro… - Advances in neural …, 2021 - proceedings.neurips.cc
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing
their relative performance on a large suite of tasks. Most published results on deep RL …
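
As a rough illustration of the evaluation setting this abstract describes (comparing aggregate performance over a large task suite), the sketch below computes an interquartile-mean aggregate with a bootstrap confidence interval, in the spirit of the statistical tools the paper advocates. The array shapes, the synthetic scores, and the helper names (interquartile_mean, bootstrap_ci) are illustrative assumptions, not the authors' rliable library.

    # Hypothetical sketch: aggregate normalized deep RL scores across a task
    # suite and report a simple bootstrap confidence interval over runs.
    import numpy as np

    rng = np.random.default_rng(0)

    # scores[r, t] = normalized score of run r on task t (synthetic data here).
    n_runs, n_tasks = 5, 26
    scores = rng.beta(2, 5, size=(n_runs, n_tasks))

    def interquartile_mean(x):
        # Mean of (approximately) the middle 50% of all run-task scores.
        x = np.sort(x.ravel())
        lo, hi = len(x) // 4, len(x) - len(x) // 4
        return x[lo:hi].mean()

    def bootstrap_ci(scores, stat, n_boot=2000, alpha=0.05):
        # Resample runs with replacement and report a percentile interval.
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, scores.shape[0], size=scores.shape[0])
            stats.append(stat(scores[idx]))
        return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

    print("IQM:", interquartile_mean(scores))
    print("95% CI:", bootstrap_ci(scores, interquartile_mean))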

A toy model of universality: Reverse engineering how networks learn group operations

B Chughtai, L Chan, N Nanda - International Conference on …, 2023 - proceedings.mlr.press
Universality is a key hypothesis in mechanistic interpretability: that different models learn
similar features and circuits when trained on similar tasks. In this work, we study the …

A pretrainer's guide to training data: Measuring the effects of data age, domain coverage, quality, & toxicity

S Longpre, G Yauney, E Reif, K Lee, A Roberts… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretraining is the preliminary and fundamental step in developing capable language models
(LMs). Despite this, pretraining data design is critically under-documented and often guided …

Cramming: Training a language model on a single GPU in one day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

Datamodels: Predicting predictions from training data

A Ilyas, SM Park, L Engstrom, G Leclerc… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a conceptual framework, datamodeling, for analyzing the behavior of a model
class in terms of the training data. For any fixed "target" example $x$, training set $S$, and …
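
Since the snippet is truncated, here is a minimal sketch of the linear-datamodel idea the abstract begins to define: predict a model's output on a fixed target example $x$ from an indicator vector recording which elements of $S$ were included in the training subset. The subset-generation scheme, the synthetic outputs, and the ridge-regularized solve below are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of a linear datamodel: regress the model's output on x
    # against a binary mask of which training examples were used.
    import numpy as np

    rng = np.random.default_rng(0)

    n_train = 500        # size of the full training set S
    n_subsets = 2000     # number of random subsets we (hypothetically) trained on

    # masks[i, j] = 1 if training example j was included in subset i.
    masks = (rng.random((n_subsets, n_train)) < 0.5).astype(float)

    # outputs[i] = measured model output on the target example x (e.g., a margin)
    # after training on subset i; synthetic data stands in for real measurements.
    true_weights = rng.normal(size=n_train) / np.sqrt(n_train)
    outputs = masks @ true_weights + rng.normal(scale=0.1, size=n_subsets)

    # Fit a linear datamodel: output ≈ masks @ theta + bias (ridge least squares).
    X = np.hstack([masks, np.ones((n_subsets, 1))])
    lam = 1.0
    theta = np.linalg.solve(X.T @ X + lam * np.eye(n_train + 1), X.T @ outputs)

    # theta[j] estimates how much including training example j moves the model's
    # prediction on x; large-magnitude entries flag the most influential examples.
    top_influential = np.argsort(-np.abs(theta[:-1]))[:10]
    print(top_influential)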

Analyzing transformers in embedding space

G Dar, M Geva, A Gupta, J Berant - arXiv preprint arXiv:2209.02535, 2022 - arxiv.org
Understanding Transformer-based models has attracted significant attention, as they lie at
the heart of recent technological advances across machine learning. While most …

AdaMix: Mixture-of-adaptations for parameter-efficient model tuning

Y Wang, S Agarwal, S Mukherjee, X Liu, J Gao… - arXiv preprint arXiv …, 2022 - arxiv.org
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks
requires updating hundreds of millions to billions of parameters, and storing a large copy of …

All bark and no bite: Rogue dimensions in transformer language models obscure representational quality

W Timkey, M Van Schijndel - arXiv preprint arXiv:2109.04404, 2021 - arxiv.org
Similarity measures are a vital tool for understanding how language models represent and
process language. Standard representational similarity measures such as cosine similarity …

Deep learning model based on expectation-confirmation theory to predict customer satisfaction in hospitality service

S Oh, H Ji, J Kim, E Park, AP del Pobil - Information Technology & Tourism, 2022 - Springer
Customer satisfaction is one of the most important measures in the hospitality industry.
Therefore, several psychological and cognitive theories have been utilized to provide …