Pythia: A suite for analyzing large language models across training and scaling
S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …
Deep reinforcement learning at the edge of the statistical precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing
their relative performance on a large suite of tasks. Most published results on deep RL …
A toy model of universality: Reverse engineering how networks learn group operations
Universality is a key hypothesis in mechanistic interpretability: that different models learn
similar features and circuits when trained on similar tasks. In this work, we study the …
A pretrainer's guide to training data: Measuring the effects of data age, domain coverage, quality, & toxicity
Pretraining is the preliminary and fundamental step in developing capable language models
(LMs). Despite this, pretraining data design is critically under-documented and often guided …
Cramming: Training a language model on a single GPU in one day
J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …
Datamodels: Predicting predictions from training data
We present a conceptual framework, datamodeling, for analyzing the behavior of a model
class in terms of the training data. For any fixed "target" example $x$, training set $S$, and …
Analyzing transformers in embedding space
Understanding Transformer-based models has attracted significant attention, as they lie at
the heart of recent technological advances across machine learning. While most …
AdaMix: Mixture-of-adaptations for parameter-efficient model tuning
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks
requires updating hundreds of millions to billions of parameters, and storing a large copy of …
All bark and no bite: Rogue dimensions in transformer language models obscure representational quality
W Timkey, M Van Schijndel - arXiv preprint arXiv:2109.04404, 2021 - arxiv.org
Similarity measures are a vital tool for understanding how language models represent and
process language. Standard representational similarity measures such as cosine similarity …
Deep learning model based on expectation-confirmation theory to predict customer satisfaction in hospitality service
Customer satisfaction is one of the most important measures in the hospitality industry.
Therefore, several psychological and cognitive theories have been utilized to provide …