Pythia: A suite for analyzing large language models across training and scaling
S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …
Deep reinforcement learning at the edge of the statistical precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing
their relative performance on a large suite of tasks. Most published results on deep RL …
A toy model of universality: Reverse engineering how networks learn group operations
Universality is a key hypothesis in mechanistic interpretability: that different models learn
similar features and circuits when trained on similar tasks. In this work, we study the …
A pretrainer's guide to training data: Measuring the effects of data age, domain coverage, quality, & toxicity
Pretraining is the preliminary and fundamental step in developing capable language models
(LMs). Despite this, pretraining data design is critically under-documented and often guided …
Cramming: Training a language model on a single GPU in one day
J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …
Datamodels: Predicting predictions from training data
We present a conceptual framework, datamodeling, for analyzing the behavior of a model
class in terms of the training data. For any fixed "target" example $x$, training set $S$, and …
Analyzing transformers in embedding space
Understanding Transformer-based models has attracted significant attention, as they lie at
the heart of recent technological advances across machine learning. While most …
AdaMix: Mixture-of-adaptations for parameter-efficient model tuning
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks
requires updating hundreds of millions to billions of parameters, and storing a large copy of …
All bark and no bite: Rogue dimensions in transformer language models obscure representational quality
W Timkey, M Van Schijndel - arXiv preprint arXiv:2109.04404, 2021 - arxiv.org
Similarity measures are a vital tool for understanding how language models represent and
process language. Standard representational similarity measures such as cosine similarity …
Deep learning model based on expectation-confirmation theory to predict customer satisfaction in hospitality service
Customer satisfaction is one of the most important measures in the hospitality industry.
Therefore, several psychological and cognitive theories have been utilized to provide …