Language models scale reliably with over-training and on downstream tasks

SY Gadre, G Smyrnis, V Shankar, S Gururangan… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling laws are useful guides for developing language models, but there are still gaps
between current scaling studies and how language models are ultimately trained and …

Rho-1: Not all tokens are what you need

Z Lin, Z Gou, Y Gong, X Liu, Y Shen, R Xu, C Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a …
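The idea behind the title is selective language modeling: score each training token against a reference model and backpropagate the next-token loss only through the tokens with the largest excess loss. A minimal PyTorch sketch of that selection step, where the keep ratio, tensor shapes, and the `selective_lm_loss` helper are illustrative assumptions rather than the paper's exact recipe:

```python
# Sketch of selective language modeling in the spirit of Rho-1:
# keep the next-token loss only for tokens where the training model's
# loss most exceeds a reference model's loss ("excess loss").
# The 60% keep ratio and input shapes are assumptions; padding handling omitted.
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_ratio=0.6):
    """logits, ref_logits: (batch, seq, vocab); labels: (batch, seq)."""
    # Per-token cross-entropy for both models, no reduction.
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten(),
                         reduction="none")
    ref_ce = F.cross_entropy(ref_logits.flatten(0, 1), labels.flatten(),
                             reduction="none")
    # Selection score is detached: only which tokens we keep depends on it.
    excess = (ce - ref_ce).detach()
    k = max(1, int(keep_ratio * excess.numel()))
    kept = torch.topk(excess, k).indices
    # Gradients flow only through the selected tokens' losses.
    return ce[kept].mean()
```

Selecting by excess loss rather than raw loss focuses updates on tokens the model can still learn from, instead of tokens that are inherently noisy for any model.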

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

C Tao, Q Liu, L Dou, N Muennighoff, Z Wan… - arXiv preprint arXiv …, 2024 - arxiv.org
Research on scaling large language models (LLMs) has primarily focused on model
parameters and training data size, overlooking the role of vocabulary size. Intuitively …

STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

L Zhang, J Wu, D Zhou, G Xu - arXiv preprint arXiv:2403.01165, 2024 - arxiv.org
Though Large Language Models (LLMs) have demonstrated powerful few-shot learning
capabilities through prompting methods, supervised training is still necessary for complex …
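The snippet does not describe STAR's constraint or its active-learning loop, but the LoRA mechanism it builds on is standard: freeze the pretrained weight and learn a low-rank update BA on top of it. A minimal sketch, with the rank `r` and scaling `alpha` as commonly used defaults (assumptions, not STAR's settings):

```python
# Minimal sketch of a LoRA linear layer (the adapter mechanism STAR builds on;
# STAR's constraints and active-learning loop are not shown here).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank update x @ (BA)^T.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

x = torch.randn(4, 512)
layer = LoRALinear(512, 512)
print(layer(x).shape)  # torch.Size([4, 512])
```

Only A and B are trained, so the number of trainable parameters scales with r rather than with the full weight matrix.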

Unraveling the mystery of scaling laws: Part I

H Su, Z Tian, X Shen, X Cai - arXiv preprint arXiv:2403.06563, 2024 - arxiv.org
Scaling law principles indicate a power-law correlation between loss and variables such as
model size, dataset size, and computational resources utilized during training. These …
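As a concrete, generic instance of such a power law, a Chinchilla-style parameterization L(N, D) = E + A/N^α + B/D^β can be fit to (model size, token count, loss) observations. The functional form, the fitting procedure, and the synthetic data below are illustrative assumptions, not this paper's results:

```python
# Sketch of fitting a Chinchilla-style scaling law
#   L(N, D) = E + A / N**alpha + B / D**beta
# to synthetic (parameters, tokens, loss) observations.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, E, A, alpha, B, beta):
    N, D = X  # model parameters, training tokens
    return E + A / N**alpha + B / D**beta

# Synthetic grid of observations, generated from assumed coefficients.
Ns = np.array([1e8, 4e8, 1e9, 4e9])
Ds = np.array([2e9, 8e9, 2e10, 8e10])
N, D = (a.ravel() for a in np.meshgrid(Ns, Ds))
loss = scaling_law((N, D), E=1.7, A=4e2, alpha=0.34, B=4e3, beta=0.28)

popt, _ = curve_fit(scaling_law, (N, D), loss,
                    p0=[2.0, 1e2, 0.3, 1e3, 0.3], maxfev=20000)
E, A, alpha, B, beta = popt
print(f"fitted: E={E:.2f}, alpha={alpha:.2f}, beta={beta:.2f}")
```

Once fitted, such a law extrapolates loss to larger N and D, which is what makes it useful for budgeting compute before training.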

Neural Scaling Laws for Embodied AI

S Sartor, N Thompson - arXiv preprint arXiv:2405.14005, 2024 - arxiv.org
Scaling laws have driven remarkable progress across machine learning domains like
language modeling and computer vision. However, the exploration of scaling laws in …

Scaling Laws for Linear Complexity Language Models

X Shen, D Li, R Leng, Z Qin, W Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
The interest in linear complexity models for large language models is on the rise, although
their scaling capacity remains uncertain. In this study, we present the scaling laws for linear …

Collaborative Performance Prediction for Large Language Models

Q Zhang, F Lyu, X Liu, C Ma - arXiv preprint arXiv:2407.01300, 2024 - arxiv.org
Comprehensively understanding and accurately predicting the performance of large
language models across diverse downstream tasks has emerged as a pivotal challenge in …

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

W Zhang, K Saijo, J Jung, C Li, S Watanabe… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep learning-based speech enhancement (SE) models have achieved impressive
performance in the past decade. Numerous advanced architectures have been designed to …

On Resource Efficient Transfer Learning via End Task Aware Training

LM Dery - 2024 - kilthub.cmu.edu
Transfer learning is a machine learning (ML) paradigm where performance on a desired end
task is improved by exploiting "knowledge" from other tasks. The technique has become a …