Never train from scratch: Fair comparison of long-sequence models requires data-driven priors
Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically …
Simplifying and understanding state space models with diagonal linear RNNs
Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long-range dependencies across various …
Zamba: A Compact 7B SSM Hybrid Model
P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable …
On the role of parallel data in cross-lingual transfer learning
While prior work has established that the use of parallel data is conducive to cross-lingual learning, it is unclear if the improvements come from the data itself, or if it is the modeling of …
Pivotal role of language modeling in recommender systems: Enriching task-specific and task-agnostic representation learning
Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Many of them benefit from utilizing users' behavior …
Improving speaker verification with self-pretrained transformer models
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received rising interest. Despite their success, it is still challenging to disentangle the …
Benchmarking Middle-Trained Language Models for Neural Search
Middle training methods aim to bridge the gap between the Masked Language Model (MLM) pre-training and the final finetuning for retrieval. Recent models such as CoCondenser …
Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?
Understanding how and what pre-trained language models (PLMs) learn about language is an open challenge in natural language processing. Previous work has focused on …
How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems
Self-supervised learning (SSL) enables label-efficient training for machine learning models. This is essential for domains such as medical imaging, where labels are costly and time …
Efficient induction of language models via probabilistic concept formation
CJ MacLellan, P Matsakis, P Langley - arXiv preprint arXiv:2212.11937, 2022 - arxiv.org
This paper presents a novel approach to the acquisition of language models from corpora. The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of …