Never train from scratch: Fair comparison of long-sequence models requires data-driven priors

I Amos, J Berant, A Gupta - arXiv preprint arXiv:2310.02980, 2023 - arxiv.org
Modeling long-range dependencies across sequences is a longstanding goal in machine
learning and has led to architectures, such as state space models, that dramatically …

Simplifying and understanding state space models with diagonal linear RNNs

A Gupta, H Mehta, J Berant - arXiv preprint arXiv:2212.00768, 2022 - arxiv.org
Sequence models based on linear state spaces (SSMs) have recently emerged as a
promising choice of architecture for modeling long-range dependencies across various …

Zamba: A Compact 7B SSM Hybrid Model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …

On the role of parallel data in cross-lingual transfer learning

M Reid, M Artetxe - arXiv preprint arXiv:2212.10173, 2022 - arxiv.org
While prior work has established that the use of parallel data is conducive to cross-lingual
learning, it is unclear if the improvements come from the data itself, or if it is the modeling of …

Pivotal role of language modeling in recommender systems: Enriching task-specific and task-agnostic representation learning

K Shin, H Kwak, W Kim, J Jeong, S Jung… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent studies have proposed unified user modeling frameworks that leverage user
behavior data from various applications. Many of them benefit from utilizing users' behavior …

Improving speaker verification with self-pretrained transformer models

J Peng, O Plchot, T Stafylakis, L Mošner… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has
received rising interest. Despite their success, it is still challenging to disentangle the …

Benchmarking Middle-Trained Language Models for Neural Search

H Déjean, S Clinchant, C Lassance, S Lupart… - Proceedings of the 46th …, 2023 - dl.acm.org
Middle-training methods aim to bridge the gap between Masked Language Model (MLM)
pre-training and the final fine-tuning for retrieval. Recent models such as CoCondenser …

Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?

A Alajrami, K Margatina, N Aletras - arXiv preprint arXiv:2310.17271, 2023 - arxiv.org
Understanding how and what pre-trained language models (PLMs) learn about language is
an open challenge in natural language processing. Previous work has focused on …

How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems

C Van Uden, J Irvin, M Huang, N Dean, J Carr… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) enables label-efficient training for machine learning models.
This is essential for domains such as medical imaging, where labels are costly and time …

Efficient induction of language models via probabilistic concept formation

CJ MacLellan, P Matsakis, P Langley - arXiv preprint arXiv:2212.11937, 2022 - arxiv.org
This paper presents a novel approach to the acquisition of language models from corpora.
The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of …