Never train from scratch: Fair comparison of long-sequence models requires data-driven priors

I Amos, J Berant, A Gupta - arXiv preprint arXiv:2310.02980, 2023 - arxiv.org
Modeling long-range dependencies across sequences is a longstanding goal in machine
learning and has led to architectures, such as state space models, that dramatically …

Simplifying and understanding state space models with diagonal linear RNNs

A Gupta, H Mehta, J Berant - arXiv preprint arXiv:2212.00768, 2022 - arxiv.org
Sequence models based on linear state spaces (SSMs) have recently emerged as a
promising choice of architecture for modeling long-range dependencies across various …

Zamba: A Compact 7B SSM Hybrid Model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …

On the role of parallel data in cross-lingual transfer learning

M Reid, M Artetxe - arXiv preprint arXiv:2212.10173, 2022 - arxiv.org
While prior work has established that the use of parallel data is conducive to cross-lingual
learning, it is unclear if the improvements come from the data itself, or if it is the modeling of …

Pivotal role of language modeling in recommender systems: Enriching task-specific and task-agnostic representation learning

K Shin, H Kwak, W Kim, J Jeong, S Jung… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent studies have proposed unified user modeling frameworks that leverage user
behavior data from various applications. Many of them benefit from utilizing users' behavior …

Improving speaker verification with self-pretrained transformer models

J Peng, O Plchot, T Stafylakis, L Mošner… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, fine-tuning large pre-trained Transformer models using downstream datasets has
received rising interest. Despite their success, it is still challenging to disentangle the …

Benchmarking Middle-Trained Language Models for Neural Search

H Déjean, S Clinchant, C Lassance, S Lupart… - Proceedings of the 46th …, 2023 - dl.acm.org
Middle-training methods aim to bridge the gap between Masked Language Model (MLM)
pre-training and the final fine-tuning for retrieval. Recent models such as CoCondenser …

Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?

A Alajrami, K Margatina, N Aletras - arXiv preprint arXiv:2310.17271, 2023 - arxiv.org
Understanding how and what pre-trained language models (PLMs) learn about language is
an open challenge in natural language processing. Previous work has focused on …

How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems

C Van Uden, J Irvin, M Huang, N Dean, J Carr… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) enables label-efficient training for machine learning models.
This is essential for domains such as medical imaging, where labels are costly and time …

Efficient induction of language models via probabilistic concept formation

CJ MacLellan, P Matsakis, P Langley - arXiv preprint arXiv:2212.11937, 2022 - arxiv.org
This paper presents a novel approach to the acquisition of language models from corpora.
The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of …