Trends and challenges of real-time learning in large language models: A critical review

M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …

When LLMs meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Chen, L Li, H Zhu - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancements in large language models (LLMs) have opened new avenues
across various fields, including cybersecurity, which faces an ever-evolving threat landscape …

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

J Parmar, S Satheesh, M Patwary, M Shoeybi… - arXiv preprint arXiv …, 2024 - arxiv.org
As language models have scaled both their number of parameters and pretraining dataset
sizes, the computational cost for pretraining has become intractable except for the most well …
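A recurring ingredient in continued-pretraining recipes is re-warming and then re-decaying the learning rate when training resumes from an existing checkpoint. The sketch below is a minimal, generic version of that idea, assuming illustrative peak/minimum learning rates and warmup fraction; it is not the recipe proposed in this paper.

```python
import math

def continued_pretraining_lr(step, total_steps, peak_lr=3e-5, min_lr=3e-6, warmup_frac=0.01):
    """Illustrative LR schedule for continued pretraining: a short linear
    re-warmup from min_lr to a peak (typically below the original pretraining
    peak), followed by a cosine decay back to min_lr. All values here are
    placeholders, not this paper's recommendations."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear re-warmup from min_lr to peak_lr.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Cosine decay from peak_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example: inspect the schedule at a few points of a 10k-step continuation run.
for s in (0, 100, 5000, 10000):
    print(s, round(continued_pretraining_lr(s, 10_000), 8))
```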

Zamba: A Compact 7B SSM Hybrid Model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …
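To make "SSM-transformer hybrid" concrete, here is a toy sketch of the general pattern of interleaving state-space-style sequence-mixing blocks with a single shared attention block. The placeholder SSMBlock, the layer count, width, and the sharing interval are all assumptions for illustration; a real implementation would use actual state-space (e.g. Mamba-style) layers and is not Zamba's configuration.

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Placeholder for a real state-space block; here just a gated residual
    MLP so the sketch runs end to end."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj = nn.Linear(d_model, 2 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        h, gate = self.proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out(h * torch.sigmoid(gate))

class HybridBackbone(nn.Module):
    """Toy hybrid: a stack of SSM-style blocks with one *shared* attention
    block (no causal mask, for brevity) applied every `attn_every` layers."""
    def __init__(self, d_model=256, n_layers=12, n_heads=4, attn_every=6):
        super().__init__()
        self.ssm_blocks = nn.ModuleList(SSMBlock(d_model) for _ in range(n_layers))
        self.shared_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn_every = attn_every

    def forward(self, x):
        for i, block in enumerate(self.ssm_blocks):
            x = block(x)
            if (i + 1) % self.attn_every == 0:
                h = self.attn_norm(x)
                attn_out, _ = self.shared_attn(h, h, h, need_weights=False)
                x = x + attn_out
        return x

tokens = torch.randn(2, 128, 256)      # (batch, seq, d_model)
print(HybridBackbone()(tokens).shape)  # torch.Size([2, 128, 256])
```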

A Practitioner's Guide to Continual Multimodal Pretraining

K Roth, V Udandarao, S Dziadzio, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models serve numerous applications at the intersection of vision and
language. Still, despite being pretrained on extensive data, they become outdated over time …

CorDA: Context-Oriented Decomposition Adaptation of Large Language Models

Y Yang, X Li, Z Zhou, SL Song, J Wu, L Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
Current parameter-efficient fine-tuning (PEFT) methods build adapters without considering
the context of downstream task to learn, or the context of important knowledge to maintain …
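For background on what "adapters" means in parameter-efficient fine-tuning, the sketch below shows a generic LoRA-style low-rank adapter wrapped around a frozen linear layer. It only illustrates the basic PEFT mechanism; CorDA's context-oriented decomposition structures and initializes the adapter differently, and the rank and scaling values here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Generic LoRA-style adapter: the frozen base weight is augmented with a
    trainable low-rank update B @ A, scaled by alpha / rank."""
    def __init__(self, base_linear: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():        # freeze the pretrained layer
            p.requires_grad_(False)
        d_out, d_in = base_linear.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LowRankAdapter(nn.Linear(512, 512))
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```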

Towards Effective and Efficient Continual Pre-training of Large Language Models

J Chen, Z Chen, J Wang, K Zhou, Y Zhu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Continual pre-training (CPT) has been an important approach for adapting language models
to specific domains or tasks. To make the CPT approach more traceable, this paper presents …
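A common way CPT pipelines limit forgetting of general abilities is to mix replayed general-domain data into the new-domain stream. The sketch below is a minimal version of such interleaving, assuming an illustrative replay ratio and toy document lists; it does not reflect the data mixture used in this paper.

```python
import random

def mixed_cpt_stream(domain_docs, replay_docs, replay_ratio=0.2, seed=0):
    """Yield a continual-pretraining stream of new-domain documents, and with
    probability `replay_ratio` interleave a replayed general-domain document
    after each one. The 20% value is an illustrative assumption."""
    rng = random.Random(seed)
    for doc in domain_docs:
        yield doc
        if replay_docs and rng.random() < replay_ratio:
            yield rng.choice(replay_docs)

domain = [f"domain_doc_{i}" for i in range(5)]
general = [f"general_doc_{i}" for i in range(100)]
print(list(mixed_cpt_stream(domain, general)))
```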

Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

A Alexandrov, V Raychev, MN Müller, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
As open-weight large language models (LLMs) achieve ever more impressive performances
across a wide range of tasks in English, practitioners aim to adapt these models to different …
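As context for "model merging", the simplest instance is per-parameter linear interpolation between the base model and the language-adapted model. The sketch below shows only that generic idea with toy state dicts and an arbitrary interpolation weight; the merging scheme studied in the paper may differ.

```python
import torch

def linear_merge(base_state, tuned_state, alpha=0.5):
    """Per-parameter linear interpolation between a base checkpoint and a
    fine-tuned checkpoint. alpha is an illustrative choice."""
    return {
        name: (1.0 - alpha) * base_state[name] + alpha * tuned_state[name]
        for name in base_state
    }

# Toy example with two-parameter state dicts.
base = {"w": torch.zeros(2, 2), "b": torch.zeros(2)}
tuned = {"w": torch.ones(2, 2), "b": torch.ones(2)}
print(linear_merge(base, tuned, alpha=0.3))
```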

How Susceptible are LLMs to Influence in Prompts?

S Anagnostidis, J Bulian - arXiv preprint arXiv:2408.11865, 2024 - arxiv.org
Large Language Models (LLMs) are highly sensitive to prompts, including additional context
provided therein. As LLMs grow in capability, understanding their prompt-sensitivity …

Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations

X Jin, X Ren - arXiv preprint arXiv:2406.14026, 2024 - arxiv.org
Language models (LMs) are known to suffer from forgetting of previously learned examples
when fine-tuned, breaking stability of deployed LM systems. Despite efforts on mitigating …
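One generic way to quantify this kind of forgetting is to measure, for each fine-tuning task, how much the loss on previously learned upstream examples increases after fine-tuning. The sketch below computes such a task-by-example forgetting matrix from toy numbers; it is a generic measurement, not necessarily the statistical analysis performed in the paper.

```python
import numpy as np

def forgetting_matrix(loss_before, loss_after_by_task):
    """Forgetting as the increase in per-example loss on upstream examples
    after fine-tuning. Rows: fine-tuning tasks; columns: upstream examples."""
    loss_before = np.asarray(loss_before)         # shape (n_examples,)
    loss_after = np.asarray(loss_after_by_task)   # shape (n_tasks, n_examples)
    return loss_after - loss_before[None, :]

# Toy numbers: 3 fine-tuning tasks, 4 upstream examples.
before = [1.0, 0.8, 1.2, 0.9]
after = [[1.4, 0.8, 1.3, 1.0],
         [1.1, 1.5, 1.2, 0.9],
         [1.0, 0.9, 2.0, 1.1]]
F = forgetting_matrix(before, after)
print(F.round(2))
print("mean forgetting per task:", F.mean(axis=1).round(2))
```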