Trends and challenges of real-time learning in large language models: A critical review

M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …

When LLMs meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Chen, L Li, H Zhu - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancements in large language models (LLMs) have opened new avenues
across various fields, including cybersecurity, which faces an ever-evolving threat landscape …

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

J Parmar, S Satheesh, M Patwary, M Shoeybi… - arXiv preprint arXiv …, 2024 - arxiv.org
As language models have scaled both their number of parameters and pretraining dataset
sizes, the computational cost for pretraining has become intractable except for the most well …
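A recurring ingredient in continued-pretraining recipes is re-warming and then re-decaying the learning rate when training resumes from an existing checkpoint. The sketch below is a minimal, generic version of that idea, assuming illustrative peak/minimum learning rates and warmup fraction; it is not the recipe proposed in this paper.

```python
import math

def continued_pretraining_lr(step, total_steps, peak_lr=3e-5, min_lr=3e-6, warmup_frac=0.01):
    """Illustrative LR schedule for continued pretraining: a short linear
    re-warmup from min_lr to a peak (typically below the original pretraining
    peak), followed by a cosine decay back to min_lr. All values here are
    placeholders, not this paper's recommendations."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear re-warmup from min_lr to peak_lr.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Cosine decay from peak_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example: inspect the schedule at a few points of a 10k-step continuation run.
for s in (0, 100, 5000, 10000):
    print(s, round(continued_pretraining_lr(s, 10_000), 8))
```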

Zamba: A Compact 7B SSM Hybrid Model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …
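To make "SSM-transformer hybrid" concrete, here is a toy sketch of the general pattern of interleaving state-space-style sequence-mixing blocks with a single shared attention block. The placeholder SSMBlock, the layer count, width, and the sharing interval are all assumptions for illustration; a real implementation would use actual state-space (e.g. Mamba-style) layers and is not Zamba's configuration.

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Placeholder for a real state-space block; here just a gated residual
    MLP so the sketch runs end to end."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj = nn.Linear(d_model, 2 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        h, gate = self.proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out(h * torch.sigmoid(gate))

class HybridBackbone(nn.Module):
    """Toy hybrid: a stack of SSM-style blocks with one *shared* attention
    block (no causal mask, for brevity) applied every `attn_every` layers."""
    def __init__(self, d_model=256, n_layers=12, n_heads=4, attn_every=6):
        super().__init__()
        self.ssm_blocks = nn.ModuleList(SSMBlock(d_model) for _ in range(n_layers))
        self.shared_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn_every = attn_every

    def forward(self, x):
        for i, block in enumerate(self.ssm_blocks):
            x = block(x)
            if (i + 1) % self.attn_every == 0:
                h = self.attn_norm(x)
                attn_out, _ = self.shared_attn(h, h, h, need_weights=False)
                x = x + attn_out
        return x

tokens = torch.randn(2, 128, 256)      # (batch, seq, d_model)
print(HybridBackbone()(tokens).shape)  # torch.Size([2, 128, 256])
```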

A Practitioner's Guide to Continual Multimodal Pretraining

K Roth, V Udandarao, S Dziadzio, A Prabhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal foundation models serve numerous applications at the intersection of vision and
language. Still, despite being pretrained on extensive data, they become outdated over time …

CorDA: Context-Oriented Decomposition Adaptation of Large Language Models

Y Yang, X Li, Z Zhou, SL Song, J Wu, L Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
Current parameter-efficient fine-tuning (PEFT) methods build adapters without considering
the context of downstream task to learn, or the context of important knowledge to maintain …
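For background on what "adapters" means in parameter-efficient fine-tuning, the sketch below shows a generic LoRA-style low-rank adapter wrapped around a frozen linear layer. It only illustrates the basic PEFT mechanism; CorDA's context-oriented decomposition structures and initializes the adapter differently, and the rank and scaling values here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Generic LoRA-style adapter: the frozen base weight is augmented with a
    trainable low-rank update B @ A, scaled by alpha / rank."""
    def __init__(self, base_linear: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():        # freeze the pretrained layer
            p.requires_grad_(False)
        d_out, d_in = base_linear.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LowRankAdapter(nn.Linear(512, 512))
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```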

Towards Effective and Efficient Continual Pre-training of Large Language Models

J Chen, Z Chen, J Wang, K Zhou, Y Zhu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Continual pre-training (CPT) has been an important approach for adapting language models
to specific domains or tasks. To make the CPT approach more traceable, this paper presents …
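A common way CPT pipelines limit forgetting of general abilities is to mix replayed general-domain data into the new-domain stream. The sketch below is a minimal version of such interleaving, assuming an illustrative replay ratio and toy document lists; it does not reflect the data mixture used in this paper.

```python
import random

def mixed_cpt_stream(domain_docs, replay_docs, replay_ratio=0.2, seed=0):
    """Yield a continual-pretraining stream of new-domain documents, and with
    probability `replay_ratio` interleave a replayed general-domain document
    after each one. The 20% value is an illustrative assumption."""
    rng = random.Random(seed)
    for doc in domain_docs:
        yield doc
        if replay_docs and rng.random() < replay_ratio:
            yield rng.choice(replay_docs)

domain = [f"domain_doc_{i}" for i in range(5)]
general = [f"general_doc_{i}" for i in range(100)]
print(list(mixed_cpt_stream(domain, general)))
```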

Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

A Alexandrov, V Raychev, MN Müller, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
As open-weight large language models (LLMs) achieve ever more impressive performances
across a wide range of tasks in English, practitioners aim to adapt these models to different …
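As context for "model merging", the simplest instance is per-parameter linear interpolation between the base model and the language-adapted model. The sketch below shows only that generic idea with toy state dicts and an arbitrary interpolation weight; the merging scheme studied in the paper may differ.

```python
import torch

def linear_merge(base_state, tuned_state, alpha=0.5):
    """Per-parameter linear interpolation between a base checkpoint and a
    fine-tuned checkpoint. alpha is an illustrative choice."""
    return {
        name: (1.0 - alpha) * base_state[name] + alpha * tuned_state[name]
        for name in base_state
    }

# Toy example with two-parameter state dicts.
base = {"w": torch.zeros(2, 2), "b": torch.zeros(2)}
tuned = {"w": torch.ones(2, 2), "b": torch.ones(2)}
print(linear_merge(base, tuned, alpha=0.3))
```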

How Susceptible are LLMs to Influence in Prompts?

S Anagnostidis, J Bulian - arXiv preprint arXiv:2408.11865, 2024 - arxiv.org
Large Language Models (LLMs) are highly sensitive to prompts, including additional context
provided therein. As LLMs grow in capability, understanding their prompt-sensitivity …

Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations

X Jin, X Ren - arXiv preprint arXiv:2406.14026, 2024 - arxiv.org
Language models (LMs) are known to suffer from forgetting of previously learned examples
when fine-tuned, breaking stability of deployed LM systems. Despite efforts on mitigating …
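One generic way to quantify this kind of forgetting is to measure, for each fine-tuning task, how much the loss on previously learned upstream examples increases after fine-tuning. The sketch below computes such a task-by-example forgetting matrix from toy numbers; it is a generic measurement, not necessarily the statistical analysis performed in the paper.

```python
import numpy as np

def forgetting_matrix(loss_before, loss_after_by_task):
    """Forgetting as the increase in per-example loss on upstream examples
    after fine-tuning. Rows: fine-tuning tasks; columns: upstream examples."""
    loss_before = np.asarray(loss_before)         # shape (n_examples,)
    loss_after = np.asarray(loss_after_by_task)   # shape (n_tasks, n_examples)
    return loss_after - loss_before[None, :]

# Toy numbers: 3 fine-tuning tasks, 4 upstream examples.
before = [1.0, 0.8, 1.2, 0.9]
after = [[1.4, 0.8, 1.3, 1.0],
         [1.1, 1.5, 1.2, 0.9],
         [1.0, 0.9, 2.0, 1.1]]
F = forgetting_matrix(before, after)
print(F.round(2))
print("mean forgetting per task:", F.mean(axis=1).round(2))
```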