An empirical study of catastrophic forgetting in large language models during continual fine-tuning
Catastrophic forgetting (CF) occurs when a machine-learning model forgets previously learned information while acquiring new knowledge. As large …
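
The forgetting effect described above is easy to reproduce in miniature. The sketch below (PyTorch; the two-task toy setup is our own illustration, not the paper's benchmark) fine-tunes one small classifier on task A, then on task B, and reports how accuracy on task A degrades:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def train(model, X, y, steps=300):
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.cross_entropy(model(X), y).backward()
            opt.step()

    def accuracy(model, X, y):
        with torch.no_grad():
            return (model(X).argmax(dim=1) == y).float().mean().item()

    # Two conflicting toy tasks over 2-D points: label by sign of x, then by sign of y.
    XA = torch.randn(512, 2); yA = (XA[:, 0] > 0).long()
    XB = torch.randn(512, 2); yB = (XB[:, 1] > 0).long()

    model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
    train(model, XA, yA)
    acc_before = accuracy(model, XA, yA)
    train(model, XB, yB)                      # sequential fine-tuning on task B
    acc_after = accuracy(model, XA, yA)       # task A performance collapses
    print(f"task A accuracy: {acc_before:.2f} -> {acc_after:.2f}")
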
Orthogonal subspace learning for language model continual learning
Benefiting from massive corpora and advanced hardware, large language models (LLMs)
exhibit remarkable capabilities in language understanding and generation. However, their …
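
One mitigation the title refers to is learning each new task's LoRA update in a subspace orthogonal to those of earlier tasks. Below is a minimal sketch of such an orthogonality regularizer; the penalty form and variable names are our assumption of the general idea, not the paper's exact objective:

    import torch

    def orthogonality_penalty(A_new, A_prev):
        # Penalize overlap between the row space of the current task's LoRA "A"
        # matrix and the (frozen) "A" matrices learned for earlier tasks.
        penalty = A_new.new_zeros(())
        for A_old in A_prev:
            penalty = penalty + (A_new @ A_old.T).pow(2).sum()
        return penalty

    r, d = 8, 768                                   # LoRA rank and hidden width
    frozen = [torch.randn(r, d) for _ in range(2)]  # adapters from tasks 1 and 2
    A_new = torch.randn(r, d, requires_grad=True)   # adapter being trained for task 3

    # Added to the task loss, this pushes the new update toward a subspace
    # orthogonal to earlier tasks, limiting interference with them.
    loss = 0.5 * orthogonality_penalty(A_new, frozen)
    loss.backward()
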
SAPT: A shared attention framework for parameter-efficient continual learning of large language models
W Zhao, S Wang, Y Hu, Y Zhao, B Qin… - Proceedings of the …, 2024 - aclanthology.org
The continual learning (CL) ability is vital for deploying large language models (LLMs) in the
dynamic world. Existing methods devise the learning module to acquire task-specific …
Eight methods to evaluate robust unlearning in LLMs
Machine unlearning can be useful for removing harmful capabilities and memorized text
from large language models (LLMs), but there are not yet standardized methods for …
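
One concrete check such an evaluation suite can include is a relearning probe: if the "forgotten" capability snaps back after a few fine-tuning steps, the unlearning was shallow. The toy model and data below are stand-ins of our own, not the paper's eight methods:

    import torch
    import torch.nn as nn

    V = 16                                                        # toy vocabulary size
    model = nn.Sequential(nn.Embedding(V, 32), nn.Linear(32, V))  # stand-in "LM"
    forget = torch.randint(0, V, (64, 8))                         # sequences the model unlearned

    def forget_set_nll(m):
        # Average next-token loss on the forget set; low loss = still memorized.
        with torch.no_grad():
            logits = m(forget[:, :-1])
            return nn.functional.cross_entropy(
                logits.reshape(-1, V), forget[:, 1:].reshape(-1)).item()

    nll_before = forget_set_nll(model)
    # Relearning probe: robust unlearning should not let a handful of
    # fine-tuning steps restore low loss on the forget set.
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(10):
        opt.zero_grad()
        nn.functional.cross_entropy(
            model(forget[:, :-1]).reshape(-1, V),
            forget[:, 1:].reshape(-1)).backward()
        opt.step()
    print(f"forget-set NLL: {nll_before:.2f} -> {forget_set_nll(model):.2f}")
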
Defending Against Unforeseen Failure Modes with Latent Adversarial Training
AI systems sometimes exhibit harmful unintended behaviors post-deployment, often despite extensive diagnostics and debugging by developers. Minimizing risks from models …
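
Latent adversarial training (LAT), named in the title, trains against perturbations applied to a model's hidden activations rather than its inputs. A minimal single-step sketch on a toy classifier follows; the FGSM-style inner step and the encoder/head split are our simplifications:

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
    head = nn.Linear(32, 2)
    opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)
    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    eps = 0.1                                        # latent perturbation budget

    for _ in range(100):
        # Inner step: find a latent perturbation that maximizes the loss.
        h = encoder(x)
        delta = torch.zeros_like(h, requires_grad=True)
        adv_loss = nn.functional.cross_entropy(head(h + delta), y)
        grad, = torch.autograd.grad(adv_loss, delta)
        delta = eps * grad.sign()                    # one FGSM-style ascent step
        # Outer step: minimize the loss under the adversarial latent perturbation.
        opt.zero_grad()
        nn.functional.cross_entropy(head(encoder(x) + delta), y).backward()
        opt.step()
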
Balancing speciality and versatility: a coarse to fine framework for supervised fine-tuning large language model
Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of
handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit …
Localize-and-Stitch: Efficient model merging via sparse task arithmetic
Model merging offers an effective strategy to combine the strengths of multiple finetuned
models into a unified model that preserves the specialized capabilities of each. Existing …
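
The snippet above refers to sparse task arithmetic: each fine-tuned model contributes a task vector (its weights minus the base weights), only a small localized fraction of which is kept and stitched back into the base. The magnitude-based mask below is a simplification; Localize-and-Stitch itself finds the regions by optimization:

    import torch

    def merge_sparse(base, finetuned, keep=0.1):
        # base and each finetuned entry: state dicts with identical keys.
        merged = {k: v.clone() for k, v in base.items()}
        for ft in finetuned:
            for k in merged:
                tau = ft[k] - base[k]                            # task vector
                cutoff = torch.quantile(tau.abs().flatten(), 1 - keep)
                mask = tau.abs() >= cutoff                       # "localize": top entries
                merged[k] += mask * tau                          # "stitch" into the base
        return merged

    base = {"w": torch.randn(4, 4)}
    tasks = [{"w": base["w"] + 0.1 * torch.randn(4, 4)} for _ in range(2)]
    merged = merge_sparse(base, tasks)                           # one unified model
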
DAPT: A Dual Attention Framework for Parameter-Efficient Continual Learning of Large Language Models
The continual learning (CL) ability is vital for deploying large language models (LLMs) in the
dynamic world. Based on parameter-efficient tuning (PET), existing methods devise the …
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Large language models (LLMs) can often be made to behave in undesirable ways that they
are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a …
Investigating Continual Pretraining in Large Language Models: Insights and Implications
This paper studies the evolving domain of Continual Learning (CL) in large language
models (LLMs), with a focus on developing strategies for efficient and sustainable training …