Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering
N Pochinkov, B Pasero, S Shibayama - arXiv preprint arXiv:2408.17322, 2024 - arxiv.org
The use of transformer-based models is growing rapidly throughout society. With this growth,
it is important to understand how they work, and in particular, how the attention mechanisms …
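The title points at a concrete intervention: ablating a neuron not by zeroing it, but by pinning its activation to a fixed centering value such as the empirical mean or the peak of its activation distribution. A minimal PyTorch sketch of that general idea, assuming a hook-based intervention; the module, the hook helper, and the histogram-based peak estimate are illustrative stand-ins, not the paper's code.

```python
import torch
import torch.nn as nn

class TinyAttentionBlock(nn.Module):
    """Stand-in for one attention head's output projection; illustrative only."""
    def __init__(self, d_model=16):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return self.proj(x)

def make_centering_hook(neuron_idx, center_value):
    """Forward hook that overwrites one output dimension ("neuron")
    with a fixed centering value instead of leaving it active."""
    def hook(module, inputs, output):
        output = output.clone()
        output[..., neuron_idx] = center_value
        return output
    return hook

block = TinyAttentionBlock()
acts = block(torch.randn(1000, 16)).detach()   # sample activations for neuron stats

# Zero ablation uses 0; mean ablation uses the empirical mean;
# a "peak" centering could use the mode of the activation histogram.
mean_center = acts[:, 3].mean().item()
hist = torch.histc(acts[:, 3], bins=50)
edges = torch.linspace(acts[:, 3].min(), acts[:, 3].max(), 51)
peak_center = ((edges[:-1] + edges[1:]) / 2)[hist.argmax()].item()

handle = block.proj.register_forward_hook(make_centering_hook(3, peak_center))
ablated = block(torch.randn(4, 16))            # neuron 3 now pinned to the peak value
handle.remove()
```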
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
The objective of digital forgetting is, given a model with undesirable knowledge or behavior,
to obtain a new model where the detected issues are no longer present. The motivations for …
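Among the unlearning method families such a survey typically covers, gradient-ascent unlearning is one of the simplest: maximize the loss on a forget set while still descending on a retain set to limit collateral damage. A minimal sketch under that assumption; the tiny linear model and random tensors are placeholders for an LLM and real forget/retain corpora.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                        # placeholder for an LLM
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

forget_x, forget_y = torch.randn(32, 8), torch.randint(0, 2, (32,))
retain_x, retain_y = torch.randn(32, 8), torch.randint(0, 2, (32,))

for _ in range(10):
    opt.zero_grad()
    # Ascend on the forget set (negated loss) while descending
    # on the retain set to preserve remaining capabilities.
    loss = -loss_fn(model(forget_x), forget_y) \
           + loss_fn(model(retain_x), retain_y)
    loss.backward()
    opt.step()
```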
Extending Activation Steering to Broad Skills and Multiple Behaviours
Current large language models have dangerous capabilities, which are likely to become
more problematic in the future. Activation steering techniques can be used to reduce risks …
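A common way to build a steering vector, which work like this plausibly builds on, is the difference of mean activations between prompts exhibiting a behaviour and neutral prompts, added back into the model's activations at inference time. A minimal PyTorch sketch, assuming a hook-based injection; the layer, the sampled "prompt" activations, and the scale `alpha` are all illustrative.

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 16)                      # stand-in for a residual-stream layer

# Steering vector: mean activation on behaviour-exhibiting inputs
# minus mean activation on neutral inputs (difference of means).
pos_acts = layer(torch.randn(64, 16))
neg_acts = layer(torch.randn(64, 16))
steer = (pos_acts.mean(0) - neg_acts.mean(0)).detach()

def steering_hook(module, inputs, output, alpha=4.0):
    # Add the scaled steering vector to every forward activation.
    return output + alpha * steer

handle = layer.register_forward_hook(steering_hook)
steered = layer(torch.randn(2, 16))            # outputs shifted along the steer direction
handle.remove()
```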
Nexus Scissor: Enhance Open-Access Language Model Safety by Connection Pruning
Y Pang, P Mai, Y Yang, R Yan - 2024 - researchsquare.com
Large language models (LLMs) are vulnerable to adversarial attacks that bypass safety
measures and induce the model to generate harmful content. Securing open-access LLMs …
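The abstract frames safety hardening as pruning connections in the network. A generic sketch of connection pruning with `torch.nn.utils.prune`, assuming individual weights are selected by some per-connection importance score; the random scores here stand in for whatever harmfulness attribution the paper actually computes.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(16, 16)                      # stand-in layer inside an LLM

# Hypothetical importance scores: attribution of each connection to
# harmful outputs. Random here, purely for illustration.
scores = torch.rand_like(layer.weight)
k = int(0.05 * layer.weight.numel())           # prune the top-5% scoring connections
threshold = scores.flatten().topk(k).values.min()
mask = (scores < threshold).float()            # 1 = keep, 0 = prune

prune.custom_from_mask(layer, name="weight", mask=mask)
out = layer(torch.randn(1, 16))                # forward pass with pruned connections
```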