Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

N Pochinkov, B Pasero, S Shibayama - arXiv preprint arXiv:2408.17322, 2024 - arxiv.org
The use of transformer-based models is growing rapidly throughout society. With this growth,
it is important to understand how they work, and in particular, how the attention mechanisms …

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

A Blanco-Justicia, N Jebreel, B Manzanares… - arXiv preprint arXiv …, 2024 - arxiv.org
The objective of digital forgetting is, given a model with undesirable knowledge or behavior,
to obtain a new model where the detected issues are no longer present. The motivations for …

Extending Activation Steering to Broad Skills and Multiple Behaviours

T van der Weij, M Poesio, N Schoots - arXiv preprint arXiv:2403.05767, 2024 - arxiv.org
Current large language models have dangerous capabilities, which are likely to become
more problematic in the future. Activation steering techniques can be used to reduce risks …

Nexus Scissor: Enhance Open-Access Language Model Safety by Connection Pruning

Y Pang, P Mai, Y Yang, R Yan - 2024 - researchsquare.com
Large language models (LLMs) are vulnerable to adversarial attacks that bypass safety
measures and induce the model to generate harmful content. Securing open-access LLMs …