AttnLRP: Attention-aware Layer-wise Relevance Propagation for Transformers

Authors

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Publication date

2024/2/8

Journal

arXiv preprint arXiv:2402.05602

Description

Large Language Models are prone to biased predictions and hallucinations, underlining the paramount importance of understanding their model-internal reasoning process. However, achieving faithful attributions for the entirety of a black-box transformer model and maintaining computational efficiency is an unsolved challenge. By extending the Layer-wise Relevance Propagation attribution method to handle attention layers, we address these challenges effectively. While partial solutions exist, our method is the first to faithfully and holistically attribute not only input but also latent representations of transformer models with the computational efficiency similar to a singular backward pass. Through extensive evaluations against existing methods on Llama 2, Flan-T5 and the Vision Transformer architecture, we demonstrate that our proposed approach surpasses alternative methods in terms of faithfulness and enables the understanding of latent representations, opening up the door for concept-based explanations. We provide an open-source implementation on GitHub https://github.com/rachtibat/LRP-for-Transformers.

Total citations

Cited by 3

2024: 3 citations

Scholar articles

Attnlrp: attention-aware layer-wise relevance propagation for transformers

R Achtibat, SMV Hatefi, M Dreyer, A Jain, T Wiegand… - arXiv preprint arXiv:2402.05602, 2024