A primer on the inner workings of transformer-based language models

J Ferrando, G Sarti, A Bisazza… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …

Neuron-level knowledge attribution in large language models

Z Yu, S Ananiadou - Proceedings of the 2024 Conference on …, 2024 - aclanthology.org
Identifying important neurons for final predictions is essential for understanding the
mechanisms of large language models. Due to computational constraints, current attribution …
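The attribution question this abstract raises can be made concrete with a standard activation-times-gradient score. The sketch below is a generic illustration of that idea, not the paper's method; the toy model, hooked layer, and top-k readout are all assumptions.

import torch
import torch.nn as nn

# Toy model standing in for an LLM sublayer (illustrative only).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.randn(1, 16)

acts = {}
def hook(module, inp, out):
    out.retain_grad()          # keep gradients on the intermediate activation
    acts["hidden"] = out
model[1].register_forward_hook(hook)

logits = model(x)
logits[0, logits.argmax()].backward()   # gradient of the predicted logit

# Importance of each hidden neuron: activation times its gradient.
importance = (acts["hidden"] * acts["hidden"].grad).squeeze(0)
print("most influential neurons:", importance.topk(5).indices.tolist())

Scores of this form are cheap because a single backward pass yields gradients for every neuron at once, which is why attribution work under computational constraints often starts here.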

On logical extrapolation for mazes with recurrent and implicit networks

B Knutson, AC Rabeendran, M Ivanitskiy… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has suggested that certain neural network architectures, particularly recurrent
neural networks (RNNs) and implicit neural networks (INNs), are capable of logical …
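A minimal sketch of the weight-tied recurrence such extrapolation studies typically probe, assuming a toy residual block (the paper's maze-solving architectures are more elaborate): the same block is applied repeatedly, so the iteration count at test time can exceed the one used in training.

import torch
import torch.nn as nn

class RecurrentSolver(nn.Module):
    def __init__(self, width=64):
        super().__init__()
        self.inp = nn.Linear(8, width)
        self.block = nn.Sequential(nn.Linear(width, width), nn.ReLU())  # shared weights
        self.out = nn.Linear(width, 2)

    def forward(self, x, iters=10):
        h = torch.relu(self.inp(x))
        for _ in range(iters):      # same block applied `iters` times
            h = self.block(h) + h   # residual update keeps iterates stable
        return self.out(h)

model = RecurrentSolver()
x = torch.randn(4, 8)
easy = model(x, iters=10)   # training-time depth
hard = model(x, iters=50)   # extrapolation: more "thinking" steps at test time
print(easy.shape, hard.shape)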

Encourage or inhibit monosemanticity? revisit monosemanticity from a feature decorrelation perspective

H Yan, Y Xiang, G Chen, Y Wang, L Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
To better interpret the intrinsic mechanisms of large language models (LLMs), recent studies
focus on the monosemanticity of their basic units. A monosemantic neuron is dedicated to a …
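One way to make "feature decorrelation" concrete is a regularizer that penalizes off-diagonal entries of the correlation matrix of hidden features. The sketch below shows that generic form; the paper's exact objective may differ.

import torch

def decorrelation_loss(h):
    # h: (batch, features) hidden activations
    h = h - h.mean(dim=0, keepdim=True)
    h = h / (h.std(dim=0, keepdim=True) + 1e-6)
    corr = (h.T @ h) / h.shape[0]              # feature-by-feature correlation
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).sum()

h = torch.randn(128, 32)
print(decorrelation_loss(h))   # add a scaled version to the task loss during training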

Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models

C O'Neill, T Bui - arXiv preprint arXiv:2405.12522, 2024 - arxiv.org
This paper introduces an efficient and robust method for discovering interpretable circuits in
large language models using discrete sparse autoencoders. Our approach addresses key …
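A minimal sparse autoencoder of the kind used for this sort of feature and circuit discovery: an overcomplete ReLU encoder plus a linear decoder, trained with a reconstruction loss and an L1 sparsity penalty. Widths and coefficients here are illustrative assumptions, and this continuous-L1 variant does not reproduce the discrete autoencoders the abstract mentions.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=256, d_dict=1024):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f

sae = SparseAutoencoder()
acts = torch.randn(512, 256)          # model activations to be explained
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1
loss.backward()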

Sparse Prototype Network for Explainable Pedestrian Behavior Prediction

Y Feng, A Carballo, K Takeda - arXiv preprint arXiv:2410.12195, 2024 - arxiv.org
Predicting pedestrian behavior is challenging yet crucial for applications such as
autonomous driving and smart cities. Recent deep learning models have achieved …
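Prototype-based explainability of the sort the title suggests can be sketched generically: inputs are scored by similarity to learned prototypes, and a sparse linear head maps those similarities to predictions, so each decision can be traced back to a few prototypes. Everything below (sizes, the distance measure, the L1 term) is an illustrative assumption, not the paper's model.

import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, d_feat=64, n_proto=10, n_classes=2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_proto, d_feat))
        self.head = nn.Linear(n_proto, n_classes, bias=False)

    def forward(self, z):
        # similarity = negative squared distance to each prototype
        sim = -torch.cdist(z, self.prototypes) ** 2
        return self.head(sim), sim    # logits plus per-prototype evidence

model = PrototypeHead()
z = torch.randn(8, 64)                # encoder features for 8 pedestrians
logits, evidence = model(z)
sparsity = model.head.weight.abs().mean()   # L1 term encourages using few prototypes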

Local Sparse Representations: Connections With the Delaunay Triangulation and Dictionary Learning in Wasserstein Space

M Mueller - 2024 - search.proquest.com
We pursue local sparse representations of data by considering a common data model where
representations are formed as combinations of atoms drawn from a collection called a dictionary. Our focus is …
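The basic dictionary-learning setup the abstract describes, sketched with scikit-learn's generic tools (the thesis's Delaunay and Wasserstein constructions are not reproduced here): each data point is encoded as a sparse combination of learned atoms.

import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.randn(200, 20)                 # data points as rows
dl = DictionaryLearning(n_components=30, max_iter=100,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=3, random_state=0)
codes = dl.fit_transform(X)                  # sparse coefficients per point
print(codes.shape, (codes != 0).sum(axis=1).max())  # each row uses at most 3 atoms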
