Label-free concept bottleneck models

T Oikarinen, S Das, LM Nguyen, TW Weng - arXiv preprint arXiv …, 2023 - arxiv.org
Concept bottleneck models (CBMs) are a popular way of creating more interpretable neural
networks by having hidden layer neurons correspond to human-understandable concepts …
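
The snippet describes the general CBM structure: a backbone feature extractor, a bottleneck layer whose units are tied to named concepts, and a final classifier that sees only the concept activations. A minimal PyTorch sketch of that generic architecture follows; it is not the label-free training recipe from this paper, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, backbone, feat_dim, n_concepts, n_classes):
        super().__init__()
        self.backbone = backbone                             # any feature extractor
        self.to_concepts = nn.Linear(feat_dim, n_concepts)   # each unit ~ one named concept
        self.classifier = nn.Linear(n_concepts, n_classes)   # decision uses concepts only

    def forward(self, x):
        feats = self.backbone(x)
        concepts = torch.sigmoid(self.to_concepts(feats))    # interpretable bottleneck
        return self.classifier(concepts), concepts

# Toy usage with a stand-in backbone.
model = ConceptBottleneck(nn.Flatten(), feat_dim=3 * 32 * 32, n_concepts=20, n_classes=10)
logits, concepts = model(torch.randn(4, 3, 32, 32))
```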

Text-to-concept (and back) via cross-model alignment

M Moayeri, K Rezaei, M Sanjabi… - … on Machine Learning, 2023 - proceedings.mlr.press
We observe that the mapping from an image's representation in one model to its
representation in another can be learned surprisingly well with just a linear layer, even …
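
The claim is concrete enough to sketch: given features extracted from the same images by two different models, a single linear map fit by least squares aligns one representation space to the other. The arrays and dimensions below are random stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((1000, 512))   # stand-in: model A features for 1000 images
feats_b = rng.standard_normal((1000, 768))   # stand-in: model B features, same images

# Solve min_W ||feats_a @ W - feats_b||^2 in closed form.
W, *_ = np.linalg.lstsq(feats_a, feats_b, rcond=None)
aligned = feats_a @ W                        # model-A features mapped into model B's space
```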

Labeling neural representations with inverse recognition

K Bykov, L Kopf, S Nakajima… - Advances in Neural …, 2024 - proceedings.neurips.cc
Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning
complex hierarchical data representations, but the nature of these representations remains …

FIND: A function description benchmark for evaluating interpretability methods

S Schwettmann, T Shaham… - Advances in …, 2024 - proceedings.neurips.cc
Labeling neural network submodules with human-legible descriptions is useful for many
downstream tasks: such descriptions can surface failures, guide interventions, and perhaps …

Faithful vision-language interpretation via concept bottleneck models

S Lai, L Hu, J Wang, L Berti-Equille… - The Twelfth International …, 2023 - openreview.net
The demand for transparency in healthcare and finance has led to interpretable machine
learning (IML) models, notably concept bottleneck models (CBMs), valued for their …

Towards a fuller understanding of neurons with clustered compositional explanations

B La Rosa, L Gilpin… - Advances in Neural …, 2023 - proceedings.neurips.cc
Compositional Explanations is a method for identifying logical formulas of concepts that
approximate the neurons' behavior. However, these explanations are linked to the small …
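
Compositional explanations score a neuron against logical formulas of concepts, typically via IoU between the neuron's top-activation mask and a formula's mask. A toy sketch of that scoring step, with random stand-in masks and an arbitrary activation threshold (the paper's clustering of activation ranges is not shown):

```python
import numpy as np

def iou(neuron_mask, formula_mask):
    inter = np.logical_and(neuron_mask, formula_mask).sum()
    union = np.logical_or(neuron_mask, formula_mask).sum()
    return inter / union if union > 0 else 0.0

rng = np.random.default_rng(0)
acts = rng.random((100, 7, 7))                 # fake spatial activations over 100 images
neuron_mask = acts > np.quantile(acts, 0.99)   # mask of the neuron's top activations

water = rng.random((100, 7, 7)) > 0.9          # fake binary concept segmentation masks
blue  = rng.random((100, 7, 7)) > 0.9

# A compositional formula such as (water OR blue) is scored the same way
# as a single concept: by its overlap with the neuron's mask.
score = iou(neuron_mask, np.logical_or(water, blue))
```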

Concept-based explainable artificial intelligence: A survey

E Poeta, G Ciravegna, E Pastor, T Cerquitelli… - arXiv preprint arXiv …, 2023 - arxiv.org
The field of explainable artificial intelligence emerged in response to the growing need for
more transparent and reliable models. However, using raw features to provide explanations …

Information maximization perspective of orthogonal matching pursuit with applications to explainable AI

A Chattopadhyay, R Pilgrim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Information Pursuit (IP) is a classical active testing algorithm for predicting an output
by sequentially and greedily querying the input in order of information gain. However, IP is …
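
The title connects IP's greedy information-gain queries to orthogonal matching pursuit. For reference, a plain OMP sketch (not the paper's information-theoretic analysis): each step picks the dictionary atom most correlated with the residual, then refits on the selected support.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: D is (n, m) with unit-norm columns,
    y is (n,), k is the sparsity level (assumed >= 1)."""
    residual = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # greedy pick, analogous to
        support.append(j)                            # IP's max-information query
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol           # re-orthogonalize against support
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = sol
    return coeffs

# Toy usage: recover a 2-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = D[:, [3, 17]] @ np.array([2.0, -1.0])
x_hat = omp(D, y, k=2)
```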

Decomposing and editing predictions by modeling model computation

H Shah, A Ilyas, A Madry - arXiv preprint arXiv:2404.11534, 2024 - arxiv.org
How does the internal computation of a machine learning model transform inputs into
predictions? In this paper, we introduce a task called component modeling that aims to …
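
As a flavor of what reasoning about predictions at the component level looks like, the sketch below ablates one hidden unit's weights and records the resulting shift in the output; this is a generic counterfactual ablation, not the paper's component-attribution estimator.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.randn(1, 16)

base = model(x).detach()              # prediction from the full model
with torch.no_grad():
    model[0].weight[3].zero_()        # ablate one "component": a hidden unit's weights
    model[0].bias[3].zero_()
effect = model(x) - base              # per-class shift attributable to that component
```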

Text2Concept: Concept activation vectors directly from text

M Moayeri, K Rezaei, M Sanjabi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Concept activation vectors (CAVs) enable interpretability of a model with respect to
human concepts, though CAV generation requires the costly step of curating positive and …
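
The costly step the snippet refers to is the classical CAV recipe: embed curated positive and negative examples of a concept, fit a linear probe, and take the probe's weight vector as the CAV. A sketch of that baseline with random stand-in embeddings (Text2Concept's contribution is to bypass this curation by mapping text directly to CAVs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.standard_normal((200, 512)) + 0.5   # stand-in embeddings of concept examples
neg = rng.standard_normal((200, 512))         # stand-in embeddings of counterexamples

X = np.vstack([pos, neg])
y = np.array([1] * len(pos) + [0] * len(neg))

probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0]
cav /= np.linalg.norm(cav)                    # unit-norm concept activation vector
```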