Label-free concept bottleneck models

T Oikarinen, S Das, LM Nguyen, TW Weng - arXiv preprint arXiv …, 2023 - arxiv.org
Concept bottleneck models (CBMs) are a popular way of creating more interpretable neural
networks by having hidden layer neurons correspond to human-understandable concepts …
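
The snippet describes the general CBM structure: a backbone feature extractor, a bottleneck layer whose units are tied to named concepts, and a final classifier that sees only the concept activations. A minimal PyTorch sketch of that generic architecture follows; it is not the label-free training recipe from this paper, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, backbone, feat_dim, n_concepts, n_classes):
        super().__init__()
        self.backbone = backbone                             # any feature extractor
        self.to_concepts = nn.Linear(feat_dim, n_concepts)   # each unit ~ one named concept
        self.classifier = nn.Linear(n_concepts, n_classes)   # decision uses concepts only

    def forward(self, x):
        feats = self.backbone(x)
        concepts = torch.sigmoid(self.to_concepts(feats))    # interpretable bottleneck
        return self.classifier(concepts), concepts

# Toy usage with a stand-in backbone.
model = ConceptBottleneck(nn.Flatten(), feat_dim=3 * 32 * 32, n_concepts=20, n_classes=10)
logits, concepts = model(torch.randn(4, 3, 32, 32))
```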

Text-to-concept (and back) via cross-model alignment

M Moayeri, K Rezaei, M Sanjabi… - … on Machine Learning, 2023 - proceedings.mlr.press
We observe that the mapping from an image's representation in one model to its
representation in another can be learned surprisingly well with just a linear layer, even …
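
The claim is concrete enough to sketch: given features extracted from the same images by two different models, a single linear map fit by least squares aligns one representation space to the other. The arrays and dimensions below are random stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((1000, 512))   # stand-in: model A features for 1000 images
feats_b = rng.standard_normal((1000, 768))   # stand-in: model B features, same images

# Solve min_W ||feats_a @ W - feats_b||^2 in closed form.
W, *_ = np.linalg.lstsq(feats_a, feats_b, rcond=None)
aligned = feats_a @ W                        # model-A features mapped into model B's space
```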

Labeling neural representations with inverse recognition

K Bykov, L Kopf, S Nakajima… - Advances in Neural …, 2024 - proceedings.neurips.cc
Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning
complex hierarchical data representations, but the nature of these representations remains …

FIND: A function description benchmark for evaluating interpretability methods

S Schwettmann, T Shaham… - Advances in …, 2024 - proceedings.neurips.cc
Labeling neural network submodules with human-legible descriptions is useful for many
downstream tasks: such descriptions can surface failures, guide interventions, and perhaps …

Faithful vision-language interpretation via concept bottleneck models

S Lai, L Hu, J Wang, L Berti-Equille… - The Twelfth International …, 2023 - openreview.net
The demand for transparency in healthcare and finance has led to interpretable machine
learning (IML) models, notably concept bottleneck models (CBMs), valued for their …

Towards a fuller understanding of neurons with clustered compositional explanations

B La Rosa, L Gilpin… - Advances in Neural …, 2023 - proceedings.neurips.cc
Compositional Explanations is a method for identifying logical formulas of concepts that
approximate the neurons' behavior. However, these explanations are linked to the small …
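
Compositional explanations score a neuron against logical formulas of concepts, typically via IoU between the neuron's top-activation mask and a formula's mask. A toy sketch of that scoring step, with random stand-in masks and an arbitrary activation threshold (the paper's clustering of activation ranges is not shown):

```python
import numpy as np

def iou(neuron_mask, formula_mask):
    inter = np.logical_and(neuron_mask, formula_mask).sum()
    union = np.logical_or(neuron_mask, formula_mask).sum()
    return inter / union if union > 0 else 0.0

rng = np.random.default_rng(0)
acts = rng.random((100, 7, 7))                 # fake spatial activations over 100 images
neuron_mask = acts > np.quantile(acts, 0.99)   # mask of the neuron's top activations

water = rng.random((100, 7, 7)) > 0.9          # fake binary concept segmentation masks
blue  = rng.random((100, 7, 7)) > 0.9

# A compositional formula such as (water OR blue) is scored the same way
# as a single concept: by its overlap with the neuron's mask.
score = iou(neuron_mask, np.logical_or(water, blue))
```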

Concept-based explainable artificial intelligence: A survey

E Poeta, G Ciravegna, E Pastor, T Cerquitelli… - arXiv preprint arXiv …, 2023 - arxiv.org
The field of explainable artificial intelligence emerged in response to the growing need for
more transparent and reliable models. However, using raw features to provide explanations …

Information maximization perspective of orthogonal matching pursuit with applications to explainable AI

A Chattopadhyay, R Pilgrim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Information Pursuit (IP) is a classical active testing algorithm for predicting an output
by sequentially and greedily querying the input in order of information gain. However, IP is …
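
The title connects IP's greedy information-gain queries to orthogonal matching pursuit. For reference, a plain OMP sketch (not the paper's information-theoretic analysis): each step picks the dictionary atom most correlated with the residual, then refits on the selected support.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: D is (n, m) with unit-norm columns,
    y is (n,), k is the sparsity level (assumed >= 1)."""
    residual = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # greedy pick, analogous to
        support.append(j)                            # IP's max-information query
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol           # re-orthogonalize against support
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = sol
    return coeffs

# Toy usage: recover a 2-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = D[:, [3, 17]] @ np.array([2.0, -1.0])
x_hat = omp(D, y, k=2)
```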

Decomposing and editing predictions by modeling model computation

H Shah, A Ilyas, A Madry - arXiv preprint arXiv:2404.11534, 2024 - arxiv.org
How does the internal computation of a machine learning model transform inputs into
predictions? In this paper, we introduce a task called component modeling that aims to …
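
As a flavor of what reasoning about predictions at the component level looks like, the sketch below ablates one hidden unit's weights and records the resulting shift in the output; this is a generic counterfactual ablation, not the paper's component-attribution estimator.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.randn(1, 16)

base = model(x).detach()              # prediction from the full model
with torch.no_grad():
    model[0].weight[3].zero_()        # ablate one "component": a hidden unit's weights
    model[0].bias[3].zero_()
effect = model(x) - base              # per-class shift attributable to that component
```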

Text2Concept: Concept activation vectors directly from text

M Moayeri, K Rezaei, M Sanjabi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Concept activation vectors (CAVs) enable interpretability of a model with respect to
human concepts, though CAV generation requires the costly step of curating positive and …
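
The costly step the snippet refers to is the classical CAV recipe: embed curated positive and negative examples of a concept, fit a linear probe, and take the probe's weight vector as the CAV. A sketch of that baseline with random stand-in embeddings (Text2Concept's contribution is to bypass this curation by mapping text directly to CAVs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.standard_normal((200, 512)) + 0.5   # stand-in embeddings of concept examples
neg = rng.standard_normal((200, 512))         # stand-in embeddings of counterexamples

X = np.vstack([pos, neg])
y = np.array([1] * len(pos) + [0] * len(neg))

probe = LogisticRegression(max_iter=1000).fit(X, y)
cav = probe.coef_[0]
cav /= np.linalg.norm(cav)                    # unit-norm concept activation vector
```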