Importance and Coherence: Methods for Evaluating Modularity in Neural Networks
As deep neural networks become more advanced and widely used, it is increasingly important to understand their inner workings. Toward this goal, modular interpretations are appealing because they offer flexible levels of abstraction beyond standard architectural building blocks (e.g., neurons, channels, layers). In this paper, we consider the problem of assessing how functionally interpretable a given partitioning of neurons is. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance, and coherence, which reflects how consistently their neurons associate with input/output features. To measure these proxies, we develop a set of statistical methods based on techniques that have conventionally been used to interpret individual neurons. We apply these methods to partitionings generated by a spectral clustering algorithm that operates on a graph representation of the network's neurons and weights. We show that, despite using neither activations nor gradients, our partitioning algorithm reveals clusters with a surprising degree of importance and coherence. Together, these results support the use of modular interpretations, and graph-based partitionings in particular, for interpretability.
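To make the graph-based partitioning concrete, the following is a minimal sketch, not the authors' implementation: it builds a neuron-level affinity graph from the absolute weights of a small MLP and partitions it with scikit-learn's SpectralClustering, using neither activations nor gradients. The layer sizes, cluster count, and the helper neuron_affinity are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def neuron_affinity(weight_mats):
        # weight_mats[l] has shape (n_l, n_{l+1}); the edge between neuron i
        # in layer l and neuron j in layer l+1 gets affinity |W_l[i, j]|.
        # Neurons in non-adjacent layers share no edge.
        sizes = [weight_mats[0].shape[0]] + [W.shape[1] for W in weight_mats]
        offsets = np.cumsum([0] + sizes)  # index of each layer's first neuron
        A = np.zeros((offsets[-1], offsets[-1]))
        for l, W in enumerate(weight_mats):
            a, b = offsets[l], offsets[l + 1]
            A[a:b, b:b + W.shape[1]] = np.abs(W)
            A[b:b + W.shape[1], a:b] = np.abs(W).T  # symmetrize the graph
        return A

    # Illustrative example: a random MLP with layers of 8, 16, and 4 neurons.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 4))]
    labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                                random_state=0).fit_predict(neuron_affinity(weights))
    print(labels)  # one cluster label per neuron: candidate "modules"

The resulting clusters could then be probed for importance by, for example, ablating each cluster and recording the drop in task performance, in the spirit of the lesion-style tests conventionally applied to individual neurons.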