Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Guided integrated gradients: An adaptive path method for removing noise

A Kapishnikov, S Venugopalan… - Proceedings of the …, 2021 - openaccess.thecvf.com
Integrated Gradients (IG) is a commonly used feature attribution method for deep neural
networks. While IG has many desirable properties, the method often produces …

Discretized integrated gradients for explaining language models

S Sanyal, X Ren - arXiv preprint arXiv:2108.13654, 2021 - arxiv.org
As a prominent attribution-based explanation algorithm, Integrated Gradients (IG) is widely
adopted due to its desirable explanation axioms and the ease of gradient computation. It …

Gradient based Feature Attribution in Explainable AI: A Technical Review

Y Wang, T Zhang, X Guo, Z Shen - arXiv preprint arXiv:2403.10415, 2024 - arxiv.org
The surge in black-box AI models has prompted the need to explain the internal mechanism
and justify their reliability, especially in high-stakes applications, such as healthcare and …

Focus! Rating XAI methods and finding biases

A Arias-Duart, F Parés, D Garcia-Gasulla… - … on Fuzzy Systems …, 2022 - ieeexplore.ieee.org
AI explainability improves the transparency and trustworthiness of models. However, in the
domain of images, where deep learning has succeeded the most, explainability is still poorly …

Counterfactual Interpolation Augmentation (CIA): A Unified Approach to Enhance Fairness and Explainability of DNN.

Y Qiang, C Li, M Brocanelli, D Zhu - IJCAI, 2022 - qiangyao1988.github.io
Bias in the training data can jeopardize fairness and explainability of deep neural network
prediction on test data. We propose a novel bias-tailored data augmentation approach …

Shaping noise for robust attributions in neural stochastic differential equations

SK Jha, R Ewetz, A Velasquez… - Proceedings of the …, 2022 - ojs.aaai.org
Neural SDEs with Brownian motion as noise lead to smoother attributions than
traditional ResNets. Various attribution methods such as saliency maps, integrated …

Pixelated High-Q Metasurfaces for in Situ Biospectroscopy and Artificial Intelligence-Enabled Classification of Lipid Membrane Photoswitching Dynamics

M Barkey, R Büchner, A Wester, SD Pritzl… - ACS …, 2024 - ACS Publications
Nanophotonic devices excel at confining light into intense hot spots of electromagnetic near
fields, creating exceptional opportunities for light–matter coupling and surface-enhanced …

Negative flux aggregation to estimate feature attributions

X Li, D Pan, C Li, Y Qiang, D Zhu - arXiv preprint arXiv:2301.06989, 2023 - arxiv.org
There are increasing demands for understanding deep neural networks' (DNNs) behavior
spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of …

Xplain: Analyzing Invisible Correlations in Model Explanation

K Kumari, A Pegoraro, H Fereidooni… - 33rd USENIX Security …, 2024 - usenix.org
Explanation methods analyze the features in backdoored input data that contribute to model
misclassification. However, current methods like path techniques struggle to detect backdoor …