Explainability for large language models: A survey
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …
Guided integrated gradients: An adaptive path method for removing noise
A Kapishnikov, S Venugopalan… - Proceedings of the …, 2021 - openaccess.thecvf.com
Integrated Gradients (IG) is a commonly used feature attribution method for deep neural
networks. While IG has many desirable properties, the method often produces …
Discretized integrated gradients for explaining language models
As a prominent attribution-based explanation algorithm, Integrated Gradients (IG) is widely
adopted due to its desirable explanation axioms and the ease of gradient computation. It …
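The two entries above both build on Integrated Gradients. As a point of reference for readers, the basic method can be sketched in a few lines: attribute each input feature by integrating the model's gradient along a straight-line path from a baseline to the input. The toy model, baseline, and step count below are illustrative assumptions, not taken from either paper.

```python
# Illustrative sketch of Integrated Gradients (IG) on a toy analytic model.
# The model f and its gradient are hypothetical stand-ins for a neural network.

def f(x0, x1):
    # Toy differentiable "model": f(x) = x0^2 + 3*x1
    return x0 ** 2 + 3 * x1

def grad_f(x0, x1):
    # Analytic gradient of f with respect to (x0, x1)
    return (2 * x0, 3.0)

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann sum of gradients along the straight-line path
    # from the baseline to the input, scaled by (x - baseline).
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    sums = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, diffs)]
        g = grad_f(*point)
        for i in range(len(x)):
            sums[i] += g[i]
    return [di * si / steps for di, si in zip(diffs, sums)]

attr = integrated_gradients((2.0, 1.0), (0.0, 0.0))
# Completeness axiom: attributions sum to f(x) - f(baseline).
total = sum(attr)
```

The "ease of gradient computation" mentioned above is visible here: IG needs only repeated gradient evaluations, and the completeness axiom gives a built-in sanity check on the approximation.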
Gradient based Feature Attribution in Explainable AI: A Technical Review
The surge in black-box AI models has prompted the need to explain the internal mechanism
and justify their reliability, especially in high-stakes applications, such as healthcare and …
Focus! Rating XAI methods and finding biases
AI explainability improves the transparency and trustworthiness of models. However, in the
domain of images, where deep learning has succeeded the most, explainability is still poorly …
Counterfactual Interpolation Augmentation (CIA): A unified approach to enhance fairness and explainability of DNN
Bias in the training data can jeopardize fairness and explainability of deep neural network
prediction on test data. We propose a novel bias-tailored data augmentation approach …
Shaping noise for robust attributions in neural stochastic differential equations
SK Jha, R Ewetz, A Velasquez… - Proceedings of the …, 2022 - ojs.aaai.org
Neural SDEs with Brownian motion as noise lead to smoother attributions than
traditional ResNets. Various attribution methods such as saliency maps, integrated …
Pixelated High-Q Metasurfaces for in Situ Biospectroscopy and Artificial Intelligence-Enabled Classification of Lipid Membrane Photoswitching Dynamics
M Barkey, R Büchner, A Wester, SD Pritzl… - ACS …, 2024 - ACS Publications
Nanophotonic devices excel at confining light into intense hot spots of electromagnetic near
fields, creating exceptional opportunities for light–matter coupling and surface-enhanced …
Negative flux aggregation to estimate feature attributions
There are increasing demands for understanding deep neural networks' (DNNs) behavior,
spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of …
Xplain: Analyzing Invisible Correlations in Model Explanation
K Kumari, A Pegoraro, H Fereidooni… - 33rd USENIX Security …, 2024 - usenix.org
Explanation methods analyze the features in backdoored input data that contribute to model
misclassification. However, current methods like path techniques struggle to detect backdoor …