AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Attention in natural language processing
Attention is an increasingly popular mechanism used in a wide range of neural
architectures. The mechanism itself has been realized in a variety of formats. However …
Toward transparent AI: A survey on interpreting the inner structures of deep neural networks
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …
An attentive survey of attention models
Attention Model has now become an important concept in neural networks that has been
researched within diverse application domains. This survey provides a structured and …
A multiscale visualization of attention in the transformer model
J Vig - arXiv preprint arXiv:1906.05714, 2019 - arxiv.org
The Transformer is a sequence model that forgoes traditional recurrent architectures in favor
of a fully attention-based approach. Besides improving performance, an advantage of using …
On the explainability of natural language processing deep models
JE Zini, M Awad - ACM Computing Surveys, 2022 - dl.acm.org
Despite their success, deep networks are used as black-box models with outputs that are not
easily explainable during the learning and the prediction phases. This lack of interpretability …
Analyzing the structure of attention in a transformer language model
J Vig, Y Belinkov - arXiv preprint arXiv:1906.04284, 2019 - arxiv.org
The Transformer is a fully attention-based alternative to recurrent networks that has
achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the …
AttentionViz: A global view of transformer attention
Transformer models are revolutionizing machine learning, but their inner workings remain
mysterious. In this work, we present a new visualization technique designed to help …
AllenNLP interpret: A framework for explaining predictions of NLP models
Neural NLP models are increasingly accurate but are imperfect and opaque---they break in
counterintuitive ways and leave end users puzzled at their behavior. Model interpretation …
Why attention is not explanation: Surgical intervention and causal reasoning about neural models
C Grimsley, E Mayfield, J Bursten - 2020 - philpapers.org
As the demand for explainable deep learning grows in the evaluation of language
technologies, the value of a principled grounding for those explanations grows as well. Here …