AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Attention in natural language processing

A Galassi, M Lippi, P Torroni - IEEE Transactions on Neural …, 2020 - ieeexplore.ieee.org
Attention is an increasingly popular mechanism used in a wide range of neural
architectures. The mechanism itself has been realized in a variety of formats. However …

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …

An attentive survey of attention models

S Chaudhari, V Mithal, G Polatkan… - ACM Transactions on …, 2021 - dl.acm.org
The attention model has now become an important concept in neural networks and has been
researched within diverse application domains. This survey provides a structured and …

A multiscale visualization of attention in the transformer model

J Vig - arXiv preprint arXiv:1906.05714, 2019 - arxiv.org
The Transformer is a sequence model that forgoes traditional recurrent architectures in favor
of a fully attention-based approach. Besides improving performance, an advantage of using …

On the explainability of natural language processing deep models

JE Zini, M Awad - ACM Computing Surveys, 2022 - dl.acm.org
Despite their success, deep networks are used as black-box models whose outputs are not
easily explainable during the learning and prediction phases. This lack of interpretability …

Analyzing the structure of attention in a transformer language model

J Vig, Y Belinkov - arXiv preprint arXiv:1906.04284, 2019 - arxiv.org
The Transformer is a fully attention-based alternative to recurrent networks that has
achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the …

AttentionViz: A global view of transformer attention

C Yeh, Y Chen, A Wu, C Chen, F Viégas… - … on Visualization and …, 2023 - ieeexplore.ieee.org
Transformer models are revolutionizing machine learning, but their inner workings remain
mysterious. In this work, we present a new visualization technique designed to help …

AllenNLP interpret: A framework for explaining predictions of NLP models

E Wallace, J Tuyls, J Wang, S Subramanian… - arXiv preprint arXiv …, 2019 - arxiv.org
Neural NLP models are increasingly accurate but are imperfect and opaque: they break in
counterintuitive ways and leave end users puzzled by their behavior. Model interpretation …

Why attention is not explanation: Surgical intervention and causal reasoning about neural models

C Grimsley, E Mayfield, J Bursten - 2020 - philpapers.org
As the demand for explainable deep learning grows in the evaluation of language
technologies, the value of a principled grounding for those explanations grows as well. Here …