Ai alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
A review on language models as knowledge bases
Recently, there has been a surge of interest in the NLP community on the use of pretrained
Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs …
Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs …
A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …
capabilities with increasing scale. Despite their potentially transformative impact, these new …
Larger language models do in-context learning differently
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups-ICL with flipped labels and ICL with …
versus input-label mappings. We investigate two setups-ICL with flipped labels and ICL with …
Mass-editing memory in a transformer
Recent work has shown exciting promise in updating large language models with new
memories, so as to replace obsolete information or add specialized knowledge. However …
memories, so as to replace obsolete information or add specialized knowledge. However …
Towards automated circuit discovery for mechanistic interpretability
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors oftransformer models. This paper systematizes the mechanistic …
nontrivial behaviors oftransformer models. This paper systematizes the mechanistic …
Locating and editing factual associations in GPT
We analyze the storage and recall of factual associations in autoregressive transformer
language models, finding evidence that these associations correspond to localized, directly …
language models, finding evidence that these associations correspond to localized, directly …
Explainability for large language models: A survey
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …
language processing. However, their internal mechanisms are still unclear and this lack of …
Editing large language models: Problems, methods, and opportunities
Despite the ability to train capable LLMs, the methodology for maintaining their relevancy
and rectifying errors remains elusive. To this end, the past few years have witnessed a surge …
and rectifying errors remains elusive. To this end, the past few years have witnessed a surge …