Symbols and grounding in large language models
E Pavlick - … Transactions of the Royal Society A, 2023 - royalsocietypublishing.org
Large language models (LLMs) are one of the most impressive achievements of artificial
intelligence in recent years. However, their relevance to the study of language more broadly …
A survey on fairness in large language models
Large language models (LLMs) have shown powerful performance and development
prospects and are widely deployed in the real world. However, LLMs can capture social …
Towards understanding and mitigating social biases in language models
As machine learning methods are deployed in real-world settings such as healthcare, legal
systems, and social science, it is crucial to recognize how they shape social biases and …
Auto-debias: Debiasing masked language models with automated biased prompts
Human-like biases and undesired social stereotypes exist in large pretrained language
models. Given the wide adoption of these models in real-world applications, mitigating such …
Leace: Perfect linear concept erasure in closed form
N Belrose, D Schneider-Joseph… - Advances in …, 2024 - proceedings.neurips.cc
Abstract Concept erasure aims to remove specified features from a representation. It can
improve fairness (e.g. preventing a classifier from using gender or race) and interpretability …
Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI
Trust is a central component of the interaction between people and AI, in that 'incorrect' levels
of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the …
Causal inference in natural language processing: Estimation, prediction, interpretation and beyond
A fundamental goal of scientific research is to learn about causal relationships. However,
despite its critical role in the life and social sciences, causality has not had the same …
Language (technology) is power: A critical survey of "bias" in NLP
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are
often vague, inconsistent, and lacking in normative reasoning, despite the fact that …
Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP
Abstract: ⚠ This paper contains prompts and model outputs that are offensive in nature. When
trained on large, unfiltered crawls from the Internet, language models pick up and reproduce …
Interpretability at scale: Identifying causal mechanisms in alpaca
Obtaining human-interpretable explanations of large, general-purpose language models is
an urgent goal for AI safety. However, it is just as important that our interpretability methods …