A survey on fairness in large language models

Y Li, M Du, R Song, X Wang, Y Wang - arXiv preprint arXiv:2308.10149, 2023 - arxiv.org
Large language models (LLMs) have shown powerful performance and promising development
prospects, and are widely deployed in the real world. However, LLMs can capture social …

Anonymisation models for text data: State of the art, challenges and future directions

P Lison, I Pilán, D Sánchez, M Batet… - Proceedings of the 59th …, 2021 - aclanthology.org
This position paper investigates the problem of automated text anonymisation, which is a
prerequisite for secure sharing of documents containing sensitive information about …

Auto-debias: Debiasing masked language models with automated biased prompts

Y Guo, Y Yang, A Abbasi - … of the 60th Annual Meeting of the …, 2022 - aclanthology.org
Human-like biases and undesired social stereotypes exist in large pretrained language
models. Given the wide adoption of these models in real-world applications, mitigating such …

LEACE: Perfect linear concept erasure in closed form

N Belrose, D Schneider-Joseph… - Advances in …, 2024 - proceedings.neurips.cc
Concept erasure aims to remove specified features from a representation. It can
improve fairness (e.g. preventing a classifier from using gender or race) and interpretability …
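
The snippet only states what concept erasure does; for concreteness, here is a minimal numpy sketch of a closed-form affine eraser in the spirit of LEACE (whiten the features, project out the directions spanned by the whitened feature-concept cross-covariance, unwhiten). The function name and tolerance are illustrative, not from the paper, and the authors' own implementation differs in its numerical details:

```python
import numpy as np

def fit_leace(X, Z, tol=1e-10):
    """Closed-form affine concept eraser (sketch in the spirit of LEACE).

    X: (n, d) representations; Z: (n, k) concept labels (e.g. one-hot gender).
    Returns erase(x) = x - W_pinv @ P @ W @ (x - mu), where W whitens X and
    P projects onto colspace(W @ Cov(X, Z)). The paper's guarantee: after
    erasure, no linear predictor of Z beats a constant baseline.
    """
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)

    # W = Sigma_XX^{-1/2} via eigendecomposition (pseudo-inverse square root).
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(X))
    keep = evals > tol
    W = (evecs[:, keep] * evals[keep] ** -0.5) @ evecs[:, keep].T
    W_pinv = (evecs[:, keep] * evals[keep] ** 0.5) @ evecs[:, keep].T

    # Orthogonal projection onto the whitened cross-covariance column space.
    U, s, _ = np.linalg.svd(W @ (Xc.T @ Zc / len(X)), full_matrices=False)
    U = U[:, s > tol]
    A = W_pinv @ U @ U.T @ W          # erasure map applied to (x - mu)

    def erase(x):
        return x - (x - mu) @ A.T
    return erase
```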

WinoGrande: An adversarial Winograd schema challenge at scale

K Sakaguchi, RL Bras, C Bhagavatula… - Communications of the …, 2021 - dl.acm.org
Commonsense reasoning remains a major challenge in AI, and yet, recent progress on
benchmarks may seem to suggest otherwise. In particular, the recent neural language …

Plug and play language models: A simple approach to controlled text generation

S Dathathri, A Madotto, J Lan, J Hung, E Frank… - arXiv preprint arXiv …, 2019 - arxiv.org
Large transformer-based language models (LMs) trained on huge text corpora have shown
unparalleled generation capabilities. However, controlling attributes of the generated …
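
As a rough sketch of the plug-and-play idea: steer a frozen LM at decoding time by nudging its hidden state along the gradient of a small attribute classifier, without updating any model weights. The helper names below are hypothetical stand-ins, and the real method also adds a KL penalty toward the unperturbed distribution, omitted here:

```python
import numpy as np

def pplm_logits(h, lm_head, attr_grad, step_size=0.02, n_steps=3):
    """One PPLM-style decoding step (sketch): perturb the hidden state h in
    the direction that raises an attribute classifier's log-likelihood, then
    read out next-token logits from the frozen LM head.

    lm_head:   (d, vocab) output projection of the frozen LM.
    attr_grad: callable returning d/dh log p(attribute | h).
    """
    delta = np.zeros_like(h)
    for _ in range(n_steps):
        delta += step_size * attr_grad(h + delta)   # gradient ascent on h only
    return (h + delta) @ lm_head                    # steered next-token logits
```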

Think locally, act globally: Federated learning with local and global representations

PP Liang, T Liu, L Ziyin, NB Allen, RP Auerbach… - arXiv preprint arXiv …, 2020 - arxiv.org
Federated learning is a method of training models on private data distributed over multiple
devices. To keep device data private, the global model is trained by only communicating …
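
The entry describes communicating model updates rather than raw data; below is a minimal sketch of one plain federated-averaging round for logistic regression, the baseline this paper extends by keeping a local representation model on each device and federating only the global part. All names are illustrative:

```python
import numpy as np

def federated_round(global_w, clients, local_steps=5, lr=0.1):
    """One round of federated averaging (sketch). Raw data never leaves a
    device; each client trains locally from the current global weights and
    only the resulting weights are sent back and averaged."""
    local_ws = []
    for X, y in clients:                      # (features, labels) per device
        w = global_w.copy()
        for _ in range(local_steps):
            p = 1 / (1 + np.exp(-X @ w))      # sigmoid predictions
            w -= lr * X.T @ (p - y) / len(X)  # local gradient step
        local_ws.append(w)
    return np.mean(local_ws, axis=0)          # server-side averaging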

Differential privacy has disparate impact on model accuracy

E Bagdasaryan, O Poursaeed… - Advances in neural …, 2019 - proceedings.neurips.cc
Differential privacy (DP) is a popular mechanism for training machine learning models with
bounded leakage about the presence of specific points in the training data. The cost of …
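
The disparate impact the paper measures stems from the mechanics of DP-SGD training; here is a minimal sketch of one DP-SGD step for logistic regression, with per-example gradient clipping and Gaussian noise. The function and parameter names are illustrative, not the paper's code:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=np.random):
    """One DP-SGD step (sketch): clip each per-example gradient to L2 norm
    <= clip, sum, add Gaussian noise calibrated to the clip bound, average.
    The paper's observation: clipping hits underrepresented groups hardest,
    since their examples tend to have larger gradients, so accuracy drops
    unevenly across groups."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1 / (1 + np.exp(-xi @ w))
        g = (p - yi) * xi                               # per-example gradient
        grads.append(g / max(1.0, np.linalg.norm(g) / clip))
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * (np.sum(grads, axis=0) + noise) / len(X)
```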

Null it out: Guarding protected attributes by iterative nullspace projection

S Ravfogel, Y Elazar, H Gonen, M Twiton… - arXiv preprint arXiv …, 2020 - arxiv.org
The ability to control for the kinds of information encoded in neural representations has a
variety of use cases, especially in light of the challenge of interpreting these models. We …
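
The method named in the title is easy to state concretely; below is a minimal sketch of iterative nullspace projection using a scikit-learn probe (function name and round count are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_rounds=10):
    """Iterative nullspace projection (sketch): repeatedly fit a linear probe
    for the protected attribute z, project the representations onto the
    probe's nullspace so those directions carry no information, and refit
    until the probe performs at chance."""
    d = X.shape[1]
    P = np.eye(d)
    for _ in range(n_rounds):
        probe = LogisticRegression(max_iter=1000).fit(X @ P, z)
        w = probe.coef_                                   # (k, d) directions
        basis = np.linalg.svd(w, full_matrices=False)[2]  # orthonormal rows
        P = (np.eye(d) - basis.T @ basis) @ P             # null(w) projection
    return X @ P, P
```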

Evaluating gender bias in machine translation

G Stanovsky, NA Smith, L Zettlemoyer - arXiv preprint arXiv:1906.00591, 2019 - arxiv.org
We present the first challenge set and evaluation protocol for the analysis of gender bias in
machine translation (MT). Our approach uses two recent coreference resolution datasets …