A survey on fairness in large language models

Y Li, M Du, R Song, X Wang, Y Wang - arXiv preprint arXiv:2308.10149, 2023 - arxiv.org
Large language models (LLMs) have shown powerful performance and promising development
prospects, and are widely deployed in the real world. However, LLMs can capture social …

Anonymisation models for text data: State of the art, challenges and future directions

P Lison, I Pilán, D Sánchez, M Batet… - Proceedings of the 59th …, 2021 - aclanthology.org
This position paper investigates the problem of automated text anonymisation, which is a
prerequisite for secure sharing of documents containing sensitive information about …

Auto-debias: Debiasing masked language models with automated biased prompts

Y Guo, Y Yang, A Abbasi - … of the 60th Annual Meeting of the …, 2022 - aclanthology.org
Human-like biases and undesired social stereotypes exist in large pretrained language
models. Given the wide adoption of these models in real-world applications, mitigating such …

LEACE: Perfect linear concept erasure in closed form

N Belrose, D Schneider-Joseph… - Advances in …, 2024 - proceedings.neurips.cc
Concept erasure aims to remove specified features from a representation. It can
improve fairness (e.g. preventing a classifier from using gender or race) and interpretability …
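
The snippet only states what concept erasure does; for concreteness, here is a minimal numpy sketch of a closed-form affine eraser in the spirit of LEACE (whiten the features, project out the directions spanned by the whitened feature-concept cross-covariance, unwhiten). The function name and tolerance are illustrative, not from the paper, and the authors' own implementation differs in its numerical details:

```python
import numpy as np

def fit_leace(X, Z, tol=1e-10):
    """Closed-form affine concept eraser (sketch in the spirit of LEACE).

    X: (n, d) representations; Z: (n, k) concept labels (e.g. one-hot gender).
    Returns erase(x) = x - W_pinv @ P @ W @ (x - mu), where W whitens X and
    P projects onto colspace(W @ Cov(X, Z)). The paper's guarantee: after
    erasure, no linear predictor of Z beats a constant baseline.
    """
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)

    # W = Sigma_XX^{-1/2} via eigendecomposition (pseudo-inverse square root).
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(X))
    keep = evals > tol
    W = (evecs[:, keep] * evals[keep] ** -0.5) @ evecs[:, keep].T
    W_pinv = (evecs[:, keep] * evals[keep] ** 0.5) @ evecs[:, keep].T

    # Orthogonal projection onto the whitened cross-covariance column space.
    U, s, _ = np.linalg.svd(W @ (Xc.T @ Zc / len(X)), full_matrices=False)
    U = U[:, s > tol]
    A = W_pinv @ U @ U.T @ W          # erasure map applied to (x - mu)

    def erase(x):
        return x - (x - mu) @ A.T
    return erase
```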

WinoGrande: An adversarial Winograd schema challenge at scale

K Sakaguchi, RL Bras, C Bhagavatula… - Communications of the …, 2021 - dl.acm.org
Commonsense reasoning remains a major challenge in AI, and yet, recent progress on
benchmarks may seem to suggest otherwise. In particular, the recent neural language …

Plug and play language models: A simple approach to controlled text generation

S Dathathri, A Madotto, J Lan, J Hung, E Frank… - arXiv preprint arXiv …, 2019 - arxiv.org
Large transformer-based language models (LMs) trained on huge text corpora have shown
unparalleled generation capabilities. However, controlling attributes of the generated …
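
As a rough sketch of the plug-and-play idea: steer a frozen LM at decoding time by nudging its hidden state along the gradient of a small attribute classifier, without updating any model weights. The helper names below are hypothetical stand-ins, and the real method also adds a KL penalty toward the unperturbed distribution, omitted here:

```python
import numpy as np

def pplm_logits(h, lm_head, attr_grad, step_size=0.02, n_steps=3):
    """One PPLM-style decoding step (sketch): perturb the hidden state h in
    the direction that raises an attribute classifier's log-likelihood, then
    read out next-token logits from the frozen LM head.

    lm_head:   (d, vocab) output projection of the frozen LM.
    attr_grad: callable returning d/dh log p(attribute | h).
    """
    delta = np.zeros_like(h)
    for _ in range(n_steps):
        delta += step_size * attr_grad(h + delta)   # gradient ascent on h only
    return (h + delta) @ lm_head                    # steered next-token logits
```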

Think locally, act globally: Federated learning with local and global representations

PP Liang, T Liu, L Ziyin, NB Allen, RP Auerbach… - arXiv preprint arXiv …, 2020 - arxiv.org
Federated learning is a method of training models on private data distributed over multiple
devices. To keep device data private, the global model is trained by only communicating …
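
The entry describes communicating model updates rather than raw data; below is a minimal sketch of one plain federated-averaging round for logistic regression, the baseline this paper extends by keeping a local representation model on each device and federating only the global part. All names are illustrative:

```python
import numpy as np

def federated_round(global_w, clients, local_steps=5, lr=0.1):
    """One round of federated averaging (sketch). Raw data never leaves a
    device; each client trains locally from the current global weights and
    only the resulting weights are sent back and averaged."""
    local_ws = []
    for X, y in clients:                      # (features, labels) per device
        w = global_w.copy()
        for _ in range(local_steps):
            p = 1 / (1 + np.exp(-X @ w))      # sigmoid predictions
            w -= lr * X.T @ (p - y) / len(X)  # local gradient step
        local_ws.append(w)
    return np.mean(local_ws, axis=0)          # server-side averaging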

Differential privacy has disparate impact on model accuracy

E Bagdasaryan, O Poursaeed… - Advances in neural …, 2019 - proceedings.neurips.cc
Differential privacy (DP) is a popular mechanism for training machine learning models with
bounded leakage about the presence of specific points in the training data. The cost of …
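
The disparate impact the paper measures stems from the mechanics of DP-SGD training; here is a minimal sketch of one DP-SGD step for logistic regression, with per-example gradient clipping and Gaussian noise. The function and parameter names are illustrative, not the paper's code:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=np.random):
    """One DP-SGD step (sketch): clip each per-example gradient to L2 norm
    <= clip, sum, add Gaussian noise calibrated to the clip bound, average.
    The paper's observation: clipping hits underrepresented groups hardest,
    since their examples tend to have larger gradients, so accuracy drops
    unevenly across groups."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1 / (1 + np.exp(-xi @ w))
        g = (p - yi) * xi                               # per-example gradient
        grads.append(g / max(1.0, np.linalg.norm(g) / clip))
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * (np.sum(grads, axis=0) + noise) / len(X)
```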

Null it out: Guarding protected attributes by iterative nullspace projection

S Ravfogel, Y Elazar, H Gonen, M Twiton… - arXiv preprint arXiv …, 2020 - arxiv.org
The ability to control for the kinds of information encoded in neural representations has a
variety of use cases, especially in light of the challenge of interpreting these models. We …
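
The method named in the title is easy to state concretely; below is a minimal sketch of iterative nullspace projection using a scikit-learn probe (function name and round count are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_rounds=10):
    """Iterative nullspace projection (sketch): repeatedly fit a linear probe
    for the protected attribute z, project the representations onto the
    probe's nullspace so those directions carry no information, and refit
    until the probe performs at chance."""
    d = X.shape[1]
    P = np.eye(d)
    for _ in range(n_rounds):
        probe = LogisticRegression(max_iter=1000).fit(X @ P, z)
        w = probe.coef_                                   # (k, d) directions
        basis = np.linalg.svd(w, full_matrices=False)[2]  # orthonormal rows
        P = (np.eye(d) - basis.T @ basis) @ P             # null(w) projection
    return X @ P, P
```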

Evaluating gender bias in machine translation

G Stanovsky, NA Smith, L Zettlemoyer - arXiv preprint arXiv:1906.00591, 2019 - arxiv.org
We present the first challenge set and evaluation protocol for the analysis of gender bias in
machine translation (MT). Our approach uses two recent coreference resolution datasets …