Handling bias in toxic speech detection: A survey

T Garg, S Masud, T Suresh, T Chakraborty - ACM Computing Surveys, 2023 - dl.acm.org
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …

Is your toxicity my toxicity? Exploring the impact of rater identity on toxicity annotation

N Goyal, ID Kivlichan, R Rosen… - Proceedings of the ACM …, 2022 - dl.acm.org
Machine learning models are commonly used to detect toxicity in online conversations.
These models are trained on datasets annotated by human raters. We explore how raters' …

Identifying and measuring annotator bias based on annotators' demographic characteristics

H Al Kuwatly, M Wich, G Groh - … of the fourth workshop on online …, 2020 - aclanthology.org
Machine learning has recently been used to detect hate speech and other forms of abusive
language on online platforms. However, a notable weakness of machine learning models is …

Why don't you do it right? Analysing annotators' disagreement in subjective tasks

M Sandri, E Leonardelli, S Tonelli… - Proceedings of the 17th …, 2023 - aclanthology.org
Annotators' disagreement in linguistic data has recently been the focus of multiple initiatives
aimed at raising awareness of issues related to 'majority voting' when aggregating diverging …

Cross-lingual few-shot hate speech and offensive language detection using meta learning

M Mozafari, R Farahbakhsh, N Crespi - IEEE Access, 2022 - ieeexplore.ieee.org
Automatic detection of abusive online content such as hate speech, offensive language,
threats, etc. has become prevalent in social media, with multiple efforts dedicated to …

Challenges in applying explainability methods to improve the fairness of NLP models

E Balkir, S Kiritchenko, I Nejadgholi… - arXiv preprint arXiv …, 2022 - arxiv.org
Motivations for methods in explainable artificial intelligence (XAI) often include detecting,
quantifying and mitigating bias, and contributing to making machine learning models fairer …

Un-compromised credibility: Social media based multi-class hate speech classification for text

KA Qureshi, M Sabih - IEEE Access, 2021 - ieeexplore.ieee.org
Social media has grown enormously, and its anonymity fully promotes freedom of expression.
Freedom of expression is a human right, but hate speech …

Annotating online misogyny

P Zeinert, N Inie, L Derczynski - … of the 59th Annual Meeting of the …, 2021 - aclanthology.org
Online misogyny, a category of online abusive language, has serious and harmful social
consequences. Automatic detection of misogynistic language online, while imperative …

Explaining toxic text via knowledge enhanced text generation

R Sridhar, D Yang - Proceedings of the 2022 Conference of the …, 2022 - aclanthology.org
Warning: This paper contains content that is offensive and may be upsetting. Biased or toxic
speech can be harmful to various demographic groups. Therefore, it is not only important for …

Unmasking and improving data credibility: A study with datasets for training harmless language models

Z Zhu, J Wang, H Cheng, Y Liu - arXiv preprint arXiv:2311.11202, 2023 - arxiv.org
Language models have shown promise in various tasks but can be affected by undesired
data during training, fine-tuning, or alignment. For example, if some unsafe conversations …