Towards generalisable hate speech detection: a review on obstacles and solutions
Hate speech is one type of harmful online content which directly attacks or promotes hate
towards a group or an individual member based on their actual or perceived aspects of …
Directions in abusive language training data, a systematic review: Garbage in, garbage out
B Vidgen, L Derczynski - PLOS ONE, 2020 - journals.plos.org
Data-driven and machine learning based approaches for detecting, categorising and
measuring abusive content such as hate speech and harassment have gained traction due …
A holistic approach to undesired content detection in the real world
We present a holistic approach to building a robust and useful natural language
classification system for real-world content moderation. The success of such a system relies …
Dynabench: Rethinking benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
Process for adapting language models to society (palms) with values-targeted datasets
I Solaiman, C Dennison - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Language models can generate harmful and biased outputs and exhibit
undesirable behavior according to a given cultural context. We propose a Process for …
HateCheck: Functional tests for hate speech detection models
Detecting online hate is a difficult task that even state-of-the-art models struggle with.
Typically, hate speech detection models are evaluated by measuring their performance on …
Learning from the worst: Dynamically generated datasets to improve online hate detection
We present a human-and-model-in-the-loop process for dynamically generating datasets
and training better performing and more robust hate detection models. We provide a new …
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …
A deep neural network based multi-task learning approach to hate speech detection
With the advent of the internet and numerous social media platforms, citizens now have
enormous opportunities to express and share their opinions on various societal and political …
Offensive language and hate speech detection for Danish
GI Sigurbergsson, L Derczynski - arXiv preprint arXiv:1908.04531, 2019 - arxiv.org
The presence of offensive language on social media platforms and the implications this
poses is becoming a major concern in modern society. Given the enormous amount of …