A parallel corpus for Vietnamese central-northern dialect text transfer

T Le, A Luu - Findings of the Association for Computational …, 2023 - aclanthology.org
The Vietnamese language embodies dialectal variants closely attached to the nation's three
macro-regions: the Northern, Central and Southern regions. As the northern dialect forms …

NLP for Counterspeech against Hate: A Survey and How-To Guide

H Bonaldi, YL Chung, G Abercrombie… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, counterspeech has emerged as one of the most promising strategies to fight
online hate. These non-escalatory responses tackle online abuse while preserving the …

NAIJAHATE: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data

M Tonneau, PVQ de Castro, K Lasri, I Farouq… - arXiv preprint arXiv …, 2024 - arxiv.org
To address the global issue of online hate, hate speech detection (HSD) systems are
typically developed on datasets from the United States, thereby failing to generalize to …

“I Searched for a Religious Song in Amharic and Got Sexual Content Instead”: Investigating Online Harm in Low-Resourced Languages on YouTube

HH Nigatu, ID Raji - The 2024 ACM Conference on Fairness …, 2024 - dl.acm.org
Online social media platforms such as YouTube have a wide, global reach. However, little is
known about the experience of low-resourced language speakers on such platforms; …

Evaluating Pixel Language Models on Non-Standardized Languages

A Muñoz-Ortiz, V Blaschke, B Plank - arXiv preprint arXiv:2412.09084, 2024 - arxiv.org
We explore the potential of pixel-based models for transfer learning from standard
languages to dialects. These models convert text into images that are divided into patches …

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

O Ahia, S Kumar, H Gonen, V Hoffman… - arXiv preprint arXiv …, 2024 - arxiv.org
In multilingual settings, non-Latin scripts and low-resource languages are usually
disadvantaged in terms of language models' utility, efficiency, and cost. Specifically …

Vicinal risk minimization for few-shot cross-lingual transfer in abusive language detection

G De la Peña Sarracén, P Rosso… - Proceedings of the …, 2023 - aclanthology.org
Cross-lingual transfer learning from high-resource to medium and low-resource languages
has shown encouraging results. However, the scarcity of resources in target languages …

Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network

S Gul, MS Khan, A Ur-Rehman - PLOS ONE, 2024 - journals.plos.org
Speech enhancement is crucial both for human and machine listening applications. Over the
last decade, the use of deep learning for speech enhancement has resulted in tremendous …

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

M Tonneau, D Liu, N Malhotra, SA Hale… - arXiv preprint arXiv …, 2024 - arxiv.org
To tackle the global challenge of online hate speech, a large body of research has
developed detection models to flag hate speech in the sea of online content. Yet, due to …

The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain

M Grandury - arXiv preprint arXiv:2407.17479, 2024 - arxiv.org
We are 600 million Spanish speakers. We launched the #Somos600M Project because the
diversity of the languages from LATAM, the Caribbean and Spain needs to be represented …