Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs

M Saeed, E Mohamed, M Mohamed, S Raza… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are widely used but raise ethical concerns due to
embedded social biases. This study examines LLM biases against Arabs versus Westerners …

Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

PJ Lin, M Scholman, M Saeed, V Demberg - arXiv preprint arXiv …, 2024 - arxiv.org
Nigerian Pidgin is an English-derived contact language and is traditionally an oral
language, spoken by approximately 100 million people. No orthographic standard has yet …

Projecting Annotations for Discourse Relations: Connective Identification for Low-Resource Languages

P Bourgonje, PJ Lin - Proceedings of the 5th Workshop on …, 2024 - aclanthology.org
We present a pipeline for multi-lingual Shallow Discourse Parsing. The pipeline exploits
Machine Translation and Word Alignment, by translating any incoming non-English input …

From Nile Sands to Digital Hands: Machine Translation of Coptic Texts

M Saeed, A Mohamed, M Mohamed… - Proceedings of The …, 2024 - aclanthology.org
The Coptic language, rooted in the historical landscapes of Egypt, continues to serve as a
vital liturgical medium for the Coptic Orthodox and Catholic Churches across Egypt, North …