The Saudi novel corpus: Design and compilation

T Alfraidi, MAR Abdeen, A Yatimi, R Alluhaibi… - Applied Sciences, 2022 - mdpi.com
Arabic has recently received significant attention from corpus compilers. This situation has
led to the creation of many Arabic corpora that cover various genres, most notably the …

Hierarchical aggregation of dialectal data for Arabic dialect identification

N Baimukan, H Bouamor, N Habash - Proceedings of the …, 2022 - aclanthology.org
Arabic is a collection of dialectal variants that are historically related but significantly
different. These differences can be seen across regions, countries, and even cities in the …

An incremental approach to corpus design and construction: application to a large contemporary Saudi corpus

H Elgibreen, M Faisal, M Al Sulaiman, S Abdou… - IEEE …, 2021 - ieeexplore.ieee.org
Due to the rapid developments in technology and the sudden expansion of social media
use, Dialect Arabic has become an important source of data that needs to be addressed …

Is Arabic punctuation rule-governed?

S Yagi, S Fareh, A Elnagar, M Balajeed… - Cogent Arts & …, 2024 - Taylor & Francis
This paper investigates the extent to which Arabic punctuation is rule-governed, with the aim
of improving text comprehension, disambiguation, and machine translation. The study …

Maknuune: A Large Open Palestinian Arabic Lexicon

S Dibas, C Khairallah, N Habash, OF Sadi… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Maknuune, a large open lexicon for the Palestinian Arabic dialect. Maknuune
has over 36K entries from 17K lemmas, and 3.7K roots. All entries include diacritized Arabic …

Morphologically-analyzed and syntactically-annotated Quran dataset

M Sawalha, F Al-Shargi, S Yagi, AT AlShdaifat… - Data in Brief, 2025 - Elsevier
This paper introduces the Morphologically-Analyzed and Syntactically-Annotated Quran
(MASAQ) dataset, a comprehensive resource designed to address the scarcity of annotated …

Towards Gulf Emirati Dialect Corpus from Social Media

BA AlAzzam, M Alkhatib, K Shaalan - BUiD Doctoral Research …, 2024 - Springer
Purpose: This paper discusses the need for a corpus of Emirati traditional phrases and
idioms in natural language processing (NLP) for the Gulf Emirati dialect and its potential …

Aggregating Hierarchical Dialectal Data for Arabic Dialect Classification

N Baimukan, H Bouamor, N Habash - kilthub.cmu.edu
Arabic is a collection of dialectal variants that are historically related but significantly
different. These differences can be seen across regions, countries, and even cities in the …