Text information retrieval in Tetun

G de Jesus - European Conference on Information Retrieval, 2023 - Springer
Tetun is one of Timor-Leste's official languages alongside Portuguese. It is a low-resource
language with over 932,000 speakers that started developing when Timor-Leste restored its …

Compact Transformer-based Language Models for the Moroccan Darija

M Aghzal, MA El Bouni, S Driouech… - 2023 7th IEEE …, 2023 - ieeexplore.ieee.org
Over the past few years, pre-trained language models based on transformer architectures
revolutionized the field of natural language processing, achieving state-of-the-art …

Moroccan Arabizi-to-Arabic conversion using rule-based transliteration and weighted Levenshtein algorithm

S Hajbi, O Amezian, N El Moukhi, R Korchiyne… - Scientific African, 2024 - Elsevier
The rise of social media has contributed to the widespread use of the Arabizi writing form,
primarily used in colloquial communication. For Natural Language Processing (NLP) tools …

The Evolution of Darija Open Dataset: Introducing Version 2

A Outchakoucht, H Es-Samaali - arXiv preprint arXiv:2405.13016, 2024 - arxiv.org
Darija Open Dataset (DODa) represents an open-source project aimed at enhancing Natural
Language Processing capabilities for the Moroccan dialect, Darija. With approximately …

Text Information Retrieval in Tetun: A Preliminary Study

G de Jesus - arXiv preprint arXiv:2406.07331, 2024 - arxiv.org
Tetun is one of Timor-Leste's official languages alongside Portuguese. It is a low-resource
language with over 932,400 speakers that started developing when Timor-Leste restored its …

Ahmed and Khalil at NADI 2022: Transfer learning and addressing class imbalance for Arabic dialect identification and sentiment analysis

A Oumar, K Mrini - Proceedings of the Seventh Arabic Natural …, 2022 - aclanthology.org
In this paper, we present our findings in the two subtasks of the 2022 NADI shared task. First,
in the Arabic dialect identification subtask, we find that there is heavy class imbalance, and …

Building a Corpus for the Underexplored Moroccan Dialect (CFMD) Through Audio Segmentations

H Zaidani, A Maizate, M Ouzzif… - Revue d'Intelligence …, 2024 - search.proquest.com
The advancement of artificial intelligence has deeply influenced numerous domains. One
particular area that has experienced remarkable progress is natural language processing …

CFMD: Corpus for Moroccan Dialect as Under Researched Dialect

H Zaidani, A Maizate, M Ouzzif, R Koulali - Future of Information and …, 2024 - Springer
The rise of social media has revolutionised numerous fields within artificial intelligence, and
one of the domains greatly impacted is natural language processing. With the widespread …

Scientific African

A Hajji, M Rhachi - 2022 - academia.edu
Anaerobic digestion is a promising process with many advantages on the
environmental and energy level, yet, it remains little exploited, due to the lack of control of …