Impact of tokenization on language models: An analysis for Turkish

C Toraman, EH Yilmaz, F Şahinuç… - ACM Transactions on …, 2023 - dl.acm.org
Tokenization is an important text preprocessing step to prepare input tokens for deep
language models. WordPiece and BPE are de facto methods employed by important …

KLPT – Kurdish language processing toolkit

S Ahmadi - Proceedings of second workshop for NLP open …, 2020 - aclanthology.org
Despite the recent advances in applying language-independent approaches to various
natural language processing tasks thanks to artificial intelligence, some language-specific …

Transfer learning for low-resource sentiment analysis

R Hameed, S Ahmadi, F Daneshfar - arXiv preprint arXiv:2304.04703, 2023 - arxiv.org
Sentiment analysis is the process of identifying and extracting subjective information from
text. Despite the advances to employ cross-lingual approaches in an automatic way, the …

A formal description of Sorani Kurdish morphology

S Ahmadi - arXiv preprint arXiv:2109.03942, 2021 - arxiv.org
Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due
to the patterns in which morphemes appear. Although several aspects of Kurdish …

Machine learning and the future of philology: A case study

B Graziosi, J Haubold, C Cowen-Breen, C Brooks - TAPA, 2023 - muse.jhu.edu
This paper argues that machine learning (ML) has a role to play in the future of philology,
understood here as a discipline concerned with preserving and elucidating the global …

Leveraging multilingual news websites for building a Kurdish parallel corpus

S Ahmadi, H Hassani, DQ Jaff - … on Asian and Low-Resource Language …, 2022 - dl.acm.org
Machine translation has been a major motivation of development in natural language
processing. Despite the burgeoning achievements in creating more efficient machine …

A hybrid part-of-speech tagger with annotated Kurdish corpus: advancements in POS tagging

D Maulud, K Jacksi, I Ali - Digital Scholarship in the Humanities, 2023 - academic.oup.com
With the rapid growth of online content written in the Kurdish language, there is an
increasing need to make it machine-readable and processable. Part of speech (POS) …

Hunspell for Sorani Kurdish spell checking and morphological analysis

S Ahmadi - arXiv preprint arXiv:2109.06374, 2021 - arxiv.org
Spell checking and morphological analysis are two fundamental tasks in text and natural
language processing and are addressed in the early stages of the development of language …

The effect of model capacity and script diversity on subword tokenization for Soranî Kurdish

A Salehi, CL Jacobs - … of the 21st SIGMORPHON workshop on …, 2024 - aclanthology.org
Tokenization and morphological segmentation continue to pose challenges for text
processing and studies of human language. Here, we focus on written Soranî Kurdish, which …

A Hybrid Approach to Ontology Construction for the Badini Kurdish Language

M Azzat, K Jacksi, I Ali - Information, 2024 - mdpi.com
Semantic ontologies have been widely utilized as crucial tools within natural language
processing, underpinning applications such as knowledge extraction, question answering …