Impact of tokenization on language models: An analysis for Turkish
Tokenization is an important text preprocessing step to prepare input tokens for deep
language models. WordPiece and BPE are de facto methods employed by important …
KLPT–Kurdish language processing toolkit
S Ahmadi - Proceedings of second workshop for NLP open …, 2020 - aclanthology.org
Despite the recent advances in applying language-independent approaches to various
natural language processing tasks thanks to artificial intelligence, some language-specific …
Transfer learning for low-resource sentiment analysis
Sentiment analysis is the process of identifying and extracting subjective information from
text. Despite the advances to employ cross-lingual approaches in an automatic way, the …
A formal description of Sorani Kurdish morphology
S Ahmadi - arXiv preprint arXiv:2109.03942, 2021 - arxiv.org
Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due
to the patterns in which morphemes appear. Although several aspects of Kurdish …
Machine learning and the future of philology: A case study
This paper argues that machine learning (ML) has a role to play in the future of philology,
understood here as a discipline concerned with preserving and elucidating the global …
Leveraging multilingual news websites for building a Kurdish parallel corpus
Machine translation has been a major motivation of development in natural language
processing. Despite the burgeoning achievements in creating more efficient machine …
A hybrid part-of-speech tagger with annotated Kurdish corpus: advancements in POS tagging
With the rapid growth of online content written in the Kurdish language, there is an
increasing need to make it machine-readable and processable. Part of speech (POS) …
Hunspell for Sorani Kurdish spell checking and morphological analysis
S Ahmadi - arXiv preprint arXiv:2109.06374, 2021 - arxiv.org
Spell checking and morphological analysis are two fundamental tasks in text and natural
language processing and are addressed in the early stages of the development of language …
The effect of model capacity and script diversity on subword tokenization for Soranî Kurdish
Tokenization and morphological segmentation continue to pose challenges for text
processing and studies of human language. Here, we focus on written Soranî Kurdish, which …
A Hybrid Approach to Ontology Construction for the Badini Kurdish Language
Semantic ontologies have been widely utilized as crucial tools within natural language
processing, underpinning applications such as knowledge extraction, question answering …