Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Error Pattern Discovery in Spellchecking Using Multi-Class Confusion Matrix Analysis for the Croatian Language

G Gledec, M Sokele, M Horvat, M Mikuc - Computers, 2024 - mdpi.com
This paper introduces a novel approach to the creation and application of confusion
matrices for error pattern discovery in spellchecking for the Croatian language. The …

Word-length algorithm for language identification of under-resourced languages

A Selamat, N Akosu - Journal of King Saud University-Computer and …, 2016 - Elsevier
Language identification is widely used in machine learning, text mining, information
retrieval, and speech processing. Available techniques for solving the problem of language …

Generalized language identification

MH Lui - 2014 - minerva-access.unimelb.edu.au
Language identification is the task of determining the natural language that a
document or part thereof is written in. The central theme of this thesis is generalized …

Improved text language identification for the South African languages

B Duvenhage, M Ntini… - 2017 Pattern Recognition …, 2017 - ieeexplore.ieee.org
Virtual assistants and text chatbots have recently been gaining popularity. Given the short
message nature of text-based chat interactions, the language identification systems of these …

Text-based language identification for some of the under-resourced languages of South Africa

TJ Sefara, MJ Manamela… - … Conference on Advances …, 2016 - ieeexplore.ieee.org
Language identification is the problem of correctly classifying a sample of text/documents
based on its language. However, much of the research work focused on the English …

Language identification with scarce data: A case study from Peru

A Espichán-Linares, A Oncevay-Marcos - … 2017, Lima, Peru, September 4-6 …, 2018 - Springer
Language identification is an elemental task in natural language processing, where
corpus-based methods achieve state-of-the-art results in multilingual setups. However …

Word-Based Bantu Language Identification using Naïve Bayes

B Okgetheng, EAW Budu - 2022 IST-Africa Conference (IST …, 2022 - ieeexplore.ieee.org
Language identification of text has become increasingly important as large quantities of text
are processed or filtered automatically. It is one of the preprocessing steps in Natural …

A low-resourced Peruvian language identification model

AE Linares, A Oncevay-Marcos - CEUR Workshop Proceedings. CEUR …, 2017 - ceur-ws.org
Due to the linguistic revitalization in Perú in recent years, there is growing interest in
reinforcing bilingual education in the country and in increasing research focused on its …

Event-based user profiling in social media using data mining approaches

T Arabghalizi, B Rahdari - 2016 - politesi.polimi.it
Social networks have undergone dramatic growth and influenced everyone's lives in recent
years. People share everything from daily life stories to the latest local and global news and …