Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Findings of the VarDial evaluation campaign 2017

M Zampieri, S Malmasi, N Ljubešić… - Proceedings of the …, 2017 - aclanthology.org
We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …

Natural language processing for dialects of a language: A survey

A Joshi, R Dabre, D Kanojia, Z Li, H Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art natural language processing (NLP) models are trained on massive training
corpora, and report a superlative performance on evaluation datasets. This survey delves …

PALI: A language identification benchmark for Perso-Arabic scripts

S Ahmadi, M Agarwal, A Anastasopoulos - arXiv preprint arXiv …, 2023 - arxiv.org
The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various
linguistic communities around the globe. Identifying various languages using such scripts is …

Attention-based CNN-BiLSTM for dialect identification on Javanese text

AF Hidayatullah, S Cahyaningtyas… - Kinetik: Game …, 2020 - scholar.archive.org
This study proposes a hybrid deep learning model called attention-based CNN-BiLSTM
(ACBiL) for dialect identification on Javanese text. Our ACBiL model comprises input …

Combining n-grams and deep convolutional features for language variety classification

M Martinc, S Pollak - Natural Language Engineering, 2019 - cambridge.org
This paper presents a novel neural architecture capable of outperforming state-of-the-art
systems on the task of language variety classification. The architecture is a hybrid that …

A methodology to measure the diachronic language distance between three languages based on perplexity

JR Pichel, P Gamallo, I Alegria… - Journal of Quantitative …, 2021 - Taylor & Francis
The aim of this paper is to apply a corpus-based methodology, based on the measure of
perplexity, to automatically calculate the cross-lingual language distance between historical …

Language identification in texts

T Jauhiainen - 2019 - helda.helsinki.fi
This work investigates the task of identifying the language of digitally encoded text.
Automatic methods for language identification have been developed since the 1960s …

Closely related Indonesian language identification using deep learning

AF Hidayatullah, WD Amirullah… - AIP Conference …, 2023 - pubs.aip.org
Twitter is a well-known social media platform with over 500 million users worldwide. More
than 100 languages were identified from among a million tweets. However, only 34 formal …

Using social networks to improve language variety identification with neural networks

Y Miura, T Taniguchi, M Taniguchi… - Proceedings of the …, 2017 - aclanthology.org
We propose a hierarchical neural network model for language variety identification that
integrates information from a social network. Recently, language variety identification has …