Automatic language identification in texts: A survey
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …
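For reference, the LI task defined above can be illustrated with a minimal sketch of a generic character n-gram classifier: a naive Bayes model over character trigrams with add-one smoothing. It is not the method of any work listed here. The helper `char_ngrams`, the class `NgramLanguageIdentifier`, and the toy training sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    # Pad with spaces so word-initial and word-final n-grams are captured
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical toy identifier: naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}     # language label -> Counter of n-gram frequencies
        self.totals = {}     # language label -> total number of n-gram tokens
        self.vocab = set()   # every n-gram seen in any training text

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.totals[language] = sum(self.counts[language].values())
        self.vocab.update(grams)

    def log_likelihood(self, language, text):
        # Add-one smoothing; the extra +1 in the denominator reserves
        # probability mass for n-grams never seen during training
        counts = self.counts[language]
        denom = self.totals[language] + len(self.vocab) + 1
        return sum(math.log((counts[g] + 1) / denom)
                   for g in char_ngrams(text, self.n))

    def identify(self, text):
        # Uniform prior over languages: return the label with the highest likelihood
        return max(self.counts, key=lambda lang: self.log_likelihood(lang, text))


lid = NgramLanguageIdentifier(n=3)
lid.train("en", "the quick brown fox jumps over the lazy dog and the cat")
lid.train("de", "der schnelle braune fuchs springt über den faulen hund und die katze")
print(lid.identify("the dog and the fox"))     # en
print(lid.identify("der hund und die katze"))  # de
```

Character n-grams are used here because they need no tokenizer and capture spelling patterns shared across words. Real systems train on far larger corpora per language.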
Findings of the VarDial evaluation campaign 2017
We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …
Natural language processing for dialects of a language: A survey
State-of-the-art natural language processing (NLP) models are trained on massive training
corpora, and report a superlative performance on evaluation datasets. This survey delves …
PALI: A language identification benchmark for Perso-Arabic scripts
The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various
linguistic communities around the globe. Identifying various languages using such scripts is …
Attention-based CNN-BiLSTM for dialect identification on Javanese text
AF Hidayatullah, S Cahyaningtyas… - Kinetik: Game …, 2020 - scholar.archive.org
This study proposes a hybrid deep learning model called attention-based CNN-BiLSTM
(ACBiL) for dialect identification on Javanese text. Our ACBiL model comprises input …
Combining n-grams and deep convolutional features for language variety classification
This paper presents a novel neural architecture capable of outperforming state-of-the-art
systems on the task of language variety classification. The architecture is a hybrid that …
A methodology to measure the diachronic language distance between three languages based on perplexity
The aim of this paper is to apply a corpus-based methodology, based on the measure of
perplexity, to automatically calculate the cross-lingual language distance between historical …
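The idea of a perplexity-based distance between languages can be illustrated with a minimal sketch. The specific choices here are assumptions, not details taken from the cited paper: character trigram models with add-one smoothing, and a distance defined as the mean of the two cross-perplexities (the model of each text scoring the other text). `CharNgramLM` and `perplexity_distance` are hypothetical names.

```python
import math
from collections import Counter


class CharNgramLM:
    """Hypothetical character n-gram language model with add-one smoothing."""

    def __init__(self, text, n=3):
        self.n = n
        padded = " " * (n - 1) + text.lower()  # left-pad so the first characters have a context
        grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
        self.ngram_counts = Counter(grams)
        self.context_counts = Counter(g[:-1] for g in grams)

    def perplexity(self, text, alphabet_size):
        padded = " " * (self.n - 1) + text.lower()
        grams = [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]
        if not grams:
            raise ValueError("cannot compute perplexity of empty text")
        log_prob = 0.0
        for gram in grams:
            # P(next char | previous n-1 chars), add-one smoothed over the alphabet
            p = (self.ngram_counts[gram] + 1) / (self.context_counts[gram[:-1]] + alphabet_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(grams))


def perplexity_distance(text_a, text_b, n=3):
    # Shared alphabet so both directions smooth over the same character set
    alphabet_size = len(set(text_a.lower()) | set(text_b.lower()) | {" "})
    lm_a, lm_b = CharNgramLM(text_a, n), CharNgramLM(text_b, n)
    # Assumed symmetric definition: mean of the two cross-perplexities
    return (lm_a.perplexity(text_b, alphabet_size) + lm_b.perplexity(text_a, alphabet_size)) / 2


es = "el gato come pescado y el perro come carne"
en = "the cat eats fish and the dog eats meat"
print(perplexity_distance(es, es) < perplexity_distance(es, en))  # True
```

A lower value means each text's model predicts the other text better. For a meaningful comparison one would train on large corpora per language (or per historical period) and evaluate on held-out text rather than the tiny strings used here.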
Language identification in texts
T Jauhiainen - 2019 - helda.helsinki.fi
This work investigates the task of identifying the language of digitally encoded text.
Automatic methods for language identification have been developed since the 1960s …
Closely related Indonesian language identification using deep learning
AF Hidayatullah, WD Amirullah… - AIP Conference …, 2023 - pubs.aip.org
Twitter is a well-known social media platform with over 500 million users worldwide. More
than 100 languages were identified from among a million tweets. However, only 34 formal …
Using social networks to improve language variety identification with neural networks
Y Miura, T Taniguchi, M Taniguchi… - Proceedings of the …, 2017 - aclanthology.org
We propose a hierarchical neural network model for language variety identification that
integrates information from a social network. Recently, language variety identification has …