Discriminating between similar languages with word-level convolutional neural networks

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org

Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Cited by 253 · Related articles · All 11 versions

Findings of the VarDial evaluation campaign 2017

M Zampieri, S Malmasi, N Ljubešić… - Proceedings of the …, 2017 - aclanthology.org

We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …

Cited by 189 · Related articles · All 14 versions

Natural language processing for dialects of a language: A survey

A Joshi, R Dabre, D Kanojia, Z Li, H Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org

State-of-the-art natural language processing (NLP) models are trained on massive training
corpora, and report a superlative performance on evaluation datasets. This survey delves …

Cited by 15 · Related articles · All 2 versions

PALI: A language identification benchmark for Perso-Arabic scripts

S Ahmadi, M Agarwal, A Anastasopoulos - arXiv preprint arXiv …, 2023 - arxiv.org

The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various
linguistic communities around the globe. Identifying various languages using such scripts is …

Cited by 6 · Related articles · All 10 versions

Attention-based CNN-BiLSTM for dialect identification on Javanese text

AF Hidayatullah, S Cahyaningtyas… - Kinetik: Game …, 2020 - scholar.archive.org

This study proposes a hybrid deep learning models called attention-based CNN-BiLSTM
(ACBiL) for dialect identification on Javanese text. Our ACBiL model comprises of input …

Cited by 13 · Related articles · All 2 versions

Combining n-grams and deep convolutional features for language variety classification

M Martinc, S Pollak - Natural Language Engineering, 2019 - cambridge.org

This paper presents a novel neural architecture capable of outperforming state-of-the-art
systems on the task of language variety classification. The architecture is a hybrid that …

Cited by 14 · Related articles · All 6 versions

A methodology to measure the diachronic language distance between three languages based on perplexity

JR Pichel, P Gamallo, I Alegria… - Journal of Quantitative …, 2021 - Taylor & Francis

The aim of this paper is to apply a corpus-based methodology, based on the measure of
perplexity, to automatically calculate the cross-lingual language distance between historical …

Cited by 7 · Related articles · All 4 versions

Language identification in texts

T Jauhiainen - 2019 - helda.helsinki.fi

This work investigates the task of identifying the language of digitally encoded text.
Automatic methods for language identification have been developed since the 1960s …

Cited by 7 · Related articles · All 4 versions

Closely related Indonesian language identification using deep learning

AF Hidayatullah, WD Amirullah… - AIP Conference …, 2023 - pubs.aip.org

Twitter is a well-known social media platform with over 500 million users worldwide. More
than 100 languages were identified from among a million tweets. However, only 34 formal …

Using social networks to improve language variety identification with neural networks

Y Miura, T Taniguchi, M Taniguchi… - Proceedings of the …, 2017 - aclanthology.org

We propose a hierarchical neural network model for language variety identification that
integrates information from a social network. Recently, language variety identification has …

Cited by 4 · Related articles · All 3 versions