Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Discriminating between similar languages and Arabic dialect identification: A report on the third DSL shared task

S Malmasi, M Zampieri, N Ljubešić… - Proceedings of the …, 2016 - aclanthology.org
We present the results of the third edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the VarDial'2016 workshop at …

A systematic study of knowledge graph analysis for cross-language plagiarism detection

M Franco-Salvador, P Rosso… - Information Processing & …, 2016 - Elsevier
Cross-language plagiarism detection aims to detect plagiarised fragments of text among
documents in different languages. In this paper, we perform a systematic examination of …

Overview of the DSL shared task 2015

M Zampieri, L Tan, N Ljubešić… - Proceedings of the …, 2015 - aclanthology.org
We present the results of the 2nd edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the LT4VarDial'2015 workshop and …

Discriminating similar languages: Evaluations and explorations

C Goutte, S Léger, S Malmasi, M Zampieri - arXiv preprint arXiv …, 2016 - arxiv.org
We present an analysis of the performance of machine learning classifiers on discriminating
between similar languages and language varieties. We carried out a number of experiments …

Application of the distributed document representation in the authorship attribution task for small corpora

JP Posadas-Durán, H Gómez-Adorno, G Sidorov… - Soft Computing, 2017 - Springer
Distributed word representation in a vector space (word embeddings) is a novel technique
that allows to represent words in terms of the elements in the neighborhood. Distributed …

UH-PRHLT at SemEval-2016 Task 3: Combining lexical and semantic-based features for community question answering

M Franco-Salvador, S Kar, T Solorio… - arXiv preprint arXiv …, 2018 - arxiv.org
In this work we describe the system built for the three English subtasks of the SemEval 2016
Task 3 by the Department of Computer Science of the University of Houston (UH) and the …

When sparse traditional models outperform dense neural networks: the curious case of discriminating between similar languages

M Medvedeva, M Kroon, B Plank - … of the Fourth Workshop on NLP …, 2017 - aclanthology.org
We present the results of our participation in the VarDial 4 shared task on discriminating
closely related languages. Our submission includes simple traditional models using linear …

A character-level convolutional neural network for distinguishing similar languages and dialects

Y Belinkov, J Glass - arXiv preprint arXiv:1609.07568, 2016 - arxiv.org
Discriminating between closely-related language varieties is considered a challenging and
important task. This paper describes our submission to the DSL 2016 shared-task, which …

Discriminating similar languages with linear SVMs and neural networks

Ç Çöltekin, T Rama - Proceedings of the Third Workshop on NLP …, 2016 - aclanthology.org
This paper describes the systems we experimented with for participating in the
discriminating between similar languages (DSL) shared task 2016. We submitted results of a …