langid.py: An off-the-shelf language identification tool

M Lui, T Baldwin - Proceedings of the ACL 2012 system …, 2012 - aclanthology.org
We present langid.py, an off-the-shelf language identification tool. We discuss the design
and implementation of langid.py, and provide an empirical comparison on 5 long-document …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

[Book] Natural language processing for social media

A Farzindar, D Inkpen, G Hirst - 2015 - Springer
In recent years, online social networking has revolutionized interpersonal communication.
The newer research on language analysis in social media has been increasingly focusing …

Estimating code-switching on Twitter with a novel generalized word-level language detection technique

S Rijhwani, R Sequiera, M Choudhury… - Proceedings of the …, 2017 - aclanthology.org
Word-level language detection is necessary for analyzing code-switched text, where
multiple languages could be mixed within a sentence. Existing models are restricted to code …

Accurate language identification of Twitter messages

M Lui, T Baldwin - Proceedings of the 5th workshop on language …, 2014 - aclanthology.org
We present an evaluation of “off-the-shelf” language identification systems as applied to
microblog messages from Twitter. A key challenge is the lack of an adequate corpus of …

Language identification for creating language-specific Twitter collections

S Bergsma, P McNamee, M Bagdouri… - Proceedings of the …, 2012 - aclanthology.org
Social media services such as Twitter offer an immense volume of real-world linguistic data.
We explore the use of Twitter to obtain authentic user-generated text in low-resource …

Recent developments in sentiment analysis on social networks: techniques, datasets, and open issues

A Saxena, H Reddy, P Saxena - Principles of Social Networking: The New …, 2022 - Springer
In recent years, sentiment analysis has been highly used on social media datasets to get
conclusive information, opinions of users about different topics, such as politics, events, and …

Broadly improving user classification via communication-based name and location clustering on Twitter

S Bergsma, M Dredze, B Van Durme… - Proceedings of the …, 2013 - aclanthology.org
Hidden properties of social media users, such as their ethnicity, gender, and location, are
often reflected in their observed attributes, such as their first and last names. Furthermore …

Language variety identification with true labels

M Zampieri, K North, T Jauhiainen, M Felice… - arXiv preprint arXiv …, 2023 - arxiv.org
Language identification is an important first step in many IR and NLP applications. Most
publicly available language identification datasets, however, are compiled under the …

The growing amplification of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020

T Alshaabi, DR Dewhurst, JR Minot, MV Arnold… - EPJ data …, 2021 - epjds.epj.org
Working from a dataset of 118 billion messages running from the start of 2009 to the end of
2019, we identify and explore the relative daily use of over 150 languages on Twitter. We …