Language embeddings sometimes contain typological generalizations

R Östling, M Kurfalı - Computational Linguistics, 2023 - direct.mit.edu
To what extent can neural network models learn generalizations about language structure,
and how do we find out what they have learned? We explore these questions by training …

The role of typological feature prediction in NLP and linguistics

J Bjerva - Computational Linguistics, 2023 - direct.mit.edu
Computational typology has gained traction in the field of Natural Language Processing
(NLP) in recent years, as evidenced by the increasing number of papers on the topic and the …

Linguistically Guided Multilingual NLP: Current Approaches, Challenges, and Future Perspectives

O Majewska, I Vulić, A Korhonen - Algebraic Structures in Natural …, 2022 - taylorfrancis.com
The neural revolution has redefined–and many would argue, undermined–the place of
traditional linguistics in natural language processing. The pace at which large unsupervised …

Colex2Lang: Language embeddings from semantic typology

Y Chen, R Biswas, J Bjerva - The 24th Nordic Conference on …, 2023 - vbn.aau.dk
In semantic typology, colexification refers to words with multiple meanings, either related
(polysemy) or unrelated (homophony). Studies of cross-linguistic colexification have yielded …

Using linguistic typology to enrich multilingual lexicons: the case of lexical gaps in kinship

T Khishigsuren, G Bella, K Batsuren, AA Freihat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper describes a method to enrich lexical resources with content relating to linguistic
diversity, based on knowledge from the field of lexical typology. We capture the …

Does typological blinding impede cross-lingual sharing?

J Bjerva, I Augenstein - arXiv preprint arXiv:2101.11888, 2021 - arxiv.org
Bridging the performance gap between high- and low-resource languages has been the
focus of much previous work. Typological features from databases such as the World Atlas of …

Languages through the looking glass of BPE compression

X Gutierrez-Vasques, C Bentz… - Computational …, 2023 - direct.mit.edu
Byte-pair encoding (BPE) is widely used in NLP for performing subword tokenization. It
uncovers redundant patterns for compressing the data, and hence alleviates the sparsity …

The past, present, and future of typological databases in NLP

E Baylor, E Ploeger, J Bjerva - arXiv preprint arXiv:2310.13440, 2023 - arxiv.org
Typological information has the potential to be beneficial in the development of NLP models,
particularly for low-resource languages. Unfortunately, current large-scale typological …

Colexifications for bootstrapping cross-lingual datasets: The case of phonology, concreteness, and affectiveness

Y Chen, J Bjerva - arXiv preprint arXiv:2306.02646, 2023 - arxiv.org
Colexification refers to the linguistic phenomenon where a single lexical form is used to
convey multiple meanings. By studying cross-lingual colexifications, researchers have …

Multilingual Gradient Word-Order Typology from Universal Dependencies

E Baylor, E Ploeger, J Bjerva - arXiv preprint arXiv:2402.01513, 2024 - arxiv.org
While information from the field of linguistic typology has the potential to improve
performance on NLP tasks, reliable typological data is a prerequisite. Existing typological …