Language embeddings sometimes contain typological generalizations
To what extent can neural network models learn generalizations about language structure,
and how do we find out what they have learned? We explore these questions by training …
The role of typological feature prediction in NLP and linguistics
J Bjerva - Computational Linguistics, 2023 - direct.mit.edu
Computational typology has gained traction in the field of Natural Language Processing
(NLP) in recent years, as evidenced by the increasing number of papers on the topic and the …
Linguistically Guided Multilingual NLP: Current Approaches, Challenges, and Future Perspectives
The neural revolution has redefined–and many would argue, undermined–the place of
traditional linguistics in natural language processing. The pace at which large unsupervised …
Colex2Lang: Language embeddings from semantic typology
In semantic typology, colexification refers to words with multiple meanings, either related
(polysemy) or unrelated (homophony). Studies of cross-linguistic colexification have yielded …
Using linguistic typology to enrich multilingual lexicons: the case of lexical gaps in kinship
This paper describes a method to enrich lexical resources with content relating to linguistic
diversity, based on knowledge from the field of lexical typology. We capture the …
Does typological blinding impede cross-lingual sharing?
J Bjerva, I Augenstein - arXiv preprint arXiv:2101.11888, 2021 - arxiv.org
Bridging the performance gap between high- and low-resource languages has been the
focus of much previous work. Typological features from databases such as the World Atlas of …
Languages through the looking glass of BPE compression
X Gutierrez-Vasques, C Bentz… - Computational …, 2023 - direct.mit.edu
Byte-pair encoding (BPE) is widely used in NLP for performing subword tokenization. It
uncovers redundant patterns for compressing the data, and hence alleviates the sparsity …
The past, present, and future of typological databases in NLP
Typological information has the potential to be beneficial in the development of NLP models,
particularly for low-resource languages. Unfortunately, current large-scale typological …
Colexifications for bootstrapping cross-lingual datasets: The case of phonology, concreteness, and affectiveness
Colexification refers to the linguistic phenomenon where a single lexical form is used to
convey multiple meanings. By studying cross-lingual colexifications, researchers have …
Multilingual Gradient Word-Order Typology from Universal Dependencies
While information from the field of linguistic typology has the potential to improve
performance on NLP tasks, reliable typological data is a prerequisite. Existing typological …