Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems

PS Dodds, JR Minot, MV Arnold, T Alshaabi… - EPJ Data …, 2023 - epjds.epj.org
Complex systems often comprise many kinds of components which vary over many orders of
magnitude in size: Populations of cities in countries, individual and corporate wealth in …

Lexical richness and text length: an entropy-based perspective

Y Shi, L Lei - Journal of Quantitative Linguistics, 2022 - Taylor & Francis
Text length is a major concern in the measurement of lexical richness, and how lexical
richness is affected by text length still remains open. The present study aims to explore the …

Diverging divergences: Examining variants of Jensen Shannon divergence for corpus comparison tasks

J Lu, M Henchion, B Mac Namee - 2021 - t-stor.teagasc.ie
Jensen-Shannon divergence (JSD) is a distribution similarity measurement widely used in
natural language processing. In corpus comparison tasks, where keywords are extracted to …

A large quantitative analysis of written language challenges the idea that all languages are equally complex

A Koplenig, S Wolfer, P Meyer - Scientific Reports, 2023 - nature.com
One of the fundamental questions about human language is whether all languages are
equally complex. Here, we approach this question from an information-theoretic perspective …

Scaling laws and dynamics of hashtags on Twitter

HH Chen, TJ Alexander, DFM Oliveira… - … Journal of Nonlinear …, 2020 - pubs.aip.org
In this paper, we quantify the statistical properties and dynamics of the frequency of hashtag
use on Twitter. Hashtags are special words used in social media to attract attention and to …

Lexical borrowing in Korean: a diachronic approach based on a corpus analysis

Y Oh, H Son - Corpus Linguistics and Linguistic Theory, 2024 - degruyter.com
Loanwords are lexical terms borrowed from foreign languages by transliterating the original
sound of the borrowed words with the recipient language's consonants and vowels. This …

Leben, lieben, leiden: Geschlechterstereotype in Wörterbüchern, Einfluss der Korpusgrundlage und Abbild der sprachlichen ‚Wirklichkeit‘

C Müller-Spitzer, H Lobin - Genus–Sexus–Gender, 2022 - ids-pub.bsz-bw.de
Academically grounded general dictionaries of German are today mostly compiled on the
basis of corpora, i.e. the language described in them is, before the …

Information theory and language

Ł Dębowski, C Bentz - Entropy, 2020 - mdpi.com
Human language is a system of communication. Communication, in turn, consists primarily
of information transmission. Writing about the interactions between information and natural …

Sub-Graph Regularization on Kernel Regression for Robust Semi-Supervised Dimensionality Reduction

J Liu, M Zhao, W Kong - Entropy, 2019 - mdpi.com
Dimensionality reduction has always been a major problem for handling huge
dimensionality datasets. Due to the utilization of labeled data, supervised dimensionality …

Human languages trade off complexity against efficiency

A Koplenig, S Wolfer, JO Rüdiger, P Meyer - researchgate.net
From a cross-linguistic perspective, language models are interesting because they can be
used as idealised language learners that learn to produce and process language by being …