Zipf's word frequency law in natural language: A critical review and future directions
ST Piantadosi - Psychonomic bulletin & review, 2014 - Springer
The frequency distribution of words has been a key object of study in statistical linguistics for
the past 70 years. This distribution approximately follows a simple mathematical form known …
Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution
EA Pechenick, CM Danforth, PS Dodds - PloS one, 2015 - journals.plos.org
It is tempting to treat frequency trends from the Google Books data sets as indicators of the
“true” popularity of various words and phrases. Doing so allows us to draw quantitatively …
The advantage of short paper titles
Vast numbers of scientific articles are published each year, some of which attract
considerable attention, and some of which go almost unnoticed. Here, we investigate …
Languages cool as they expand: Allometric scaling and the decreasing need for new words
We analyze the occurrence frequencies of over 15 million words recorded in millions of
books published during the past two centuries in seven different languages. For all …
Quantifying crowd size with mobile phone and Twitter data
Being able to infer the number of people in a specific area is of extreme importance for the
avoidance of crowd disasters and to facilitate emergency evacuations. Here, using a football …
Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts
A common task in computational text analyses is to quantify how two corpora differ
according to a measurement like word frequency, sentiment, or information content …
Stochastic model for the vocabulary growth in natural languages
M Gerlach, EG Altmann - Physical Review X, 2013 - APS
We propose a stochastic model for the number of different words in a given database which
incorporates the dependence on the database size and historical changes. The main feature …
Analyzing lexical emergence in Modern American English online
This article introduces a quantitative method for identifying newly emerging word forms in
large time-stamped corpora of natural language and then describes an analysis of lexical …
The impact of lacking metadata for the measurement of cultural and linguistic change using the Google Ngram data sets—Reconstructing the composition of the …
A Koplenig - Digital Scholarship in the Humanities, 2017 - academic.oup.com
The Google Ngram Corpora seem to offer a unique opportunity to study linguistic
and cultural change in quantitative terms. To avoid breaking any copyright laws, the data …
Distance to the scaling law: a useful approach for unveiling relationships between crime and urban metrics
We report on a quantitative analysis of relationships between the number of homicides,
population size and ten other urban metrics. By using data from Brazilian cities, we show …