Locally typical sampling
Today's probabilistic language generators fall short when it comes to producing coherent
and fluent text despite the fact that the underlying models perform well under standard …
Testing the predictions of surprisal theory in 11 languages
Surprisal theory posits that less-predictable words should take more time to process, with
word predictability quantified as surprisal, i.e., negative log probability in context. While …
Revisiting the uniform information density hypothesis
The uniform information density (UID) hypothesis posits a preference among language
users for utterances structured such that information is distributed uniformly across a signal …
On the probability-quality paradox in language generation
When generating natural language from neural probabilistic models, high probability does
not always coincide with high quality: It has often been observed that mode-seeking …
Revisiting the optimality of word lengths
Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative
costs. Under the assumption that cost is given by an utterance's length, he supported this …
A Cross-Linguistic Pressure for Uniform Information Density in Word Order
While natural languages differ widely in both canonical word order and word order flexibility,
their word orders still follow shared cross-linguistic statistical patterns, often attributed to …
An information-theoretic analysis of self-supervised discrete representations of speech
Self-supervised representation learning for speech often involves a quantization step that
transforms the acoustic input into discrete units. However, it remains unclear how to …
Quantifying the redundancy between prosody and text
Prosody--the suprasegmental component of speech, including pitch, loudness, and tempo--
carries critical aspects of meaning. However, the relationship between the information …
Using linguistic typology to enrich multilingual lexicons: the case of lexical gaps in kinship
This paper describes a method to enrich lexical resources with content relating to linguistic
diversity, based on knowledge from the field of lexical typology. We capture the …
Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages
Grammatical cues are sometimes redundant with word meanings in natural language. For
instance, English word order rules constrain the word order of a sentence like “The dog …