The word entropy of natural languages
C Bentz, D Alikaniotis - arXiv preprint arXiv:1606.06996, 2016 - arxiv.org
The average uncertainty associated with words is an information-theoretic concept at the heart of quantitative and computational linguistics. The entropy has been established as a measure of this average uncertainty - also called average information content. We here use parallel texts of 21 languages to establish the number of tokens at which word entropies converge to stable values. These convergence points are then used to select texts from a massively parallel corpus, and to estimate word entropies across more than 1000 languages. Our results help to establish quantitative language comparisons, to understand the performance of multilingual translation systems, and to normalize semantic similarity measures.
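As a rough illustration of the quantity the abstract describes, the sketch below computes a plug-in (maximum-likelihood) word entropy, H = -Σ p(w) log2 p(w), over growing prefixes of a token stream, which is one simple way to inspect the token count at which the estimate stabilises. This is a minimal sketch under assumed choices (the estimator, the function names, and the toy data are illustrative, not the authors' code; the abstract does not specify the paper's own estimation procedure).

```python
import math
from collections import Counter

def word_entropy(tokens):
    """Plug-in (maximum-likelihood) word entropy in bits:
    H = -sum_w p(w) * log2 p(w), where p(w) is the relative token frequency."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_curve(tokens, step=10_000):
    """Entropy of growing prefixes of the token stream, to see at what
    token count the estimate levels off (a 'convergence point')."""
    return [(k, word_entropy(tokens[:k]))
            for k in range(step, len(tokens) + 1, step)]

# Toy usage with hypothetical data (illustration only):
tokens = ("the cat sat on the mat " * 5000).split()
for k, h in entropy_curve(tokens, step=5000):
    print(f"{k:>7} tokens: H ~ {h:.3f} bits")
```

With real corpora the curve typically rises while rare word types are still being discovered and flattens once the sample is large enough, which is the behaviour the convergence points in the abstract refer to.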