On structuring probabilistic dependences in stochastic language modelling

From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org

This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Cited by 87 Related articles All 3 versions

Statistical language models for information retrieval: A critical review

CX Zhai - Foundations and Trends® in Information Retrieval, 2008 - nowpublishers.com

Statistical language models have recently been successfully applied to many information
retrieval problems. A great deal of recent work has shown that statistical language models …

Cited by 723 Related articles All 20 versions

Domain-specific language model pretraining for biomedical natural language processing

Y Gu, R Tinn, H Cheng, M Lucas, N Usuyama… - ACM Transactions on …, 2021 - dl.acm.org

Pretraining large neural language models, such as BERT, has led to impressive gains on
many natural language processing (NLP) tasks. However, most pretraining efforts focus on …

Cited by 1816 Related articles All 5 versions

Large-scale evidence for logarithmic effects of word predictability on reading time

C Shain, C Meister, T Pimentel… - Proceedings of the …, 2024 - National Acad Sciences

During real-time language comprehension, our minds rapidly decode complex meanings
from sequences of words. The difficulty of doing so is known to be related to words' …

Cited by 77 Related articles All 13 versions

Good-enough compositional data augmentation

J Andreas - arXiv preprint arXiv:1904.09545, 2019 - arxiv.org

We propose a simple data augmentation protocol aimed at providing a compositional
inductive bias in conditional and unconditional sequence models. Under this protocol …

Cited by 259 Related articles All 4 versions

Applications of topic models

J Boyd-Graber, Y Hu, D Mimno - Foundations and Trends® in …, 2017 - nowpublishers.com

How can a single person understand what's going on in a collection of millions of
documents? This is an increasingly common problem: sifting through an organization's e …

Cited by 370 Related articles All 5 versions

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org

Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Cited by 236 Related articles All 11 versions

Composition in distributional models of semantics

J Mitchell, M Lapata - Cognitive science, 2010 - Wiley Online Library

Vector‐based models of word meaning have become increasingly popular in cognitive
science. The appeal of these models lies in their ability to represent meaning simply by …

Cited by 1191 Related articles All 15 versions

An empirical study of smoothing techniques for language modeling

SF Chen, J Goodman - Computer Speech & Language, 1999 - Elsevier

We survey the most widely-used algorithms for smoothing models for language n-gram
modeling. We then present an extensive empirical comparison of several of these smoothing …

Cited by 4418 Related articles All 35 versions

Automating the construction of internet portals with machine learning

AK McCallum, K Nigam, J Rennie, K Seymore - Information Retrieval, 2000 - Springer

Domain-specific internet portals are growing in popularity because they gather
content from the Web and organize it for easy access, retrieval and search. For example …

Cited by 1629 Related articles All 24 versions