From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Statistical language models for information retrieval: A critical review

CX Zhai - Foundations and Trends® in Information Retrieval, 2008 - nowpublishers.com
Statistical language models have recently been successfully applied to many information
retrieval problems. A great deal of recent work has shown that statistical language models …

Domain-specific language model pretraining for biomedical natural language processing

Y Gu, R Tinn, H Cheng, M Lucas, N Usuyama… - ACM Transactions on …, 2021 - dl.acm.org
Pretraining large neural language models, such as BERT, has led to impressive gains on
many natural language processing (NLP) tasks. However, most pretraining efforts focus on …

Large-scale evidence for logarithmic effects of word predictability on reading time

C Shain, C Meister, T Pimentel… - Proceedings of the …, 2024 - National Acad Sciences
During real-time language comprehension, our minds rapidly decode complex meanings
from sequences of words. The difficulty of doing so is known to be related to words' …

Good-enough compositional data augmentation

J Andreas - arXiv preprint arXiv:1904.09545, 2019 - arxiv.org
We propose a simple data augmentation protocol aimed at providing a compositional
inductive bias in conditional and unconditional sequence models. Under this protocol …

Applications of topic models

J Boyd-Graber, Y Hu, D Mimno - Foundations and Trends® in …, 2017 - nowpublishers.com
How can a single person understand what's going on in a collection of millions of
documents? This is an increasingly common problem: sifting through an organization's e …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Composition in distributional models of semantics

J Mitchell, M Lapata - Cognitive Science, 2010 - Wiley Online Library
Vector‐based models of word meaning have become increasingly popular in cognitive
science. The appeal of these models lies in their ability to represent meaning simply by …

An empirical study of smoothing techniques for language modeling

SF Chen, J Goodman - Computer Speech & Language, 1999 - Elsevier
We survey the most widely-used algorithms for smoothing models for language n-gram
modeling. We then present an extensive empirical comparison of several of these smoothing …

Automating the construction of internet portals with machine learning

AK McCallum, K Nigam, J Rennie, K Seymore - Information Retrieval, 2000 - Springer
Domain-specific internet portals are growing in popularity because they gather
content from the Web and organize it for easy access, retrieval and search. For example …