Language model behavior: A comprehensive survey
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …
The science of detecting LLM-generated text
Communications of the ACM, Volume 67, Number 4 (2024), Pages 50-59 …
Autoregressive search engines: Generating substrings as document identifiers
Abstract Knowledge-intensive language tasks require NLP systems to both provide the
correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive …
Survey on factuality in large language models: Knowledge, retrieval and domain-specificity
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …
Autoregressive entity retrieval
Entities are at the center of how we represent and aggregate knowledge. For instance,
encyclopedias such as Wikipedia are structured by entities (e.g., one per Wikipedia article) …
Recipes for building an open-domain chatbot
Building open-domain chatbots is a challenging area for machine learning research. While
prior work has shown that scaling neural models in the number of parameters and the size of …
MAUVE: Measuring the gap between neural text and human text using divergence frontiers
As major progress is made in open-ended text generation, measuring how close machine-
generated text is to human language remains a critical open problem. We introduce MAUVE …
ZeroGen: Efficient zero-shot learning via dataset generation
There is a growing interest in dataset generation recently due to the superior generative
capacity of large pre-trained language models (PLMs). In this paper, we study a flexible and …
Reframing human-AI collaboration for generating free-text explanations
Large language models are increasingly capable of generating fluent-appearing text with
relatively little task-specific supervision. But can these models accurately explain …
Retrieval-augmented generation for knowledge-intensive NLP tasks
Large pre-trained language models have been shown to store factual knowledge in their
parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks …