Large language models for data annotation: A survey
Data annotation generally refers to the labeling or generating of raw data with relevant
information, which could be used for improving the efficacy of machine learning models. The …
Siren's song in the AI ocean: a survey on hallucination in large language models
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …
AI models collapse when trained on recursively generated data
Stable diffusion revolutionized image creation from descriptive text. GPT-2 (ref.), GPT-3(.5) (ref.)
and GPT-4 (ref.) demonstrated high performance across a variety of language tasks …
The science of detecting LLM-generated text
Communications of the ACM, Volume 67, Number 4 (2024), Pages 50-59 …
A survey on LLM-generated text detection: Necessity, methods, and future directions
The powerful ability to understand, follow, and generate complex language emerging from
large language models (LLMs) makes LLM-generated text flood many areas of our daily …
TinyGSM: achieving >80% on GSM8k with small language models
Small-scale models offer various computational advantages, and yet to which extent size is
critical for problem-solving abilities remains an open question. Specifically for solving grade …
Are large language models a threat to digital public goods? Evidence from activity on Stack Overflow
M del Rio-Chanona, N Laurentsyeva… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models like ChatGPT efficiently provide users with information about
various topics, presenting a potential substitute for searching the web and asking people for …
Quantitative text analysis
Text analysis has undergone substantial evolution since its inception, moving from manual
qualitative assessments to sophisticated quantitative and computational methods. Beginning …
Large language models suffer from their own output: An analysis of the self-consuming training loop
Large language models (LLM) have become state of the art in many benchmarks and
conversational LLM applications like ChatGPT are now widely used by the public. Those …
Rephrasing the web: A recipe for compute and data-efficient language modeling
Large language models are trained on massive scrapes of the web, which are often
unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such …