Large language models for data annotation: A survey

Z Tan, D Li, S Wang, A Beigi, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Data annotation generally refers to the labeling or generating of raw data with relevant
information, which could be used for improving the efficacy of machine learning models. The …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

AI models collapse when trained on recursively generated data

I Shumailov, Z Shumaylov, Y Zhao, N Papernot… - Nature, 2024 - nature.com
Stable diffusion revolutionized image creation from descriptive text. GPT-2, GPT-3(.5)
and GPT-4 demonstrated high performance across a variety of language tasks …

The science of detecting LLM-generated text

R Tang, YN Chuang, X Hu - Communications of the ACM, 2024 - dl.acm.org
Communications of the ACM, Volume 67, Number 4 (2024), Pages 50-59.

Scalable watermarking for identifying large language model outputs

S Dathathri, A See, S Ghaisas, PS Huang, R McAdam… - Nature, 2024 - nature.com
Large language models (LLMs) have enabled the generation of high-quality synthetic text,
often indistinguishable from human-written content, at a scale that can markedly affect the …

Human-AI coevolution

D Pedreschi, L Pappalardo, E Ferragina… - Artificial Intelligence, 2024 - Elsevier
Human-AI coevolution, defined as a process in which humans and AI algorithms
continuously influence each other, increasingly characterises our society, but is …

A survey on LLM-generated text detection: Necessity, methods, and future directions

J Wu, S Yang, R Zhan, Y Yuan, DF Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
The powerful ability to understand, follow, and generate complex language emerging from
large language models (LLMs) makes LLM-generated text flood many areas of our daily …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

TinyGSM: achieving >80% on GSM8k with small language models

B Liu, S Bubeck, R Eldan, J Kulkarni, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Small-scale models offer various computational advantages, and yet to which extent size is
critical for problem-solving abilities remains an open question. Specifically for solving grade …

Are large language models a threat to digital public goods? Evidence from activity on Stack Overflow

M del Rio-Chanona, N Laurentsyeva… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models like ChatGPT efficiently provide users with information about
various topics, presenting a potential substitute for searching the web and asking people for …