Large language models suffer from their own output: An analysis of the self-consuming training loop
Large language models (LLMs) have become state of the art on many benchmarks, and
conversational LLM applications like ChatGPT are now widely used by the public. Those …
A tale of tails: Model collapse as a change of scaling laws
As AI model size grows, neural scaling laws have become a crucial tool to predict the
improvements of large models when increasing capacity and the size of original (human or …
Beware of words: Evaluating the lexical diversity of conversational LLMs using ChatGPT as case study
The performance of conversational Large Language Models (LLMs) in general, and of
ChatGPT in particular, is currently being evaluated on many different tasks, from logical …
Beyond model collapse: Scaling up with synthesized data requires reinforcement
Synthesized data from generative models is increasingly considered an alternative to
human-annotated data for fine-tuning Large Language Models. This raises concerns about …
A survey on the impact of AI-based recommenders on human behaviours: methodologies, outcomes and future directions
Recommendation systems and assistants (in short, recommenders) are ubiquitous in online
platforms and influence most actions of our day-to-day lives, suggesting items or providing …
Self-consuming generative models with curated data provably optimize human preferences
The rapid progress in generative models has resulted in impressive leaps in generation
quality, blurring the lines between synthetic and real data. Web-scale datasets are now …
Strong model collapse
Within the scaling laws paradigm, which underpins the training of large neural networks like
ChatGPT and Llama, we consider a supervised regression setting and establish the …
Human vs. Generative AI in Content Creation Competition: Symbiosis or Conflict?
The advent of generative AI (GenAI) technology has had a transformative impact on the
content creation landscape, offering alternative approaches to produce diverse, high-quality …
Model collapse demystified: The case of regression
In the era of large language models like ChatGPT, the phenomenon of "model collapse"
refers to the situation whereby as a model is trained recursively on data generated from …
Regurgitative training: The value of real data in training large language models
What happens if we train a new Large Language Model (LLM) using data that are at least
partially generated by other LLMs? The explosive success of LLMs means that a substantial …