
When AI Eats Itself: On the Caveats of AI Autophagy (2405.09597v3)

Published 15 May 2024 in cs.LG and cs.AI

Abstract: Generative AI technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimise training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimise outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, which are now mixed with unlabelled synthetic data. This trend, known as the AI autophagy phenomenon, suggests a future where generative AI systems may increasingly consume their own outputs without discernment, raising concerns about model performance, reliability, and ethical implications. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? To address these research questions, this study examines the existing literature, delving into the consequences of AI autophagy, analysing the associated risks, and exploring strategies to mitigate its impact. Our aim is to provide a comprehensive perspective on this phenomenon, advocating for a balanced approach that promotes the sustainable development of generative AI technologies in the era of large models.
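
The toy sketch below is not from the paper; it is a minimal illustration of the self-consumption loop the abstract describes, using a single Gaussian as a stand-in for a generative model. Each "generation" is fitted only to samples drawn from the previous generation's model, with no fresh real data, and the fitted spread collapses over many generations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 trains on "real" data drawn from the true distribution N(0, 1).
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(201):
    # Fit a toy generative model (a single Gaussian) to the current training set.
    mu, sigma = data.mean(), data.std()
    if generation % 20 == 0:
        print(f"generation {generation:3d}: mu={mu:+.3f}, sigma={sigma:.3f}")

    # Autophagy step: the next generation trains only on synthetic samples
    # drawn from the model just fitted, with no real data mixed back in.
    data = rng.normal(loc=mu, scale=sigma, size=50)
```

Running this, sigma drifts downward and ends far below 1.0, a toy analogue of the loss of diversity and fidelity discussed in the paper; mixing real data back in at each step slows or prevents the collapse.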

Authors (11)
  1. Xiaodan Xing (35 papers)
  2. Fadong Shi (2 papers)
  3. Jiahao Huang (93 papers)
  4. Yinzhe Wu (30 papers)
  5. Yang Nan (40 papers)
  6. Sheng Zhang (212 papers)
  7. Yingying Fang (20 papers)
  8. Mike Roberts (9 papers)
  9. Carola-Bibiane Schönlieb (276 papers)
  10. Javier Del Ser (100 papers)
  11. Guang Yang (422 papers)