A Comprehensive Overview of AI-Generated Content: From GANs to ChatGPT
The paper "A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT," authored by Yihan Cao et al., surveys the field of Artificial Intelligence Generated Content (AIGC), tracing the evolution of generative AI technologies with a focus on their foundational models, applications, and open challenges.
Generative modeling has progressed rapidly in recent years, with generative adversarial networks (GANs), autoregressive models, and large-scale Transformers maturing into practical tools and culminating in sophisticated systems such as ChatGPT, DALL-E-2, and Codex. These applications illustrate the breadth of AIGC, spanning high-quality text, image, code, and music generation.
Historical Context and Development
The paper traces the history of generative AI, noting how early models like Hidden Markov Models and Gaussian Mixture Models laid the groundwork for later advances with deep learning. The introduction of GANs marked a substantial shift, enabling high-quality image generation and inspiring new research directions in unsupervised learning. As data availability and computational power grew, the emergence of large transformer models became a pivotal point, facilitating rapid advancements in both unimodal and multimodal generative tasks.
Technical Foundations
At the heart of these advances is the Transformer architecture, which underpins most state-of-the-art systems today. The survey contrasts the two dominant pre-training paradigms built on it: autoregressive language models (such as the GPT family), which predict each token from the preceding context, and masked language models (such as BERT), which reconstruct hidden tokens from bidirectional context. With scaling to models like GPT-3, and with techniques such as reinforcement learning from human feedback (RLHF), the ability to generate coherent, contextually relevant content has improved markedly.
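The autoregressive paradigm can be sketched in a few lines: at each step the model scores every possible next token given the prefix, and the chosen token is appended before the loop repeats. The toy bigram table below is a hypothetical stand-in for a real Transformer; only the decoding loop reflects how systems like GPT actually generate text.

```python
# Toy "model": p(next token | previous token). A real Transformer would
# condition on the entire prefix, not just the last token.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.9, "</s>": 0.1},
    "dog": {"sat": 0.8, "</s>": 0.2},
    "sat": {"</s>": 1.0},
}

def generate(max_len=10):
    tokens = ["<s>"]
    while len(tokens) < max_len:
        dist = BIGRAM[tokens[-1]]      # distribution over the vocabulary
        nxt = max(dist, key=dist.get)  # greedy decoding: pick the argmax
        tokens.append(nxt)
        if nxt == "</s>":              # stop at the end-of-sequence token
            break
    return tokens[1:-1]                # strip the sentinel tokens

print(generate())  # → ['the', 'cat', 'sat']
```

In practice, sampling from the distribution (with temperature or nucleus sampling) replaces the greedy argmax to produce more varied output.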
Applications and Practical Implications
Generative AI systems are now integral to many applications. Conversational interfaces such as ChatGPT leverage pre-trained LLMs fine-tuned with RLHF, which aligns their outputs with human preferences for more relevant and safer interaction. In the visual domain, diffusion-based models such as DALL-E-2 produce high-fidelity images from text prompts, while systems like Codex generate source code, showcasing the flexibility of AIGC across modalities.
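The diffusion mechanism behind models like DALL-E-2 rests on a simple forward process: data is gradually blended with Gaussian noise according to a variance schedule, and a network is trained to reverse those steps. The closed-form noising step can be sketched as follows; the schedule values and the 4-element "image" are illustrative, not the parameters of any real system.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100
betas = np.linspace(1e-4, 0.02, T)    # linear variance (noise) schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal retained at step t

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = np.ones(4)                       # toy stand-in for an image
x_early = q_sample(x0, 5)             # mostly signal
x_late = q_sample(x0, T - 1)          # mostly noise
print(alphas_bar[5], alphas_bar[T - 1])
```

Generation runs this process in reverse: starting from pure noise, the trained network denoises step by step (guided by the text prompt) until an image emerges.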
Challenges and Ethical Considerations
While the capabilities of AI-driven content generation are profound, the survey discusses persistent challenges such as factual accuracy and toxicity in generated content. It also emphasizes privacy risks, including membership inference and training-data extraction attacks, which pose significant threats if not adequately mitigated.
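To make the membership-inference risk concrete, one of the simplest attacks of this kind thresholds the model's loss: samples the model fits unusually well are guessed to be training members. The sketch below uses made-up loss values for illustration; real attacks calibrate the threshold against shadow models or reference data.

```python
def infer_membership(losses, threshold):
    """Flag a sample as a suspected training member if its loss is low.

    Overfitted models tend to assign lower loss to data they memorized
    during training than to unseen data, which this baseline exploits.
    """
    return [loss < threshold for loss in losses]

# Hypothetical per-sample losses: members are typically fit more tightly.
train_losses = [0.05, 0.10, 0.08]   # seen during training
test_losses = [0.90, 1.20, 0.75]    # never seen

guesses = infer_membership(train_losses + test_losses, threshold=0.5)
print(guesses)  # → [True, True, True, False, False, False]
```

The wider the train/test loss gap, the more reliably an attacker can separate members from non-members, which is why memorization and privacy leakage are treated together in the survey.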
Future Directions
The survey recognizes the expanding frontier of AIGC and anticipates innovations aimed at closing the gap between current model outputs and the reliability demanded by high-stakes applications. It underscores the need to balance specialization and generalization in training datasets to improve the adaptability and efficacy of these models. As the field progresses, the survey argues that more robust mechanisms will be needed to ensure generative models are trustworthy and capable of ethical reasoning, addressing societal concerns as these systems become ubiquitous in daily life.
In summary, this paper offers a comprehensive survey of AI-generated content, detailing its historical evolution, technical underpinnings, and current applications, while also identifying critical challenges and open research directions for the future. The work provides an important reference point for researchers aiming to contribute to or understand the current state and future trajectory of AIGC.