AI-Generated Content (AIGC)
- AIGC is a set of techniques that use advanced generative models to automatically produce digital content such as text, images, video, audio, code, and 3D assets with minimal human intervention.
- Evolving from rule-based systems to deep neural networks, AIGC now leverages transfer learning and multimodal architectures like transformers to enhance quality, diversity, and realism.
- AIGC transforms digital media production by increasing speed, personalization, and cost efficiency while raising challenges in evaluation, security, ethics, and resource management.
Artificial Intelligence-Generated Content (AIGC) refers to content produced by artificial intelligence models without, or with minimal, direct human authorship. AIGC spans a wide array of data modalities—including text, images, video, audio, code, and 3D assets—and is underpinned by advances in generative modeling, large-scale foundation models, and multimodal architectures. Its adoption is transforming the cost, speed, and personalization of digital media, while introducing new challenges in evaluation, optimization, resource allocation, security, and social regulation.
1. Evolution of AIGC: Modeling Paradigms and Architectural Advances
The development of AIGC can be traced through four milestones: early rule-based systems, statistical generative models (e.g., n-grams, Hidden Markov Models), deep neural network-based models, and transfer learning with large-scale pre-trained models (Zhu et al., 2 Dec 2024). Rule-based systems were transparent but limited in expressiveness. Statistical models improved scalability and variability. Deep learning—encompassing convolutional (CNN), recurrent (RNN/LSTM), and transformer-based architectures—brought dramatic advances in quality, diversity, and realism, at the cost of much greater data and computational requirements. The current era is dominated by transfer learning and foundation models: for example, transformer-based language and vision models (GPT, BERT, T5, Vision Transformer, CLIP) and generative diffusion models (e.g., Stable Diffusion, DALL·E 2), which are pre-trained on vast datasets and then adapted to downstream tasks by fine-tuning or prompting (Cao et al., 2023, Wu et al., 2023).
A key breakthrough was the introduction of the transformer architecture, built on the scaled dot-product attention mechanism:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimensionality, enabling context-sensitive content generation at scale.
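The mechanism above can be sketched in a few lines of NumPy (a minimal single-head illustration; masking, multi-head projections, and batching are omitted):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # (n_q, d_v) context vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 5))   # 3 values, d_v = 5
out = attention(Q, K, V)
print(out.shape)  # (2, 5)
```

Each output row is a convex combination of the value vectors, weighted by query-key similarity, which is what makes the generation context-sensitive.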
2. Modalities, Multimodality, and Application Scenarios
AIGC systems are developed for single and cross-modal generation tasks (Foo et al., 2023). Major modalities and generation modes include:
- Text-to-Text: Chatbots, summarization.
- Text-to-Image: Synthesis of images from natural language prompts.
- Text-to-Video, Text-to-3D: Supporting AR/VR content and the Metaverse.
- Image-to-Image: Restoration, style transfer.
- Audio: Music, speech, and voice synthesis.
- Cross-modality: Converting input in one modality (e.g., text) to another (e.g., image, 3D scene, audio).
Multimodal systems typically use coordinated encoders for each modality, merging them in joint latent spaces or using cross-attention for conditioning generation (e.g., CLIP, DALL-E2) (Cao et al., 2023). Emerging applications extend to digital storytelling, code generation, digital twins, education, urban and landscape design, personalized recommendations, and healthcare (Matuszewski et al., 2023, Xing et al., 12 Feb 2025, Mo et al., 26 Nov 2024).
3. Model Evaluation, Perceived Quality, and Resource-Quality Trade-offs
Quantitative evaluation of AIGC depends on application and modality. For images, performance is measured via both no-reference metrics (Total Variation, BRISQUE) and full-reference metrics (DSS, HaarPSI, MDSI, VIF) (Du et al., 2023). Empirical studies with diffusion models show that increasing the number of inference (denoising) steps generally improves reconstruction quality in a near-saturating fashion, which can be captured with a simple regression or piecewise fit between a critical step count at which quality begins to rise and a larger one at which it plateaus.
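The near-saturating steps-quality relationship can be illustrated with a toy curve; the exponential form, time constant, and threshold below are assumptions for illustration, not the fits used in the cited studies:

```python
import numpy as np

# Hypothetical quality curve: perceptual quality rises with denoising steps
# and saturates (illustrative only).
steps = np.arange(1, 101)
quality = 1.0 - np.exp(-steps / 15.0)   # saturates toward 1.0

# Locate the plateau: first step where the marginal gain per extra step
# falls below a small threshold epsilon.
gains = np.diff(quality)
eps = 1e-3
plateau = steps[1:][gains < eps][0]
print(plateau)
```

Beyond this knee point, extra denoising steps buy almost no perceptual quality, which is exactly the regime the resource-quality trade-offs below try to avoid paying for.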
In wireless/mobile and edge computing environments, the relationship between computational resource expenditure (e.g., the number of denoising steps) and perceptual output quality is formalized in joint optimization schemes (Wang et al., 2023). For example, the expected reconstruction error after a given number of reverse diffusion steps can be modeled as a decreasing, saturating function of the step count; combining such a quality model with utility functions that also penalize delay and resource use enables explicit trade-offs between latency, quality, and energy/resource consumption.
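A minimal sketch of such a trade-off follows; the quality, latency, and energy models and all coefficients are invented for illustration and do not come from the cited optimization schemes:

```python
import numpy as np

def utility(T, w_q=1.0, w_l=0.004, w_e=0.002):
    """Weighted utility of running T denoising steps (all models assumed)."""
    quality = 1.0 - np.exp(-T / 15.0)   # saturating quality model (assumed)
    latency = 0.05 * T                  # seconds per denoising step (assumed)
    energy = 0.1 * T                    # joules per denoising step (assumed)
    return w_q * quality - w_l * latency - w_e * energy

# Pick the step count that maximizes utility over a feasible range.
T_grid = np.arange(1, 201)
T_star = T_grid[np.argmax(utility(T_grid))]
print(T_star)
```

Because quality saturates while latency and energy grow linearly, the optimum lands at an interior step count rather than at the maximum, mirroring the explicit trade-off described above.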
4. Service Provisioning, Edge/Cloud Collaboration, and System Architectures
AIGC service architectures frequently adopt cloud-edge-terminal (device) collaboration to balance model size, real-time constraints, and user personalization (Zhang et al., 2 Jul 2024, Xu et al., 2023, Cheng et al., 2023). Typical patterns include:
- AIGC-as-a-Service (AaaS): Edge servers host pre-trained models from third-party AIGC service providers (ASPs), permitting lower latency and user-specific content (Du et al., 2023).
- Collaborative Diffusion: Heavy denoising steps are offloaded to edge/cloud servers, with only lightweight refinement performed on user devices, preserving energy and privacy (Du et al., 2023, Cheng et al., 2023).
- Dynamic Resource Allocation: Deep reinforcement learning (e.g., soft actor–critic, D3QN, MARL) is used to assign user tasks to the best service provider, balancing utility, congestion, and content quality (Du et al., 2023, Zhang et al., 2 Jul 2024).
Resource-aware workload trade-off mechanisms, expressed as Markov Decision Processes, can dynamically allocate inference computation between transmitter and receiver for optimal semantic recovery under fluctuating channel and device constraints.
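As a simplified stand-in for the DRL-based provider selection described above, a greedy congestion-aware assignment illustrates the underlying utility/congestion trade-off; provider qualities and capacities are invented for illustration:

```python
def assign_tasks(num_tasks, base_quality, capacity):
    """Greedily assign each task to the provider with highest marginal utility."""
    load = [0] * len(base_quality)
    assignment = []
    for _ in range(num_tasks):
        # Utility: provider quality discounted by its current congestion ratio;
        # a full provider is excluded via -inf.
        scores = [q * (1 - load[i] / capacity[i]) if load[i] < capacity[i]
                  else float("-inf")
                  for i, q in enumerate(base_quality)]
        best = max(range(len(scores)), key=scores.__getitem__)
        load[best] += 1
        assignment.append(best)
    return assignment, load

assignment, load = assign_tasks(10, base_quality=[0.9, 0.7, 0.8],
                                capacity=[4, 4, 4])
print(load)
```

A learned policy (e.g., soft actor-critic) replaces this hand-coded score with a value estimate trained on observed quality and congestion, but the assignment structure is the same.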
5. Security, Privacy, Trust, and Societal Implications
Security, privacy, trust, and ethics are persistent concerns for AIGC, due to threats such as data poisoning, adversarial attacks, unauthorized model use, and content authenticity (Wang et al., 2023). Challenges include:
- Privacy threats from data leakage, especially in HMI or federated environments (Huang et al., 2023).
- Trust and authentication, tackled via watermarking, blockchain, and cryptographic methods embedded in both model parameters and outputs.
- Misuse: The proliferation of abusive content, including deepfakes and explicit imagery, is elevated in open AIGC model hubs (e.g., Civitai), with abusive models showing higher engagement and their creators occupying central roles in social networks (Wei et al., 16 Jul 2024).
Ethical considerations include managing AI biases, misinformation, and ownership/copyright, with the need for regulation, adaptive watermarking, and security-by-design frameworks (Wang et al., 2023, Wu et al., 2023).
6. Challenges, Open Problems, and Future Directions
Despite rapid progress, AIGC research faces the following persistent challenges:
- Factuality and Hallucinations: Generative models may synthesize plausible but incorrect content. Solutions involve new evaluation strategies, RLHF, and fact-aware sampling (Cao et al., 2023).
- Bias, Safety, and Regulation: These require advances in human feedback protocols, content filtering, and legislative action (Cao et al., 2023, Wang et al., 2023).
- Scalability and Sustainability: Computational costs remain high; ongoing research targets prompt learning, model compression, and green AIGC paradigms (Cao et al., 2023, Xu et al., 2023, Wang et al., 2023).
- Resource Constraints: Edge deployment must jointly optimize offloading, energy, and model specialization (e.g., via federated learning and LoRA) (Wang et al., 2023, Huang et al., 2023).
- Interoperability and Standardization: Emerging file formats that compress generation syntax rather than pixels (e.g., AIGIF) achieve compression ratios of up to 1:10,000, but hinge on preservation of platform/model/data syntax (Gao et al., 13 Oct 2024).
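A back-of-envelope calculation shows why storing the generation recipe rather than the pixels it reproduces can yield ratios of this magnitude; all identifiers and sizes below are hypothetical:

```python
# One uncompressed 1024x1024 RGB frame vs. the recipe that regenerates it.
raw_frame_bytes = 1024 * 1024 * 3          # ~3 MB of pixel data

recipe = {"model": "example-diffusion-v1",  # hypothetical model identifier
          "prompt": "a watercolor harbor at dusk",
          "seed": 42,
          "steps": 30}
recipe_bytes = len(repr(recipe).encode())  # a few hundred bytes at most

ratio = raw_frame_bytes / recipe_bytes
print(f"{ratio:,.0f}:1")
```

The catch, as noted above, is that decompression requires the exact platform/model/data syntax to remain available and deterministic.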
Future directions stress sustainable operation, explainability, privacy-preservation, robust multimodal modeling, domain specificity, and sociotechnical integration (e.g., via digital twins, the Metaverse, or social platforms).
7. Societal Impact and Application Ecosystem
AIGC's impact manifests across digital media, communications, code generation, conversational agents, design, healthcare, scientific research, and entertainment (Cao et al., 2023, Wu et al., 2023, Xing et al., 12 Feb 2025, Mo et al., 26 Nov 2024). Its industrial chain spans upstream (data/hardware/algorithms), midstream (platforms and tools), and downstream (content platforms, e-commerce, media production) (Wu et al., 2023). In landscape architecture and urban/participatory design, AIGC augments or automates spatial analysis, visual simulation, conceptual sketch generation, and parametric optimization, while shifting the role of designers toward collaborators or facilitators (Xing et al., 12 Feb 2025, Mo et al., 26 Nov 2024).
AIGC thus provides a technological substrate for a paradigm shift in content creation, marked by increased efficiency, diversity, user participation, and automation, and accompanied by a complex landscape of resource, ethical, and regulatory challenges.