
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization (2503.08619v1)

Published 11 Mar 2025 in cs.CV

Abstract: Recent advances in text-to-image generation have primarily relied on extensive datasets and parameter-heavy architectures. These requirements severely limit accessibility for researchers and practitioners who lack substantial computational resources. In this paper, we introduce LightGen, an efficient training paradigm for image generation models that uses knowledge distillation (KD) and Direct Preference Optimization (DPO). Drawing inspiration from the success of data KD techniques widely adopted in Multi-Modal LLMs (MLLMs), LightGen distills knowledge from state-of-the-art (SOTA) text-to-image models into a compact Masked Autoregressive (MAR) architecture with only $0.7B$ parameters. Using a compact synthetic dataset of just $2M$ high-quality images generated from varied captions, we demonstrate that data diversity significantly outweighs data volume in determining model performance. This strategy dramatically reduces computational demands and reduces pre-training time from potentially thousands of GPU-days to merely 88 GPU-days. Furthermore, to address the inherent shortcomings of synthetic data, particularly poor high-frequency details and spatial inaccuracies, we integrate the DPO technique that refines image fidelity and positional accuracy. Comprehensive experiments confirm that LightGen achieves image generation quality comparable to SOTA models while significantly reducing computational resources and expanding accessibility for resource-constrained environments. Code is available at https://github.com/XianfengWu01/LightGen

Summary

LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization

The paper presents LightGen, an image generation model that leverages knowledge distillation (KD) and Direct Preference Optimization (DPO) to achieve efficient text-to-image generation. Unlike traditional methods that rely heavily on large datasets and complex architectures, LightGen aims to democratize access to high-quality image generation by reducing computational and data requirements.

The authors integrate insights from multi-modal LLMs (MLLMs) to distill knowledge from state-of-the-art (SOTA) models into a lightweight Masked Autoregressive (MAR) architecture with just 0.7 billion parameters. Remarkably, this approach only utilizes a synthetic dataset of 2M images, which is considerably smaller than typical datasets for such tasks. The central premise of the paper is that data diversity can compensate for smaller data volumes, significantly reducing pre-training durations from thousands of GPU-days to only 88 GPU-days.
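The data-distillation recipe can be illustrated with a deliberately tiny sketch: a fixed "teacher" map stands in for a SOTA text-to-image model, a small synthetic dataset is generated from diverse inputs, and a compact low-rank "student" is fit to the teacher's outputs. Every name and dimension below is illustrative, not LightGen's actual architecture or API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: a fixed "teacher" map plays the role of a SOTA
# text-to-image model; a low-rank "student" plays the compact MAR model.
D_IN, RANK = 16, 8
W_teacher = rng.normal(size=(D_IN, D_IN))

def teacher(captions):
    """'Render' caption embeddings into image features."""
    return captions @ W_teacher

# Stage 1: synthesize a small but *diverse* dataset from the teacher,
# echoing the paper's claim that diversity matters more than volume.
captions = rng.normal(size=(512, D_IN))
images = teacher(captions)

# Stage 2: distill into a compact student (a rank-8 factorization,
# i.e. half the parameters) by regressing onto the teacher's outputs.
A = rng.normal(size=(D_IN, RANK)) * 0.5
B = rng.normal(size=(RANK, D_IN)) * 0.5

def mse(A, B):
    return float(np.mean((captions @ A @ B - images) ** 2))

initial_loss = mse(A, B)
lr = 0.02
for _ in range(2000):
    err = captions @ A @ B - images
    # Gradients of the squared-error objective w.r.t. the student factors.
    grad_A = captions.T @ (err @ B.T) / len(captions)
    grad_B = (captions @ A).T @ err / len(captions)
    A -= lr * grad_A
    B -= lr * grad_B

final_loss = mse(A, B)
print(initial_loss, final_loss)  # the student converges toward the teacher
```

The point of the sketch is structural: the student never sees "real" data, only samples the teacher produces, which is what makes the 2M-image synthetic corpus sufficient in the paper's setting.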

Two critical components underpin LightGen's efficacy:

  1. Knowledge Distillation: LightGen employs a strategy akin to data distillation used in MLLMs. By distilling knowledge from sophisticated text-to-image models into a more compact framework, LightGen preserves performance while curtailing the demand for extensive computing resources.
  2. Direct Preference Optimization: A distinctive feature of LightGen is the integration of DPO to improve image quality. Synthetic datasets often suffer from poor high-frequency details and spatial inaccuracies. DPO addresses these challenges, refining the image's fidelity and positional accuracy.
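The DPO objective used for the refinement stage follows the standard formulation: given a preferred and a dispreferred sample, the loss rewards the policy for increasing its likelihood margin over a frozen reference model. The sketch below is the generic DPO loss, not LightGen's implementation; the argument names are illustrative.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w / logp_l: policy log-likelihoods of the preferred ("winner")
    and dispreferred ("loser") images; ref_* are the same quantities
    under a frozen reference model. Illustrative names, not LightGen's API.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written stably as log1p(exp(-margin)).
    return math.log1p(math.exp(-margin))

# At initialization the policy equals the reference, so the margin is 0
# and the loss is log(2); preferring the winner more than the reference
# does drives the loss below that baseline.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

In LightGen's setting, the "winner" is the higher-fidelity image (e.g. with better high-frequency detail or object placement), so minimizing this loss pushes the model away from the artifacts the synthetic pre-training data introduces.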

Quantitative evaluations on the GenEval benchmark demonstrate that LightGen performs competitively with SOTA models. Specifically, it excels at generating images with precise object positioning and vibrant colors across resolutions from 256x256 to 1024x1024 pixels. This positions LightGen as a viable option for high-quality, efficient image generation, especially in resource-constrained settings.

Implications and Future Directions

The implications of this research are manifold. Practically, it reduces barriers to entry for researchers and organizations with limited computational resources, broadening access to advanced image generation capabilities. Theoretically, this paper challenges the conventional wisdom that large-scale datasets are indispensable for superior performance in generative models. The emphasis on data diversity as opposed to sheer volume signals a potential paradigm shift in the development and deployment of AI models.

Future developments in AI could include further exploration of distillation techniques and other optimization algorithms to enhance the scalability and efficiency of machine learning models. Additionally, the integration of more robust synthetic datasets and advanced post-processing methods like DPO could refine the capabilities of lightweight models. Broadly, this approach might extend to other domains beyond image generation, such as video synthesis or multi-modal AI tasks, where resource efficiency remains a critical concern.
