Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 91 tok/s

Gemini 2.5 Pro 49 tok/s Pro

GPT-5 Medium 26 tok/s Pro

GPT-5 High 24 tok/s Pro

GPT-4o 95 tok/s Pro

Kimi K2 209 tok/s Pro

GPT OSS 120B 458 tok/s Pro

Claude Sonnet 4 37 tok/s Pro

2000 character limit reached

GMem: A Modular Approach for Ultra-Efficient Generative Models (2412.08781v2)

Published 11 Dec 2024 in cs.CV and cs.LG

Abstract: Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn become the primary bottleneck in both training and inference of diffusion models. To this end, we introduce GMem: A Modular Approach for Ultra-Efficient Generative Models. Our approach GMem decouples the memory capacity from model and implements it as a separate, immutable memory set that preserves the essential semantic information in the data. The results are significant: GMem enhances both training, sampling efficiency, and diversity generation. This design on one hand reduces the reliance on network for memorize complex data distribution and thus enhancing both training and sampling efficiency. On ImageNet at $256 \times 256$ resolution, GMem achieves a $50\times$ training speedup compared to SiT, reaching FID $=7.66$ in fewer than $28$ epochs ($\sim 4$ hours training time), while SiT requires $1400$ epochs. Without classifier-free guidance, GMem achieves state-of-the-art (SoTA) performance FID $=1.53$ in $160$ epochs with only $\sim 20$ hours of training, outperforming LightningDiT which requires $800$ epochs and $\sim 95$ hours to attain FID $=2.17$.