Information-Guided Diffusion Sampling for Dataset Distillation (2507.04619v1)

Published 7 Jul 2025 in cs.LG, cs.AI, cs.CV, cs.IT, and math.IT

Abstract: Dataset distillation aims to create a compact dataset that retains essential information while maintaining model performance. Diffusion models (DMs) have shown promise for this task but struggle in low images-per-class (IPC) settings, where generated samples lack diversity. In this paper, we address this issue from an information-theoretic perspective by identifying two key types of information that a distilled dataset must preserve: ($i$) prototype information $\mathrm{I}(X;Y)$, which captures label-relevant features; and ($ii$) contextual information $\mathrm{H}(X | Y)$, which preserves intra-class variability. Here, $(X,Y)$ represents the pair of random variables corresponding to the input data and its ground truth label, respectively. Observing that the required contextual information scales with IPC, we propose maximizing $\mathrm{I}(X;Y) + \beta \mathrm{H}(X | Y)$ during the DM sampling process, where $\beta$ is IPC-dependent. Since directly computing $\mathrm{I}(X;Y)$ and $\mathrm{H}(X | Y)$ is intractable, we develop variational estimations to tightly lower-bound these quantities via a data-driven approach. Our approach, information-guided diffusion sampling (IGDS), seamlessly integrates with diffusion models and improves dataset distillation across all IPC settings. Experiments on Tiny ImageNet and ImageNet subsets show that IGDS significantly outperforms existing methods, particularly in low-IPC regimes. The code will be released upon acceptance.
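
Note that the chain rule of entropy gives $\mathrm{H}(X) = \mathrm{I}(X;Y) + \mathrm{H}(X \mid Y)$, so the objective $\mathrm{I}(X;Y) + \beta\,\mathrm{H}(X \mid Y)$ reweights the label-relevant and label-free parts of the data's information as $\beta$ varies. The paper's data-driven variational estimators are not reproduced in the abstract, but the sampling mechanism it describes, adding the gradient of an information objective to each reverse-diffusion step, can be sketched. Below is a minimal PyTorch sketch in the style of classifier guidance: a pretrained `classifier` stands in for the paper's learned bounds, with $\log p(y \mid x_t)$ as a surrogate for the prototype term $\mathrm{I}(X;Y)$ and the classifier's predictive entropy as a surrogate for the contextual term $\mathrm{H}(X \mid Y)$. All names (`unet`, `classifier`, `scale`) and both surrogates are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F

def info_guidance(x_t, y, classifier, beta):
    """Gradient of a surrogate for I(X;Y) + beta * H(X|Y) w.r.t. x_t.

    Surrogates (assumptions, not the paper's estimators):
      - I(X;Y)  ~ log p(y | x_t) from a pretrained classifier
                  (the standard classifier-guidance term);
      - H(X|Y)  ~ predictive entropy of the classifier, pushing
                  samples toward intra-class diversity.
    """
    with torch.enable_grad():
        x = x_t.detach().requires_grad_(True)
        logits = classifier(x)                             # (B, num_classes)
        log_probs = F.log_softmax(logits, dim=-1)
        proto = log_probs.gather(1, y[:, None]).sum()      # label-relevant term
        entropy = -(log_probs.exp() * log_probs).sum()     # diversity term
        (grad,) = torch.autograd.grad(proto + beta * entropy, x)
    return grad

def guided_ddpm_step(unet, x_t, t, y, classifier, beta, scale,
                     alphas, alphas_bar, betas):
    """One reverse DDPM step with the guidance gradient added to the
    posterior mean, as in classifier guidance (Dhariwal & Nichol, 2021).
    alphas, alphas_bar, betas are 1-D schedule tensors; t is a Python int.
    """
    t_batch = torch.full((x_t.size(0),), t, device=x_t.device, dtype=torch.long)
    eps = unet(x_t, t_batch)                               # predicted noise
    a_t, ab_t, b_t = alphas[t], alphas_bar[t], betas[t]
    mean = (x_t - b_t / (1.0 - ab_t).sqrt() * eps) / a_t.sqrt()
    mean = mean + scale * b_t * info_guidance(x_t, y, classifier, beta)
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + b_t.sqrt() * noise
```

Per the abstract, $\beta$ is IPC-dependent (larger IPC budgets call for more contextual information), so in a distillation loop one would sweep `beta` with the target images-per-class; the fixed classifier surrogates above are the simplest stand-ins, whereas the paper learns tight variational lower bounds from data.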
