DiffAgent: Automated Acceleration for Diffusion Models
- DiffAgent is a framework that automates the generation, refinement, and evaluation of acceleration strategies for diffusion models.
- It integrates planning, code synthesis, debugging, and genetic optimization to compose and tune multi-method diffusion accelerations under quantitative constraints.
- The system leverages DiffBench for rigorous benchmarking, achieving up to 2.5x speedup with minimal quality loss on various diffusion architectures.
DiffAgent is an LLM-powered agentic framework for the automated generation, refinement, and evaluation of acceleration strategies and code for diffusion models. It addresses the core challenges of diffusion model deployment—high computational cost and the complexity of integrating multiple acceleration techniques—by orchestrating an iterative planning, code-generation, debugging, and feedback-driven optimization loop within a rigorous benchmarking environment. DiffAgent leverages structured feedback from DiffBench, a specialized multi-axis benchmark, and incorporates a genetic search algorithm to efficiently explore the combinatorial space of acceleration methods, parameters, and hardware-specific trade-offs. The system significantly surpasses baseline LLMs and prior code-generation approaches in composing and tuning multi-method diffusion accelerations under qualitative and quantitative constraints (Jiao et al., 6 Jan 2026).
1. Motivation and Objectives
Diffusion models, such as Stable Diffusion, DiT, and PixArt, have become central to state-of-the-art image and video generation, but their iterative, multi-step sampling pipelines impose substantial inference latency and resource demands. The landscape of acceleration methods—including fast samplers (e.g., DDIM, DPM-Solver, UniPC), operator fusion, precision reduction, token merging (ToMe), cache-based feature reuse (DeepCache), and adaptive early exit (T-Gate)—is fragmented, and the optimal composition is highly architecture- and deployment-dependent. Practitioners face a combinatorial design problem: selecting, integrating, and tuning acceleration methods to meet task-specific speedup, latency, and quality-preservation constraints under diverse hardware and operational scenarios.
DiffAgent was designed to automate this process, providing an agent that systematically generates, tests, and refines acceleration strategies and their code realizations for arbitrary diffusion models. The framework executes within an end-to-end automated evaluation pipeline (DiffBench) that enforces correctness, sample quality, and performance criteria through standardized, developer-realistic benchmarks (Jiao et al., 6 Jan 2026).
2. DiffAgent System Architecture
DiffAgent comprises three principal agentic components, orchestrated in a closed-loop workflow:
- Planning Component: Given a target model, task, and deployment constraints, the planning module proposes an initial composition of acceleration techniques and their critical hyperparameters.
- Code Generation Component: This module synthesizes an executable code snippet implementing the planned acceleration pipeline, leveraging LLMs trained on open diffusion libraries and best practices.
- Debugging Component: Post-generation, the agent analyzes errors (compilation failures, runtime errors, or output mismatches), diagnoses causes, and instructs the code generator to correct or adjust code and parameters.
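The closed-loop interplay of the three components can be sketched as follows. This is a minimal illustration under assumed interfaces (all function and field names here are hypothetical; the paper does not publish this API): the planner proposes a plan, the code generator realizes it, and the debugger patches the code until it runs or the round budget is exhausted.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    plan: dict                                 # chosen methods and hyperparameters
    code: str = ""                             # synthesized pipeline code
    errors: list = field(default_factory=list) # diagnostics collected so far

def agent_loop(plan_fn, codegen_fn, debug_fn, run_fn, task, max_rounds=3):
    """Plan -> generate -> debug loop; returns the first runnable candidate."""
    cand = Candidate(plan=plan_fn(task))
    cand.code = codegen_fn(cand.plan)
    for _ in range(max_rounds):
        ok, error = run_fn(cand.code)          # execute against the benchmark harness
        if ok:
            return cand
        cand.errors.append(error)              # record the diagnosis
        cand.code = debug_fn(cand.code, error) # ask the debugger to patch the code
    return cand
```

In a real deployment, `plan_fn`, `codegen_fn`, and `debug_fn` would each wrap an LLM call and `run_fn` would invoke DiffBench; here they are plain callables so the control flow is visible.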
A Genetic Optimization Module extracts and interprets structured feedback from DiffBench’s runtime measurements (speedup, sample quality, constraint satisfaction), guiding successive code refinements and parameter choices. Selection, mutation, and crossover are applied to the population of candidate acceleration strategies and code implementations, with fitness determined by benchmark metrics and constraint adherence. The evolutionary search uses a fixed population size and a fixed number of selection rounds per optimization sweep, balancing exploration against evaluation cost (Jiao et al., 6 Jan 2026).
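The selection/mutation/crossover cycle described above can be sketched generically. This is an illustrative sketch, not the paper's implementation: configurations are treated as opaque objects, the fitness function stands in for DiffBench feedback, and population size, survivor fraction, and round count are assumed placeholders.

```python
import random

def evolve(population, fitness_fn, mutate_fn, crossover_fn,
           rounds=4, keep=0.5, seed=0):
    """Generic genetic search over acceleration configurations.

    fitness_fn scores a config from benchmark feedback (higher is better).
    Each round keeps the fittest fraction of the population and refills it
    with mutated crossovers of surviving parents.
    """
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(rounds):
        pop.sort(key=fitness_fn, reverse=True)
        survivors = pop[:max(2, int(len(pop) * keep))]
        children = []
        while len(survivors) + len(children) < len(pop):
            a, b = rng.sample(survivors, 2)              # pick two parents
            children.append(mutate_fn(crossover_fn(a, b), rng))
        pop = survivors + children                       # elitist refill
    return max(pop, key=fitness_fn)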
3. DiffBench: Benchmark Foundation and Evaluation Pipeline
DiffBench, the co-developed benchmark, implements a three-stage pipeline for rigorously evaluating generated acceleration code in conditions mirroring human deployment:
- Stage 1 (Static Parameter Assessment): The code and reference implementations are parsed and checked for exact matches in pipeline type, model configuration, samplers, targeted acceleration methods, and pre/post-processing logic.
- Stage 2 (Absolute Performance Measurement): The candidate code is executed on a held-out MS-COCO sample, measuring CLIP-Score (for text-image alignment and visual fidelity). Only code achieving a maximum relative drop (e.g., 5%) in CLIP-Score advances.
- Stage 3 (Relative Performance Analysis): The system compares candidate and unoptimized baseline implementations, reporting quality loss and speedup together with per-sample device latency for latency-constrained tasks.
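The three-stage gate can be summarized in code. This is an illustrative sketch of the gating logic only, assuming candidate and baseline runs are reduced to simple records; the 5% CLIP-Score drop bound is the example threshold quoted above, and all field names are hypothetical.

```python
def evaluate(candidate, baseline, max_clip_drop=0.05):
    """Three-stage gate mirroring DiffBench's pipeline (illustrative).

    candidate/baseline are dicts with 'config', 'clip_score', 'latency_s'.
    Returns (passed, report).
    """
    # Stage 1: static parameter assessment -- configurations must match exactly.
    if candidate["config"] != baseline["config"]:
        return False, {"stage": 1, "reason": "config mismatch"}
    # Stage 2: absolute quality -- bounded relative CLIP-Score drop.
    drop = (baseline["clip_score"] - candidate["clip_score"]) / baseline["clip_score"]
    if drop > max_clip_drop:
        return False, {"stage": 2, "clip_drop": drop}
    # Stage 3: relative performance -- speedup vs. the unoptimized baseline.
    speedup = baseline["latency_s"] / candidate["latency_s"]
    return True, {"stage": 3, "clip_drop": drop, "speedup": speedup}
```

Only candidates that clear all three stages contribute fitness signal to the genetic search; earlier failures return the failing stage so the debugger knows what to fix.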
Table: Overview of DiffAgent Components and Pipeline
| Component | Role | Output/Feedback |
|---|---|---|
| Planning | Proposes method+parameter combos | Acceleration plan |
| Code Generation | Synthesizes code for plan | Executable candidate pipeline |
| Debugging | Diagnoses/corrects errors | Corrected code/parameters |
| Genetic Search | Optimizes via fitness from benchmark | Refined strategy/code pool |
| DiffBench | Evaluates correctness, quality, time | Pass/fail, CLIP-Score, speedup, latency |
4. Acceleration Methods, Composition Strategies, and Benchmark Coverage
DiffAgent's search space and DiffBench's reference suite include:
- Operator-level fusion via torch.compile
- Precision reduction (FP16)
- Token merging (ToMe) in attention layers
- DeepCache for cached feature reuse
- T-Gate adaptive early exit in denoising loops
- Fast samplers: DDIM, DPM-Solver, UniPC
Compositional strategies are tested across five task levels: no acceleration, single-method, compositional multi-method, explicit speedup target, and joint speedup/quality-constrained. Supported architectures span latent diffusion (Stable Diffusion v1.5, v2.1, XL), transformer-based models (DiT, PixArt-α, PixArt-Σ), various conditioning modes (text, class, image), and image resolutions from 256 to 1024 (Jiao et al., 6 Jan 2026).
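A composed plan over this search space has simple well-formedness constraints; for example, the fast samplers above are mutually exclusive. The following sketch checks such a plan (the rule set and method identifiers are illustrative assumptions, not the paper's schema):

```python
# Illustrative method identifiers for the search space listed above.
KNOWN_METHODS = {"fp16", "torch_compile", "tome", "deepcache", "tgate",
                 "ddim", "dpm_solver", "unipc"}
SAMPLERS = {"ddim", "dpm_solver", "unipc"}  # at most one may be active

def validate_plan(plan):
    """Check a composed acceleration plan: every method must be known,
    and conflicting fast samplers may not be combined."""
    unknown = [m for m in plan if m not in KNOWN_METHODS]
    if unknown:
        return False, f"unknown methods: {unknown}"
    if len(SAMPLERS & set(plan)) > 1:
        return False, "conflicting samplers"
    return True, "ok"
```

Such a static check corresponds to the kind of pre-execution validation a planning component can run before any code is generated.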
5. Evaluation Metrics and Empirical Findings
- Speedup Factor: ratio of baseline to accelerated per-sample inference time
- Throughput: samples generated per second
- Memory Reduction: relative decrease in peak device memory
- Sample Quality: CLIP-Score (main), FID (optional)
- Latency: per-sample wall-clock time on the target device
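The efficiency metrics above follow from raw timing and memory measurements by standard definitions (this sketch reflects those textbook definitions, not code from the paper):

```python
def metrics(baseline_s, accel_s, n_samples, base_mem_mb, accel_mem_mb):
    """Compute the efficiency axes listed above from raw measurements:
    total wall-clock times (s), sample count, and peak memory (MB)."""
    return {
        "speedup": baseline_s / accel_s,            # baseline time / accelerated time
        "throughput_sps": n_samples / accel_s,      # samples per second
        "memory_reduction": 1.0 - accel_mem_mb / base_mem_mb,
        "latency_s": accel_s / n_samples,           # per-sample latency
    }
```

For instance, generating 8 samples in 4 s instead of 10 s with peak memory down from 16 GB to 12 GB gives a 2.5x speedup, 2 samples/s throughput, 25% memory reduction, and 0.5 s per-sample latency.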
DiffAgent’s agentic loop yields major improvements over baseline LLMs:
- Out-of-the-box LLMs (GPT-4.1, Claude Sonnet 4, Gemini 2.5, o3-mini) score below 35% on advanced (multi-method/constraint) tasks.
- DiffAgent boosts average pass rates up to 56.5% (o3-mini) and 81.6% (Claude Sonnet 4).
- Compositional tasks display the largest gains (+65 percentage points) when the agent learns method ordering and parameter semantics.
- Effective strategies include utilizing precision reduction (FP16) as a default, complementing it with ToMe and DeepCache, and calibrating T-Gate gating thresholds to balance speedup and stability.
- Typical best configurations yield a multi-fold speedup with only a small CLIP-Score drop; full method stacking pushes the speedup higher still (Jiao et al., 6 Jan 2026).
6. Deployment Scenarios and Implications
DiffAgent and DiffBench primarily target GPU-based environments (NVIDIA A100, AMD MI100/MI200), but are extensible to CPUs and edge accelerators by specifying device contexts. Tasks define hardware, allowed latency, memory budget, and sampling constraints, ensuring realistic assessment and transferability. These frameworks provide actionable guidance for both LLM-based and human-devised diffusion acceleration code design.
A plausible implication is that such agentic frameworks could generalize to other model acceleration and optimization domains—enabling automated composition, correctness, and efficiency audits that keep pace with model, method, and hardware diversification.
7. Future Directions
Potential expansions include:
- Extending DiffAgent's planning and search interfaces to encompass new acceleration methods as they emerge, such as hardware-adaptive or distributed inference optimizations.
- Jointly optimizing for other metrics (energy, cost, memory) alongside speed and fidelity.
- Incorporating real-world deployment feedback (on-device profiling, user constraints) into the genetic optimization loop.
- Integrating with code-diff understanding agents, such as those benchmarked with Diff-XYZ, to enable sophisticated editing and refactoring capabilities for model pipelines under active development.
The combination of agent-driven strategy search, rigorous multi-axis benchmarking, and executable code synthesis embodied by DiffAgent and DiffBench provides a reproducible, scalable paradigm for inference acceleration in diffusion and, by extension, other iterative generative modeling frameworks (Jiao et al., 6 Jan 2026).