Bezier Distillation
- Bezier Distillation is a knowledge distillation framework that uses Bezier-curve interpolation to integrate multi-teacher guidance in flow-based generative models.
- It replaces straight-line ODE flows with multi-step Bezier trajectories, mitigating error accumulation and improving convergence.
- The method enables efficient, high-fidelity mappings in tasks like image synthesis by reducing overall distillation error with fewer model iterations.
Bezier Distillation is a knowledge distillation framework for flow-based generative modeling that leverages Bezier-curve interpolation through intermediate “teacher” distributions to mitigate error accumulation in rectified flows. The method extends conventional rectified flow distillation—where straight-line ODE flows between a base distribution and a target are repeatedly composed and then distilled—by replacing the straight-line coupling with multi-step Bezier trajectories anchored by intermediate rectified flows. This approach allows the student model to efficiently acquire more accurate mappings between source and target distributions with reduced cumulative error, and supports multi-teacher guidance for improved convergence and sample quality (Feng et al., 20 Mar 2025).
1. Foundational Concepts and Motivation
Rectified Flow (Liu et al. 2022) is a family of continuous-time generative models that learn a transport ODE from a base noise distribution $\pi_0$ (e.g., Gaussian) to a target data distribution $\pi_1$. The goal is to learn a coupling (transport map) $(x_0, x_1)$ such that $x_1 \sim \pi_1$ given $x_0 \sim \pi_0$. The ODE is parameterized as
$$\frac{dx_t}{dt} = v_\theta(x_t, t), \quad t \in [0, 1],$$
where $v_\theta$ is a neural “drift” network. The model is trained to minimize
$$\min_\theta \int_0^1 \mathbb{E}\left[\left\| (x_1 - x_0) - v_\theta(x_t, t) \right\|^2\right] dt, \quad x_t = t\,x_1 + (1 - t)\,x_0,$$
constraining the flow’s tangent to match the straight-line difference $x_1 - x_0$ along interpolated segments. Models can be further refined by applying sequential rectified flows, progressively straightening the induced coupling.
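The straight-line objective can be illustrated with a small Monte Carlo estimate. This is a minimal sketch, not the authors' implementation: `rectified_flow_loss`, the toy shifted-noise "data", and the two stand-in velocity functions are assumptions made for illustration, and a real model would use a neural network in place of them.

```python
import numpy as np

def rectified_flow_loss(velocity_fn, x0, x1, rng):
    """Monte Carlo estimate of E || (x1 - x0) - v(x_t, t) ||^2."""
    t = rng.uniform(0.0, 1.0, size=(x0.shape[0], 1))  # one time per sample
    x_t = t * x1 + (1.0 - t) * x0                     # straight-line interpolation
    target = x1 - x0                                  # velocity of the straight line
    residual = velocity_fn(x_t, t) - target
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
shift = np.array([3.0, -1.0])
x0 = rng.normal(size=(256, 2))   # base noise samples
x1 = x0 + shift                  # toy "data": a shifted copy of the noise (assumption)

# For this toy coupling the ideal velocity is the constant shift.
perfect = lambda x, t: np.broadcast_to(shift, x.shape)
zero = lambda x, t: np.zeros_like(x)

loss_perfect = rectified_flow_loss(perfect, x0, x1, rng)  # essentially zero
loss_zero = rectified_flow_loss(zero, x0, x1, rng)        # equals ||shift||^2
```

In this toy setting the loss of the zero field equals the squared norm of the shift, which gives a quick sanity check on the estimator.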
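Rectified-flow distillation compresses multiple such rectifications into a single “student” network through supervised regression
$$\min_\theta \mathbb{E}\left[\left\| (x_1^{(K)} - x_0) - v_\theta(x_0, 0) \right\|^2\right],$$
where $x_1^{(K)}$ denotes the output of the $K$-times rectified teacher flow started at $x_0$, allowing direct prediction from $x_0$ to $x_1$ in one pass.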
However, iterative rectification leads to error accumulation: the local ODE integration error ($\mathcal{O}(h^{p+1})$ per step, for step size $h$ and integrator order $p$) and the model approximation error compound as the number of rectifications $K$ increases, often degrading overall mapping fidelity. This motivates methods for reducing distillation error while retaining fast inference (Feng et al., 20 Mar 2025).
2. Bezier-Curve Guided Distillation Framework
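Bezier Distillation addresses error compounding by formulating the target flow as an $n$th-degree Bezier curve in state space, parameterized by a series of control points $P_0, \dots, P_n$ corresponding to: the initial sample ($P_0 = x_0$), one or more intermediate “teacher” distributions ($P_1, \dots, P_{n-1}$), and the final data sample ($P_n = x_1$).
The general Bezier curve is given by
$$B(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n-i}\, t^{i}\, P_i, \quad t \in [0, 1],$$
ensuring smooth, convex-hull-bounded paths between endpoints. The tangent at $t$ is $B'(t)$, the time derivative of $B(t)$:
$$B'(t) = n \sum_{i=0}^{n-1} \binom{n-1}{i} (1 - t)^{n-1-i}\, t^{i}\, (P_{i+1} - P_i).$$
The control points are generated using teacher rectified flows at specific intermediate times ($0 < t_1 < \dots < t_{n-1} < 1$):
$$P_i = \Phi_{t_i}(x_0), \quad i = 1, \dots, n-1,$$
where $\Phi_{t}$ is the rectified flow map at time $t$.
The student network $v_\theta$ is trained to match the velocity of the Bezier curve:
$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, t}\left[\left\| v_\theta(B(t), t) - B'(t) \right\|^2\right].$$
The loss reduces to earlier rectified-flow objectives when $n = 1$, and admits quadratic (one teacher) and cubic (two teachers) specializations detailed below.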
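A degree-$n$ Bezier point and its tangent can be evaluated directly from the standard Bernstein form. The sketch below is illustrative only: the function names `bezier_point` and `bezier_tangent` and the example control points are assumptions, not taken from the paper.

```python
import numpy as np
from math import comb

def bezier_point(control_points, t):
    """Evaluate B(t) = sum_i C(n, i) (1 - t)^(n - i) t^i P_i."""
    P = np.asarray(control_points, dtype=float)   # shape (n + 1, dim)
    n = P.shape[0] - 1
    weights = np.array([comb(n, i) * (1 - t) ** (n - i) * t ** i
                        for i in range(n + 1)])
    return weights @ P

def bezier_tangent(control_points, t):
    """Evaluate B'(t) = n sum_i C(n-1, i) (1 - t)^(n-1-i) t^i (P_{i+1} - P_i)."""
    P = np.asarray(control_points, dtype=float)
    n = P.shape[0] - 1
    diffs = P[1:] - P[:-1]                        # shape (n, dim)
    weights = np.array([comb(n - 1, i) * (1 - t) ** (n - 1 - i) * t ** i
                        for i in range(n)])
    return n * (weights @ diffs)

# Degree 1: the curve is a straight line and the tangent is P_1 - P_0.
line = [[0.0, 0.0], [2.0, 4.0]]
line_tangent = bezier_tangent(line, 0.3)

# Degree 2: one intermediate control point bends the path.
quad = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
quad_mid = bezier_point(quad, 0.5)
quad_mid_tangent = bezier_tangent(quad, 0.5)
```

With two control points the tangent is constant and equal to the endpoint difference, which is the sense in which the degree-1 case recovers the straight-line target of rectified flow.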
Quadratic (Degree-2) Path
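- Control points: $P_0 = x_0$, $P_1 = \Phi_{t_1}(x_0)$, $P_2 = x_1$, with $\Phi_{t_1}$ the teacher flow map at time $t_1$
- Bezier trajectory: $B(t) = (1-t)^2 P_0 + 2(1-t)\,t\,P_1 + t^2 P_2$
- Tangent: $B'(t) = 2(1-t)(P_1 - P_0) + 2t\,(P_2 - P_1)$
- Loss: $\mathcal{L}_2(\theta) = \mathbb{E}_{x_0,\, t}\left[\left\| v_\theta(B(t), t) - B'(t) \right\|^2\right]$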
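As a worked check of the quadratic case, the tangent follows from differentiating the standard degree-2 Bezier curve with control points $P_0$, $P_1$, $P_2$ term by term. The notation is the usual Bernstein form and is assumed here rather than copied from the paper:

$$
\begin{aligned}
B(t) &= (1-t)^2 P_0 + 2(1-t)\,t\,P_1 + t^2 P_2, \\
B'(t) &= -2(1-t)\,P_0 + 2(1-2t)\,P_1 + 2t\,P_2 \\
      &= 2(1-t)\,(P_1 - P_0) + 2t\,(P_2 - P_1).
\end{aligned}
$$

At the endpoints this gives $B'(0) = 2(P_1 - P_0)$ and $B'(1) = 2(P_2 - P_1)$, so the intermediate control point sets the direction in which the path leaves the source sample and arrives at the target.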
Cubic (Degree-3) Path, Multi-Teacher
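- Control points: $P_0 = x_0$, $P_1 = \Phi_{t_1}(x_0)$, $P_2 = \Phi_{t_2}(x_0)$, $P_3 = x_1$, with $\Phi_{t_i}$ the teacher flow map at time $t_i$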
- Cubic curve and tangent as in Eqs. (8)-(9) of (Feng et al., 20 Mar 2025) with corresponding multi-teacher loss.
The framework generalizes to arbitrary degree $n$, with $n - 1$ teachers and control points at associated times $t_1, \dots, t_{n-1}$.
3. Multi-Teacher Distillation Design
Bezier Distillation is inherently a multi-teacher knowledge distillation method. Each teacher consists of a (possibly multi-step) rectified flow map $\Phi_{t_i}$ producing a distribution $\pi_{t_i}$. Teacher guidance is realized by providing intermediate couplings, allowing the Bezier student to interpolate along more accurate and smoother paths than piecewise straight-line or high-step rectified-flow distillation.
The student network is a parameterized vector field $v_\theta(x, t)$. At inference, trajectories are produced by numerically solving the ODE
$$\frac{dx_t}{dt} = v_\theta(x_t, t)$$
from $x_0$ ($t = 0$) towards $x_1$ ($t = 1$), requiring only a single (or few) function calls for fast sampling.
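Sampling from a learned vector field amounts to integrating the ODE from time 0 to time 1. The sketch below uses explicit Euler steps as one simple choice of integrator; `euler_sample` and the constant stand-in field are assumptions for illustration, since the source does not name a solver.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = np.array(x0, dtype=float)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h
        x = x + h * velocity_fn(x, t)   # one function call per step
    return x

# Stand-in for a trained student: a constant field, i.e. a perfectly straight flow.
shift = np.array([3.0, -1.0])
constant_field = lambda x, t: np.broadcast_to(shift, x.shape)

x0 = np.zeros((4, 2))
one_step = euler_sample(constant_field, x0, num_steps=1)
many_steps = euler_sample(constant_field, x0, num_steps=8)
```

For a field that is constant along each trajectory, one Euler step already lands on the same point as many steps, which is why straighter learned flows permit single-call or few-call sampling; a curved field would generally need more steps for the same accuracy.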
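The objective can incorporate additional regularization or teacher-consistency terms, for example as a weighted sum
$$\mathcal{L}_{\text{total}}(\theta) = \mathcal{L}_{\text{Bezier}}(\theta) + \lambda\, \mathcal{L}_{\text{reg}}(\theta),$$
with a weight $\lambda \ge 0$ on the extra term. This provides a direct route for integrating multiple knowledge sources and controlling the tradeoff between teacher fidelity and student generalization.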
4. Error Accumulation and Numerical Analysis
Standard rectified-flow distillation is sensitive to numerical errors accrued across repeated ODE solutions. Given integrator step size $h$ and order $p$, the per-step discretization error is $\mathcal{O}(h^{p+1})$, and with $K$ rectifications the cumulative deviation grows with $K$. The student’s final error inherits this accumulation in expectation, on top of its own model fitting error $\epsilon_{\text{fit}}$. As $K$ increases (for more accurate straightening), the effect of accumulated error outweighs the benefits of more “rectified” couplings, leading to suboptimal student performance.
Bezier Distillation alleviates this by interpolating through intermediate distributions generated by finite (limited) application of teacher flows, avoiding direct dependence on repeatedly composed, error-prone mappings. This results in a more robust student with reduced total error.
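The way discretization error compounds across repeated ODE solves can be seen in a toy experiment. This is only an analogy under stated assumptions, not the paper's analysis: each "stage" below is one Euler-integrated solve of a planar rotation ODE with a known exact solution, and composing stages stands in for repeatedly composed flows.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])   # dx/dt = J x rotates the plane

def euler_stage(x, num_steps):
    """One numerical solve over unit time using explicit Euler steps."""
    h = 1.0 / num_steps
    for _ in range(num_steps):
        x = x + h * (J @ x)
    return x

def exact_stage(x):
    """Exact unit-time flow of dx/dt = J x: rotation by one radian."""
    c, s = np.cos(1.0), np.sin(1.0)
    return np.array([[c, -s], [s, c]]) @ x

def accumulated_errors(num_stages, num_steps):
    """Distance between numerical and exact states after each composed stage."""
    x_num = np.array([1.0, 0.0])
    x_ref = x_num.copy()
    errors = []
    for _ in range(num_stages):
        x_num = euler_stage(x_num, num_steps)
        x_ref = exact_stage(x_ref)
        errors.append(float(np.linalg.norm(x_num - x_ref)))
    return errors

coarse = accumulated_errors(num_stages=5, num_steps=10)    # step size h = 0.1
fine = accumulated_errors(num_stages=5, num_steps=100)     # step size h = 0.01
```

In this toy setting the error increases with every additional composed stage, and a smaller step size reduces it at the cost of more function calls, which mirrors the tradeoff described above.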
5. Training Procedure and Pseudocode
Training proceeds by constructing batches of Bezier-curve paths through control points generated by teacher flows. At each iteration:
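- Sample noise vectors $x_0$ from $\pi_0$.
- For each teacher $i$, compute the control point $P_i = \Phi_{t_i}(x_0)$, where $\Phi_{t_i}$ is the teacher flow map at time $t_i$.
- Sample $t$ uniformly in $[0, 1]$, and construct the Bezier point $B(t)$ with control points $P_0$, $P_1$, ..., $P_n$.
- Compute the tangent $B'(t)$ at $t$.
- Compute the network output $v_\theta(B(t), t)$ and the loss $\| v_\theta(B(t), t) - B'(t) \|^2$.
- Update parameters: $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$, with learning rate $\eta$.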
At test time, $v_\theta$ is integrated from $t = 0$ to $1$ starting at $x_0 \sim \pi_0$. The complete pseudocode is given in (Feng et al., 20 Mar 2025).
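The training steps above can be sketched end to end for the quadratic (one-teacher) case. Everything specific here is an assumption made so the example runs on its own: `teacher_flow` is a hypothetical closed-form stand-in for a pretrained teacher, the student is a linear model on the features $(x, t, 1)$ instead of a neural network, and the gradient is written analytically instead of using automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, T1 = 2, 0.5                       # state dimension, teacher time (assumed)
SHIFT = np.array([3.0, -1.0])
BEND = np.array([0.0, 2.0])

def teacher_flow(x0, t):
    """Hypothetical stand-in for a pretrained teacher flow map at time t."""
    return x0 + t * SHIFT + t * (1.0 - t) * BEND

def bezier_batch(x0, t):
    """Quadratic Bezier point and tangent through x0, teacher(T1), teacher(1)."""
    p0, p1, p2 = x0, teacher_flow(x0, T1), teacher_flow(x0, 1.0)
    point = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
    tangent = 2 * (1 - t) * (p1 - p0) + 2 * t * (p2 - p1)
    return point, tangent

def features(x, t):
    return np.concatenate([x, t, np.ones_like(t)], axis=1)   # (batch, DIM + 2)

def loss_and_grad(W, x0, t):
    """Velocity-matching loss of the linear student v(x, t) = W [x, t, 1]."""
    point, tangent = bezier_batch(x0, t)
    F = features(point, t)
    residual = F @ W.T - tangent
    loss = float(np.mean(np.sum(residual ** 2, axis=1)))
    grad = 2.0 * residual.T @ F / x0.shape[0]
    return loss, grad

W = np.zeros((DIM, DIM + 2))
x_eval = rng.normal(size=(512, DIM))   # fixed batch for before/after comparison
t_eval = rng.uniform(size=(512, 1))
loss_before, _ = loss_and_grad(W, x_eval, t_eval)

for step in range(1000):
    x0 = rng.normal(size=(128, DIM))   # sample noise
    t = rng.uniform(size=(128, 1))     # sample times uniformly in [0, 1]
    _, grad = loss_and_grad(W, x0, t)
    W -= 0.05 * grad                   # plain gradient step

loss_after, _ = loss_and_grad(W, x_eval, t_eval)
```

With these toy choices the Bezier tangent is linear in $t$, so the linear student can fit it closely and the held-out loss drops sharply; a realistic setup would swap in trained teacher flows, a neural student, and an optimizer from a deep learning framework.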
6. Reported Empirical Observations and Open Issues
The available draft states that Bezier Distillation outperforms standard rectified-flow distillation with fewer iterations, achieves improved sample quality versus single- or two-step baselines, and exhibits strong performance in image-to-image translation tasks. The manuscript, however, does not specify:
- Benchmark datasets (e.g., ImageNet, CIFAR-10, CelebA).
- Quantitative performance metrics (e.g., FID, IS, PSNR, SSIM).
- Detailed comparative results (baseline scores, number of function calls).
- Ablation over curve degree ($n$) and number of teachers.
The mathematical and algorithmic formulation provided facilitates reproducibility and independent benchmarking on standard image synthesis and translation datasets, allowing direct comparison with both classical rectified-flow models and alternative distillation or acceleration approaches.
7. Context and Significance
Bezier Distillation generalizes the distillation paradigm in ODE-based generative modeling by integrating multi-teacher supervision through Bezier-curve interpolation, providing a smoother and more robust framework for compressing deep generative flows. The formulation admits straightforward generalization to arbitrary interpolation paths and arbitrarily many teachers, and can be combined with existing consistency regularizers. A plausible implication is improved efficiency in sample synthesis and accelerated convergence for high-fidelity generative modeling, especially as multi-teacher and geometric guidance techniques gain prominence in diffusion and flow-based generative learning (Feng et al., 20 Mar 2025). Experimental completion and independent evaluation remain open for further confirmation and quantitative assessment.