
Bezier Distillation

Updated 20 March 2026
  • Bezier Distillation is a knowledge distillation framework that uses Bezier-curve interpolation to integrate multi-teacher guidance in flow-based generative models.
  • It replaces straight-line ODE flows with multi-step Bezier trajectories, mitigating error accumulation and improving convergence.
  • The method enables efficient, high-fidelity mappings in tasks like image synthesis by reducing overall distillation error with fewer model iterations.

Bezier Distillation is a knowledge distillation framework for flow-based generative modeling that leverages Bezier-curve interpolation through intermediate “teacher” distributions to mitigate error accumulation in rectified flows. The method extends conventional rectified flow distillation—where straight-line ODE flows between a base distribution and a target are repeatedly composed and then distilled—by replacing the straight-line coupling with multi-step Bezier trajectories anchored by intermediate rectified flows. This approach allows the student model to efficiently acquire more accurate mappings between source and target distributions with reduced cumulative error, and supports multi-teacher guidance for improved convergence and sample quality (Feng et al., 20 Mar 2025).

1. Foundational Concepts and Motivation

Rectified Flow (Liu et al. 2022) is a family of continuous-time generative models that learn a transport ODE from a base noise distribution $T_0$ (e.g., Gaussian) to a target data distribution $T_1$. The goal is to learn a coupling (transport map) $T:\mathbb{R}^d \to \mathbb{R}^d$ such that $T(X_0)\sim T_1$ given $X_0\sim T_0$. The ODE is parameterized as

$$\frac{dX_t}{dt} = v(X_t, t), \quad X_t = t X_1 + (1-t) X_0, \quad t\in[0,1]$$

where $v$ is a neural “drift” network. The model is trained to minimize

$$\min_v \int_{t=0}^1 \mathbb{E}_{X_0,X_1}\Big[\|X_1 - X_0 - v(X_t, t)\|^2\Big] dt$$

constraining the flow’s tangent to match the straight-line difference $X_1-X_0$ along interpolated segments. Models can be further refined by applying $k$ sequential rectified flows, progressively straightening the induced coupling.
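As a rough illustration of the training objective above, the following sketch estimates the loss by Monte Carlo for scalar samples. The function name `rectified_flow_loss`, the callable signature `v(x, t)`, and the use of pre-paired samples are assumptions made for this example, not details from the cited papers.

```python
import random


def rectified_flow_loss(v, pairs, n_times=64, seed=0):
    """Monte Carlo estimate of E_t E[(X1 - X0 - v(X_t, t))^2] for scalar samples.

    `v` is any callable v(x, t) -> float (hypothetical stand-in for the drift
    network); `pairs` is a list of (x0, x1) couplings.
    """
    rng = random.Random(seed)
    total = 0.0
    for x0, x1 in pairs:
        for _ in range(n_times):
            t = rng.random()                   # t ~ Uniform[0, 1)
            xt = t * x1 + (1.0 - t) * x0       # straight-line interpolant X_t
            total += (x1 - x0 - v(xt, t)) ** 2
    return total / (len(pairs) * n_times)


# Every pair below is a shift by +2, so the constant drift v = 2 fits exactly.
pairs = [(0.0, 2.0), (1.0, 3.0)]
print(rectified_flow_loss(lambda x, t: 2.0, pairs))
```

In practice the expectation would be taken over mini-batches and minimized with an automatic-differentiation framework; the sketch only shows how the regression target $X_1 - X_0$ and the interpolant $X_t$ enter the loss.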

Rectified-flow distillation compresses multiple such rectifications into a single “student” network through supervised regression

$$\min_v \mathbb{E}_{X_0, X_k}\big[\|X_k - X_0 - v(X_0, 0)\|^2\big]$$

allowing direct prediction from $X_0$ to $X_k$ in one pass.
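A minimal sketch of this regression and of the resulting one-pass prediction, for scalar samples. The names `distillation_loss` and `one_step_sample` are hypothetical, and `v(x, t)` stands in for the student network.

```python
def distillation_loss(v, x0_batch, xk_batch):
    """Mean of (x_k - x_0 - v(x_0, 0))^2 over paired scalar samples."""
    residuals = [
        (xk - x0 - v(x0, 0.0)) ** 2 for x0, xk in zip(x0_batch, xk_batch)
    ]
    return sum(residuals) / len(residuals)


def one_step_sample(v, x0):
    """Single-pass prediction: x_k is approximated by x_0 + v(x_0, 0)."""
    return x0 + v(x0, 0.0)


# Toy student that always predicts a displacement of 3.
student = lambda x, t: 3.0
print(distillation_loss(student, [0.0, 1.0], [3.0, 4.0]))
print(one_step_sample(student, 1.0))
```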

However, iterative rectification leads to error accumulation: the local ODE integration error ($O(h^p)$ per step) and model approximation error compound as $k$ increases, often degrading overall mapping fidelity. This motivates methods for reducing distillation error while retaining fast inference (Feng et al., 20 Mar 2025).

2. Bezier-Curve Guided Distillation Framework

Bezier Distillation addresses error compounding by formulating the target flow as an $n$th-degree Bezier curve in state space, parameterized by a series of control points $\{P_0, ..., P_n\}$ corresponding to: the initial sample ($P_0=X_0$), one or more intermediate “teacher” distributions ($P_i=X_{\tau_i}$), and the final data sample ($P_n=X_1$).

The general Bezier curve is given by

$$B(t) = \sum_{i=0}^n \binom{n}{i} (1-t)^{n-i} t^i P_i, \quad t\in[0,1]$$

ensuring smooth, convex-hull-bounded paths between endpoints. The tangent at $t$ is $\dot{B}_n(t)$, the time derivative of the degree-$n$ curve $B_n(t) = B(t)$. The control points $P_i$ are generated using teacher rectified flows at specific intermediate times $\tau_i$: $X_{\tau_i} = X_0 + U_{\tau_i}(X_0, 0)$, where $U_{\tau_i}$ is the rectified flow map at time $\tau_i$.
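The curve and its tangent can be evaluated directly from the Bernstein form. The sketch below is an illustration only: the function names are hypothetical, points are plain Python lists, and the tangent uses the standard identity that the derivative of a degree-$n$ Bezier curve is a degree-$(n-1)$ Bezier curve over the scaled differences $n(P_{i+1}-P_i)$.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate B(t) = sum_i C(n, i) (1 - t)^(n - i) t^i P_i for list-valued points."""
    n = len(control_points) - 1
    dim = len(control_points[0])
    out = [0.0] * dim
    for i, p in enumerate(control_points):
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i   # Bernstein weight
        for d in range(dim):
            out[d] += weight * p[d]
    return out


def bezier_tangent(control_points, t):
    """Evaluate dB/dt as a degree-(n-1) Bezier curve over n * (P_{i+1} - P_i)."""
    n = len(control_points) - 1
    diffs = [
        [n * (b - a) for a, b in zip(p, q)]
        for p, q in zip(control_points, control_points[1:])
    ]
    return bezier_point(diffs, t)


# Quadratic example with P0 = (0, 0), P1 = (1, 2), P2 = (2, 0).
pts = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
print(bezier_point(pts, 0.5), bezier_tangent(pts, 0.5))
```

With two control points ($n=1$) the tangent is the constant $P_1 - P_0$, which matches the straight-line rectified-flow target.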

The student network $v(\cdot, t)$ is trained to match the velocity of the Bezier curve:

$$\min_v \int_{t=0}^1 \mathbb{E}_{X_0, X_{\tau_1}, ..., X_1} \|\dot{B}_n(t) - v(B_n(t), t)\|^2 dt$$

The loss reduces to earlier rectified-flow objectives when $n=1$, and admits quadratic (one teacher) and cubic (two teachers) specializations detailed below.

Quadratic (Degree-2) Path

  • Control points: $P_0=X_0$, $P_1=X_\tau$, $P_2=X_1$
  • Bezier trajectory: $B_2(t) = (1-t)^2X_0 + 2t(1-t)X_\tau + t^2X_1$
  • Tangent: $\dot{B}_2(t) = 2\big[t(X_1-X_0) + (1-2t)U_\tau(X_0,0)\big]$
  • Loss: $\min_v\int_0^1 \mathbb{E}\left[\left\|t(X_1-X_0)+(1-2t)U_\tau(X_0,0)-v(B_2(t),t)\right\|^2\right]dt$
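The quadratic tangent can be recovered by differentiating $B_2(t)$ and substituting $X_\tau = X_0 + U_\tau(X_0,0)$. The short derivation below is a worked check added for clarity, not a reproduction of the paper's own steps.

```latex
\begin{aligned}
\dot{B}_2(t) &= -2(1-t)\,X_0 + 2(1-2t)\,X_\tau + 2t\,X_1 \\
             &= -2(1-t)\,X_0 + 2(1-2t)\big(X_0 + U_\tau(X_0,0)\big) + 2t\,X_1 \\
             &= 2\big[\,t\,(X_1 - X_0) + (1-2t)\,U_\tau(X_0,0)\,\big]
\end{aligned}
```

At $t=0$ the tangent equals $2U_\tau(X_0,0)$ and at $t=1$ it equals $2(X_1 - X_\tau)$, so the curve leaves $X_0$ toward the teacher point and arrives at $X_1$ from it. Note that the loss as listed regresses onto the bracketed term, that is, without the leading factor of 2.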

Cubic (Degree-3) Path, Multi-Teacher

  • Control points: $P_0=X_0$, $P_1=X_\tau$, $P_2=X_{\tau'}$, $P_3=X_1$
  • Cubic curve and tangent as in Eqs. (8)-(9) of (Feng et al., 20 Mar 2025) with corresponding multi-teacher loss.
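For orientation, setting $n=3$ in the general Bezier formula gives the standard cubic curve and tangent below. These are the textbook expressions for these control points; the exact form and numbering used in the paper may differ.

```latex
\begin{aligned}
B_3(t) &= (1-t)^3 X_0 + 3t(1-t)^2 X_\tau + 3t^2(1-t)\,X_{\tau'} + t^3 X_1, \\
\dot{B}_3(t) &= 3(1-t)^2 (X_\tau - X_0) + 6t(1-t)\,(X_{\tau'} - X_\tau) + 3t^2 (X_1 - X_{\tau'}).
\end{aligned}
```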

The framework generalizes to arbitrary degree $n$, with teachers and control points at associated times $\tau_1,...,\tau_{n-1}$.

3. Multi-Teacher Distillation Design

Bezier Distillation is inherently a multi-teacher knowledge distillation method. Each teacher consists of a (possibly multi-step) rectified flow map $U_{\tau_i}$ producing a distribution $T_{\tau_i}$. Teacher guidance is realized by providing intermediate couplings, allowing the Bezier student to interpolate along more accurate and smooth paths compared to piecewise straight line or high-step rectified-flow distillation.

The student network is a parameterized vector field $v_\theta:\mathbb{R}^d\times[0,1]\rightarrow\mathbb{R}^d$. At inference, trajectories are produced by numerically solving the ODE

$$\frac{d X_t}{dt} = v_\theta(X_t, t)$$

from $t=0$ ($X_0$) towards $t=1$ ($X_1$), requiring only a single (or few) function calls for fast sampling.
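A minimal sketch of such a sampler, assuming a fixed-step explicit Euler integrator (the source does not specify the solver). The name `euler_sample` and the list-based state are choices made for this example; `v` is any callable returning a velocity list.

```python
def euler_sample(v, x0, num_steps=1):
    """Integrate dX/dt = v(X, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    h = 1.0 / num_steps
    for step in range(num_steps):
        t = step * h
        velocity = v(x, t)                      # one function call per step
        x = [xi + h * vi for xi, vi in zip(x, velocity)]
    return x


# A constant velocity field is integrated exactly, even with a single step.
constant_field = lambda x, t: [2.0, -1.0]
print(euler_sample(constant_field, [0.0, 0.0], num_steps=1))
```

With `num_steps=1` the sampler costs one network evaluation, which corresponds to the single-call regime described above; a curved learned field would generally need more steps for the same accuracy.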

The objective can incorporate additional regularization or teacher-consistency terms:

$$L(\theta) = ... + \lambda\sum_i\mathbb{E}\left[\|v(X_{\tau_i}, \tau_i) - U_{\tau_i}(X_0, 0)\|^2\right]$$

This provides a direct route for integrating multiple knowledge sources and controlling the tradeoff between teacher fidelity and student generalization.
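The teacher-consistency term alone can be sketched as follows for scalar samples. The representation of teachers as `(tau, U)` pairs, with `U(x0)` returning the displacement $U_{\tau}(X_0, 0)$, is an assumption for this example, as is the function name.

```python
def teacher_consistency(v, teachers, x0_batch, lam):
    """lam * sum_i E[(v(X_tau_i, tau_i) - U_tau_i(X_0, 0))^2] for scalar samples.

    `teachers` is a list of (tau, U) pairs; U(x0) is a hypothetical callable
    returning the teacher displacement from x0.
    """
    total = 0.0
    for x0 in x0_batch:
        for tau, U in teachers:
            u = U(x0)
            x_tau = x0 + u                      # teacher point X_tau = X_0 + U(X_0, 0)
            total += (v(x_tau, tau) - u) ** 2
    return lam * total / len(x0_batch)


# A student that reproduces the teacher displacement incurs no penalty.
teachers = [(0.5, lambda x: 1.0)]
print(teacher_consistency(lambda x, t: 1.0, teachers, [0.0, 2.0], lam=0.1))
```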

4. Error Accumulation and Numerical Analysis

Standard rectified-flow distillation is sensitive to numerical errors accrued across repeated ODE solutions. Given integrator step size $h$ and order $p$, the per-step discretization error is $O(h^p)$, and with $k$ rectifications, the cumulative deviation scales as $O(k h^p)$:

$$\|\widetilde{X}_k - X_1\| \approx \sum_{i=1}^{k}\|\epsilon_i\| = O(k h^p)$$

The student’s final error inherits this accumulation in expectation:

$$\mathbb{E}\big[\|\widehat{T}(X_0) - X_1\|^2\big] \geq C_1 k^2 h^{2p} + C_2 \delta_{\rm model}^2$$

with model fitting error $\delta_{\rm model}$. As $k$ increases (for more accurate straightening), the effect of accumulated error outweighs the benefits of more “rectified” couplings, leading to suboptimal student performance.
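To make the scaling concrete, the sketch below evaluates the two expressions above for given $k$, $h$, and $p$. The constants $C_1$ and $C_2$ are not specified in the source, so they default to 1 here purely for illustration.

```python
def accumulated_error(k, h, p, c=1.0):
    """Cumulative deviation scale c * k * h^p after k rectifications."""
    return c * k * h ** p


def student_error_lower_bound(k, h, p, delta_model, c1=1.0, c2=1.0):
    """Right-hand side C1 * k^2 * h^(2p) + C2 * delta_model^2 of the bound."""
    return c1 * k ** 2 * h ** (2 * p) + c2 * delta_model ** 2


# Doubling k doubles the accumulated deviation and quadruples the first term
# of the bound when h and p are held fixed.
for k in (1, 2, 4):
    print(k, accumulated_error(k, 0.5, 2), student_error_lower_bound(k, 0.5, 2, 0.0))
```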

Bezier Distillation alleviates this by interpolating through intermediate distributions generated by finite (limited) application of teacher flows, avoiding direct dependence on repeatedly composed, error-prone mappings. This results in a more robust student with reduced total error.

5. Training Procedure and Pseudocode

Training proceeds by constructing batches of Bezier-curve paths through control points generated by teacher flows. At each iteration:

  1. Sample noise vectors $x_0^b$ from $T_0$.
  2. For each teacher $U_{\tau_i}$, compute $x_{\tau_i}^b = x_0^b + U_{\tau_i}(x_0^b, 0)$.
  3. Sample $t$ uniformly in $[0,1]$, and construct Bezier point $B^b$ with control points $x_0^b$, $x_{\tau_1}^b$, ..., $x_1^b$.
  4. Compute tangent $\dot{B}^b$ at $t$.
  5. Compute network output $v^b = v_\theta(B^b, t)$ and loss $L = (1/B) \sum_b \|\dot{B}^b - v^b\|^2$.
  6. Update parameters: $\theta \leftarrow \theta - \eta \nabla_\theta L$.
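The six steps can be sketched end to end on a one-dimensional toy problem. Everything specific here is an assumption made so the example runs without a deep-learning framework: the student is a linear model $v_\theta(x,t) = ax + bt + c$ with hand-written gradients, teachers are callables returning displacements, and $x_1$ is produced from $x_0$ by a hypothetical `final_map`. This is not the paper's pseudocode.

```python
import random
from math import comb


def bezier_point(pts, t):
    """Scalar Bezier curve in Bernstein form."""
    n = len(pts) - 1
    return sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * p for i, p in enumerate(pts))


def bezier_tangent(pts, t):
    """Derivative of the scalar Bezier curve at t."""
    n = len(pts) - 1
    return bezier_point([n * (q - p) for p, q in zip(pts, pts[1:])], t)


def train_step(theta, teachers, final_map, batch_size, lr, rng):
    """One SGD step for the toy student v(x, t) = a*x + b*t + c."""
    a, b, c = theta
    grad_a = grad_b = grad_c = loss = 0.0
    for _ in range(batch_size):
        x0 = rng.gauss(0.0, 1.0)                        # step 1: sample noise
        pts = [x0] + [x0 + U(x0) for U in teachers]     # step 2: teacher points
        pts.append(final_map(x0))                       # x_1 (assumed pairing)
        t = rng.random()                                # step 3: sample t
        point = bezier_point(pts, t)
        tangent = bezier_tangent(pts, t)                # step 4: tangent
        residual = (a * point + b * t + c) - tangent    # step 5: v - dB/dt
        loss += residual ** 2
        grad_a += 2 * residual * point
        grad_b += 2 * residual * t
        grad_c += 2 * residual
    n = batch_size
    new_theta = (a - lr * grad_a / n, b - lr * grad_b / n, c - lr * grad_c / n)  # step 6
    return new_theta, loss / n


def fit(steps=300, seed=0):
    """Train the toy student and return final parameters and per-step losses."""
    rng = random.Random(seed)
    teachers = [lambda x: 1.0]          # one teacher: displacement +1
    final_map = lambda x: x + 2.0       # target: displacement +2
    theta, losses = (0.0, 0.0, 0.0), []
    for _ in range(steps):
        theta, loss = train_step(theta, teachers, final_map, 64, 0.05, rng)
        losses.append(loss)
    return theta, losses
```

In this toy setup the control points are collinear and equally spaced, so the Bezier tangent is the constant 2 and the student $(a, b, c) = (0, 0, 2)$ attains zero loss; calling `fit()` should drive the loss down from its initial value. A real implementation would replace the linear model and manual gradients with a neural network and automatic differentiation.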

At test time, $\dot{X} = v_\theta(X, t)$ is integrated from $t=0$ to $t=1$ starting at $X_0$. The complete pseudocode is given in (Feng et al., 20 Mar 2025).

6. Reported Empirical Observations and Open Issues

The available draft states that Bezier Distillation outperforms standard rectified-flow distillation with fewer iterations, achieves improved sample quality versus single- or two-step baselines, and exhibits strong performance in image-to-image translation tasks. The manuscript, however, does not specify:

  • Benchmark datasets (e.g., ImageNet, CIFAR-10, CelebA).
  • Quantitative performance metrics (e.g., FID, IS, PSNR, SSIM).
  • Detailed comparative results (baseline scores, number of function calls).
  • Ablation over curve degree ($n$) and number of teachers.

The mathematical and algorithmic formulation provided facilitates reproducibility and independent benchmarking on standard image synthesis and translation datasets, allowing direct comparison with both classical rectified-flow models and alternative distillation or acceleration approaches.

7. Context and Significance

Bezier Distillation generalizes the distillation paradigm in ODE-based generative modeling by integrating multi-teacher supervision through Bezier-curve interpolation, providing a smoother and more robust framework for compressing deep generative flows. The formulation admits straightforward generalization to arbitrary interpolation paths and arbitrarily many teachers, and can be combined with existing consistency regularizers. A plausible implication is improved efficiency in sample synthesis and accelerated convergence for high-fidelity generative modeling, especially as multi-teacher and geometric guidance techniques gain prominence in diffusion and flow-based generative learning (Feng et al., 20 Mar 2025). Experimental completion and independent evaluation remain open for further confirmation and quantitative assessment.

References (1)
1.
