Neural ODE & Rectified Flow Policies

Updated 22 April 2026

Neural ODE/Rectified Flow Policies are continuous-time generative models that parameterize action distributions using ODE flows, ensuring stability and expressivity.
They employ flow-matching objectives and optimal transport principles to train vector fields for linear trajectory interpolation and gradient stabilization.
These methods enable one-step inference with guarantees like gradient norm preservation, proving effective for reinforcement learning, optimal control, and imitation learning.

Neural ordinary differential equation (Neural ODE) and rectified flow policies constitute a family of continuous-time generative policy architectures that leverage ODE-based flow models to represent highly expressive and stable action distributions for reinforcement learning, optimal control, and imitation learning. These methods utilize neural networks to parameterize time-dependent vector fields or parameter flows, and employ flow-matching or transport-inspired objectives to enable efficient and robust policy optimization, often under significant computational and stability constraints. Rectified flow policies, in particular, enforce structural properties on the flow field (e.g., constant velocity or orthogonality), enabling both theoretical guarantees (e.g., gradient norm preservation) and practical gains such as one-step inference. Recent research unifies these approaches under optimal transport, policy-gradient, and control-theoretic frameworks, yielding state-of-the-art results across a spectrum of robotic and sequential decision-making benchmarks.

1. Mathematical Foundations: Neural ODE and Rectified Flow Architectures

Neural ODE policies define a generative process for actions (or trajectories) as continuous-time flows driven by a neural network:

$\frac{dx(t)}{dt} = v_\theta(x(t), t, s),$

where $x(0)\sim\mathcal{N}(0, I)$ is the initial noise and $s$ is the state or observation. The action is typically $a = h(x(1))$, where $h$ optionally squashes the flow output into the feasible action space (Zhong et al., 3 Feb 2026, Gao et al., 18 Mar 2026).
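As a minimal sketch of this sampling procedure, the following integrates the ODE with explicit Euler steps. The two-layer `vector_field`, its random parameters, the step count, and the choice $h = \tanh$ are illustrative assumptions, not the architecture of any cited work.

```python
import numpy as np

def vector_field(x, t, s, params):
    """Toy stand-in for v_theta(x, t, s): one tanh hidden layer (hypothetical)."""
    W1, b1, W2, b2 = params
    inp = np.concatenate([x, [t], s])
    hidden = np.tanh(W1 @ inp + b1)
    return W2 @ hidden + b2

def sample_action(s, params, action_dim, n_steps=20, rng=None):
    """Integrate dx/dt = v_theta(x, t, s) from t=0 to t=1 with explicit Euler."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(action_dim)          # x(0) ~ N(0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * vector_field(x, k * dt, s, params)
    return np.tanh(x)                            # a = h(x(1)), assuming h = tanh

rng = np.random.default_rng(0)
action_dim, state_dim, hidden_dim = 2, 3, 16
in_dim = action_dim + 1 + state_dim
params = (
    0.5 * rng.standard_normal((hidden_dim, in_dim)),
    np.zeros(hidden_dim),
    0.5 * rng.standard_normal((action_dim, hidden_dim)),
    np.zeros(action_dim),
)
state = np.array([0.1, -0.2, 0.3])
action = sample_action(state, params, action_dim, rng=rng)
```

In practice the vector field would be a trained network and the integrator might be adaptive; the fixed-step Euler loop is only the simplest choice.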

Rectified flow policies enforce the flow to follow a straight-line (optimal transport) interpolation between source and target:

$x_t = (1-t)x_0 + t x_1 \quad\Rightarrow\quad \frac{dx_t}{dt} = x_1 - x_0,$

so that the vector field is trained to match $v_\theta(x_t, t, s) \approx x_1 - x_0$ (Liu, 2022, Sochopoulos et al., 2 May 2025, Zhou et al., 10 Apr 2026). By construction, the regression target is a straight, constant-velocity path between each source and target sample. This connects the method directly to optimal transport theory, and the resulting flow is designed to preserve the marginal distributions of the interpolation $x_t$.
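The interpolation and its regression target can be checked numerically. The sketch below uses random vectors as stand-ins for a noise sample and a target action (an assumption for illustration) and compares a finite-difference time derivative of $x_t$ with $x_1 - x_0$.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Straight-line path x_t = (1 - t) x0 + t x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target for v_theta along the path: dx_t/dt = x1 - x0."""
    return x1 - x0

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)   # noise sample
x1 = rng.standard_normal(4)   # stand-in for a target action

# Central finite difference of the path at an arbitrary time.
t, eps = 0.3, 1e-6
fd_velocity = (interpolate(x0, x1, t + eps) - interpolate(x0, x1, t - eps)) / (2 * eps)
```

The finite-difference velocity agrees with `target_velocity(x0, x1)` at every `t`, which is what makes the target independent of time along each path.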

Advanced frameworks use nested flows or parameter flows on Lie groups, as in ODEtoODE, where the time-varying parameters $W_t$ themselves evolve according to an ODE on the compact orthogonal group $\mathcal{O}(d)$:

$\begin{cases} \dot x(t) = f(W_t x(t)), \\ \dot W_t = W_t\, b_\psi(t, W_t), \qquad b_\psi(t, W_t) \in \mathfrak{so}(d). \end{cases}$

This structure keeps $W_t$ orthogonal, and therefore norm-preserving, for all $t$, addressing gradient instability (Choromanski et al., 2020).
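One way to see why a skew-symmetric generator keeps $W_t$ on $\mathcal{O}(d)$ is to discretize $\dot W_t = W_t b_\psi$ with a Cayley transform, which maps skew-symmetric matrices to orthogonal ones. The sketch below is an illustration under assumptions: the generator `B` is a fixed random matrix rather than a learned $b_\psi(t, W_t)$, and the Cayley step is a choice made here, not necessarily the integrator used in ODEtoODE.

```python
import numpy as np

def skew(M):
    """Project a square matrix onto so(d), the skew-symmetric matrices."""
    return 0.5 * (M - M.T)

def cayley_step(W, B, dt):
    """One step of dW/dt = W B using the Cayley transform of dt * B."""
    I = np.eye(W.shape[0])
    # (I - dt/2 B)^{-1} (I + dt/2 B) is orthogonal whenever B is skew-symmetric.
    Q = np.linalg.solve(I - 0.5 * dt * B, I + 0.5 * dt * B)
    return W @ Q

rng = np.random.default_rng(0)
d, n_steps = 5, 50
dt = 1.0 / n_steps
W = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthogonal initial W_0
B = skew(rng.standard_normal((d, d)))              # fixed generator (assumption)

for _ in range(n_steps):
    W = cayley_step(W, B, dt)

orth_error = np.linalg.norm(W.T @ W - np.eye(d))   # distance from orthogonality
```

After all steps `orth_error` stays at the level of floating-point round-off, whereas a plain Euler update `W + dt * W @ B` would drift off the group.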

2. Flow-Matching Objectives, Optimal Transport, and Policy Training

Flow-matching objectives constitute the core training paradigm for most flow-based policies. The basic unsupervised loss is

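$\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t\sim\mathcal{U}[0,1],\,(x_0, x_1)}\left[\left\| v_\theta(x_t, t, s) - (x_1 - x_0) \right\|^2\right], \qquad x_t = (1-t)x_0 + t x_1.$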

This is equivalent to the regression-based rectified flow algorithm in optimal transport, guaranteeing marginal preservation and monotonic decrease of transport costs (Liu, 2022). Recent extensions introduce conditional optimal transport couplings to enforce matching trajectories under side information (e.g., observations, point clouds), using OT plans at the mini-batch level to pair noise samples $x_0$ with targets $x_1$ (Sochopoulos et al., 2 May 2025, Zhang et al., 2024).
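The following sketch illustrates both ideas on toy data: a Monte Carlo estimate of the squared-error flow-matching loss, and a mini-batch pairing of noise with targets that minimizes total squared distance. Several parts are assumptions made for illustration: conditioning on $s$ is omitted, the target is a point mass so that an exact field is known in closed form, $t$ is sampled from $[0, 0.95]$ to avoid the singularity of that field at $t = 1$, and the pairing uses exhaustive search, which is only feasible for very small batches (a practical implementation would use an assignment or OT solver).

```python
import itertools
import numpy as np

def flow_matching_loss(v_fn, x0, x1, t):
    """Batch estimate of E || v(x_t, t) - (x1 - x0) ||^2."""
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    residual = v_fn(x_t, t) - (x1 - x0)
    return np.mean(np.sum(residual ** 2, axis=1))

def minibatch_ot_pairing(x0, x1):
    """Permutation p minimizing sum_i ||x0[i] - x1[p[i]]||^2 (brute force)."""
    n = len(x0)
    cost = np.sum((x0[:, None, :] - x1[None, :, :]) ** 2, axis=-1)
    rows = np.arange(n)
    best = min(itertools.permutations(range(n)),
               key=lambda p: cost[rows, list(p)].sum())
    return np.array(best)

rng = np.random.default_rng(0)

# Loss check: for a point-mass target mu, v(x, t) = (mu - x) / (1 - t) is exact.
mu = np.array([1.0, -2.0])
x0 = rng.standard_normal((256, 2))
x1 = np.tile(mu, (256, 1))
t = rng.uniform(0.0, 0.95, size=256)

def exact_field(x, t):
    return (mu - x) / (1.0 - t)

def zero_field(x, t):
    return np.zeros_like(x)

loss_exact = flow_matching_loss(exact_field, x0, x1, t)
loss_zero = flow_matching_loss(zero_field, x0, x1, t)

# Pairing check: the OT pairing never costs more than the arbitrary given order.
noise = rng.standard_normal((6, 2))
targets = rng.standard_normal((6, 2)) + 3.0
perm = minibatch_ot_pairing(noise, targets)
cost_given = np.sum((noise - targets) ** 2)
cost_ot = np.sum((noise - targets[perm]) ** 2)
```

Training on `(noise, targets[perm])` instead of the arbitrary pairing shortens the paths the vector field has to fit, which is the motivation for OT couplings.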

For RL, advantage-weighted or Q-value-weighted flow matching targets replace supervised endpoints with learned high-value samples. This yields a loss

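$\mathcal{L}_{\mathrm{AW}}(\theta) = \mathbb{E}\left[ w\big(\hat{A}\big)\, \left\| v_\theta(x_t, t, s) - (x_1 - x_0) \right\|^2 \right],$

with $\hat{A}$ the estimated advantage of the target sample $x_1$ in state $s$ and $w$ a nonnegative weighting function (Gao et al., 18 Mar 2026).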
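A weighted version of the flow-matching loss might look like the sketch below. The exponential, self-normalized weighting with temperature `beta` is an assumption borrowed from common advantage-weighted regression practice; the cited work may use a different weighting.

```python
import numpy as np

def advantage_weighted_fm_loss(v_pred, x0, x1, advantages, beta=1.0):
    """Flow-matching loss with per-sample weights proportional to exp(A / beta).

    The exponential weighting is an illustrative assumption.
    """
    scaled = (advantages - advantages.max()) / beta   # shift for numerical stability
    weights = np.exp(scaled)
    weights = weights / weights.sum()                 # self-normalize over the batch
    per_sample = np.sum((v_pred - (x1 - x0)) ** 2, axis=1)
    return np.sum(weights * per_sample), weights

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 2))
x1 = rng.standard_normal((5, 2))
v_pred = np.zeros((5, 2))                             # placeholder predictions
adv = np.array([0.0, 1.0, -1.0, 2.0, 0.5])

loss, weights = advantage_weighted_fm_loss(v_pred, x0, x1, adv, beta=0.5)
uniform_loss, _ = advantage_weighted_fm_loss(v_pred, x0, x1, np.zeros(5))
```

With equal advantages the weights are uniform and the loss reduces to the unweighted batch mean; a smaller `beta` concentrates the loss on the highest-advantage samples.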

Entropy-regularized versions introduce analytic or surrogate loss terms for the policy entropy $\mathcal{H}(\pi_\theta(\cdot \mid s))$, exploiting the fact that the log-density along an ODE flow evolves according to the divergence of its vector field (Gao et al., 18 Mar 2026, Zhou et al., 10 Apr 2026).
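The link between entropy and divergence can be illustrated with the instantaneous change-of-variables identity $\frac{d}{dt}\log p_t(x(t)) = -\nabla\cdot v(x(t), t)$. The sketch below assumes a linear field $v(x) = Ax$ chosen so that the answer is known: the entropy should grow by $\mathrm{tr}(A)$ over unit time. The finite-difference divergence and the Euler integrator are illustrative choices, and state conditioning is omitted.

```python
import numpy as np

def divergence(v_fn, x, t, eps=1e-5):
    """Central finite-difference estimate of div_x v(x, t)."""
    total = 0.0
    for i in range(x.shape[0]):
        e = np.zeros_like(x)
        e[i] = eps
        total += (v_fn(x + e, t)[i] - v_fn(x - e, t)[i]) / (2 * eps)
    return total

def sample_with_log_prob(v_fn, dim, n_steps, rng):
    """Euler-integrate the flow and accumulate log p via d(log p)/dt = -div v."""
    x = rng.standard_normal(dim)
    log_p = -0.5 * (x @ x) - 0.5 * dim * np.log(2 * np.pi)   # log N(x; 0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        log_p -= dt * divergence(v_fn, x, t)
        x = x + dt * v_fn(x, t)
    return x, log_p

A = np.array([[0.3, 0.1], [-0.2, 0.5]])   # hypothetical linear field, tr(A) = 0.8

def linear_field(x, t):
    return A @ x

rng = np.random.default_rng(0)
log_probs = [sample_with_log_prob(linear_field, 2, 20, rng)[1] for _ in range(2000)]
entropy_estimate = -np.mean(log_probs)
base_entropy = 0.5 * 2 * np.log(2 * np.pi * np.e)   # entropy of N(0, I) in 2-D
```

The Monte Carlo estimate lands close to `base_entropy + np.trace(A)`. For neural vector fields in higher dimensions the exact divergence is usually replaced by a stochastic trace estimator.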

3. Structural Regularization: Rectification, Orthogonality, and Stability

Several frameworks enforce additional structural constraints on the parameterization or dynamics to guarantee stability and mitigate the vanishing/exploding gradient problem:

Orthogonal flows (ODEtoODE): Constraining the parameter flow $W_t \in \mathcal{O}(d)$ maintains isometry, ensuring that the Jacobian of the mapping preserves norms throughout the entire trajectory. The resulting gradient-stabilization theorem guarantees

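$c_1 \left\| \frac{\partial \mathcal{L}}{\partial x(1)} \right\|_2 \le \left\| \frac{\partial \mathcal{L}}{\partial x(t)} \right\|_2 \le c_2 \left\| \frac{\partial \mathcal{L}}{\partial x(1)} \right\|_2 \quad \text{for all } t \in [0, 1],$

where $\mathcal{L}$ is the training loss and the constants $0 < c_1 \le c_2$ are independent of network depth (Choromanski et al., 2020).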

Truncated rectified flows: Hybrid architectures combine a deterministic rectified ODE prefix with a stochastic (SDE) tail, facilitating entropy-regularized optimization and enabling stable one-step sampling. Gradient truncation is used to avoid backpropagation through the full chain (Zhou et al., 10 Apr 2026).
Consistency flow matching: Enforces self-consistency of the velocity field over time, achieving straight-line, constant-velocity flows for one-step inference. This is combined with point cloud or visual conditioning for real-world visuomotor policy deployment (Zhang et al., 2024).
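The effect of the orthogonality constraint on gradient norms can be seen in a simplified setting. The sketch below assumes a stack of purely linear layers $x \mapsto Wx$ (no nonlinearity, which is a simplification) and compares the norm of a back-propagated gradient for orthogonal weights against unconstrained Gaussian weights with an arbitrarily chosen scale of 1.5.

```python
import numpy as np

def backprop_norm_ratio(weights, grad_out):
    """||W_1^T ... W_L^T g|| / ||g|| for a stack of linear layers x -> W x."""
    g = grad_out
    for W in reversed(weights):
        g = W.T @ g
    return np.linalg.norm(g) / np.linalg.norm(grad_out)

rng = np.random.default_rng(0)
d, depth = 8, 50

# Orthogonal layers: Q factors of random Gaussian matrices.
orthogonal = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(depth)]
# Unconstrained layers: Gaussian entries with a hypothetical scale of 1.5 / sqrt(d).
unconstrained = [1.5 * rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]

g = rng.standard_normal(d)
ratio_orth = backprop_norm_ratio(orthogonal, g)
ratio_free = backprop_norm_ratio(unconstrained, g)
```

With orthogonal layers the ratio stays at 1 regardless of depth, while the unconstrained stack grows by many orders of magnitude over 50 layers (a scale below 1 would make it vanish instead).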

4. Policy-Gradient, Reparameterization, and Online RL Integration

Neural ODE/rectified flow policies are compatible with both likelihood-based policy gradient and reparameterization gradient approaches:

Policy-gradient with flow likelihoods: Works such as ReinFlow inject learnable Gaussian noise into the flow path, converting deterministic ODEs into discrete-time Markov processes and enabling exact, tractable log-likelihoods for standard PPO-style policy gradients, even under few- or one-step policies (Zhang et al., 28 May 2025).
Reparameterization gradient methods: In fully differentiable settings, policies are optimized by backpropagation through the flow generation process and environment simulator, achieving sample-efficient gradient estimates without requiring explicit log-likelihoods (Zhong et al., 3 Feb 2026).
Wasserstein-constrained actor-critic: Actor optimization is regularized to align the flow with high-Q buffer policies via Wasserstein-2 distance penalties, with velocity field matching as a tractable upper bound (Lv et al., 15 Jun 2025).
Entropy-regularized online RL: Flow policies incorporate explicit entropy estimation via ODE divergence, directly enabling maximum-entropy RL objectives without the need for indirect entropy control (Gao et al., 18 Mar 2026).
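The noise-injection idea behind likelihood-based policy gradients can be sketched as follows: adding Gaussian noise after each Euler step turns the sampler into a Markov chain whose transition log-probabilities are available in closed form. This is a simplified illustration, not the ReinFlow implementation; in particular the fixed scalar `sigma` and the toy field are assumptions, whereas the source describes the noise as learnable.

```python
import numpy as np

def gaussian_log_prob(x, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I) at x."""
    d = x.shape[0]
    return (-0.5 * np.sum((x - mean) ** 2) / sigma ** 2
            - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))

def sample_noisy_chain(v_fn, x0, n_steps, sigma, rng):
    """x_{k+1} = x_k + dt * v(x_k, t_k) + sigma * eps_k.

    Returns the final sample and the summed log-probability of all
    transitions, conditional on the starting point x0.
    """
    dt = 1.0 / n_steps
    x, log_prob = x0, 0.0
    for k in range(n_steps):
        mean = x + dt * v_fn(x, k * dt)
        x_next = mean + sigma * rng.standard_normal(x.shape)
        log_prob += gaussian_log_prob(x_next, mean, sigma)
        x = x_next
    return x, log_prob

def toy_field(x, t):
    return -x   # hypothetical vector field

rng = np.random.default_rng(0)
x_start = rng.standard_normal(3)
action, log_prob = sample_noisy_chain(toy_field, x_start, n_steps=4, sigma=0.1, rng=rng)
```

The returned `log_prob` is the quantity a PPO-style likelihood ratio would be built from, evaluated under the old and new parameters of the vector field.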

5. Practical Implementations and One-Step Policy Inference

The principal computational bottleneck for flow policy inference is the need for iterative ODE (or SDE) integration. Rectified flow policies, conditional OT coupling, and consistency matching architectures enable drastic reductions in action sampling complexity:

Method	Typical # Forward Evals	Success Rate/Return	Inference Time
Diffusion (DDIM/DP3/ET-SEED)	10–100	68–70%	63–145 ms
Flow/Rectified ODE w/ OT	1–2	66–70%	20 ms
ReSeFlow (SE(3) equivariant)	1	Up to −48.5% error	1–2× step eval
TRFP, FMER, FlowRL	1–4	SOTA RL returns	<30 ms (GPU)

This enables real-time policy deployment, long-horizon planning, and sample-efficient control, including for high-dimensional visual and geometric input settings (Zhang et al., 2024, Wang et al., 20 Sep 2025, Zhou et al., 10 Apr 2026).
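Why straight flows permit one-step inference can be seen in a toy case where the exact field is known. The sketch below assumes a hypothetical affine coupling $x_1 = a x_0 + b$, for which the straight-line flow has a closed-form velocity field, and compares one Euler step against one hundred. A curved field ($\dot x = -x$) is included as a contrast.

```python
import numpy as np

a, b = 0.5, 2.0   # hypothetical affine coupling x1 = a * x0 + b

def straight_field(x, t):
    """Velocity of the straight-line flow induced by x1 = a * x0 + b."""
    x0 = (x - t * b) / (1.0 - t + t * a)   # invert x_t = (1 - t + t a) x0 + t b
    return (a - 1.0) * x0 + b              # equals x1 - x0, constant along each path

def curved_field(x, t):
    return -x                              # trajectories decay exponentially

def euler_sample(v_fn, x0, n_steps):
    """Explicit Euler integration of dx/dt = v(x, t) from t=0 to t=1."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1000)

one_step = euler_sample(straight_field, x0, 1)
many_step = euler_sample(straight_field, x0, 100)
curved_gap = np.max(np.abs(euler_sample(curved_field, x0, 1)
                           - euler_sample(curved_field, x0, 100)))
```

For the straight field a single function evaluation reproduces the 100-step result up to round-off, while for the curved field the one-step and 100-step samples differ substantially. Learned fields are only approximately straight, so the one-step error is small rather than zero.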

6. Theoretical Guarantees and Empirical Performance

Neural ODE/rectified flow policies inherit strong theoretical properties from their underlying optimal transport and geometric constructions:

Marginal preservation: At the optimum of the flow-matching objective, the learned flow reproduces the marginal distribution of the interpolation $x_t$ at every time $t$, and hence the source and target distributions at the endpoints (Liu, 2022).
Monotonic cost decrease: Rectified flows realize an interior-point-style descent in transport cost for any convex cost function $c$ (Liu, 2022).
Gradient norm preservation: Orthogonal parameter flows eliminate depth-dependent gradient pathologies (Choromanski et al., 2020).
Empirical SOTA: Across benchmarks (MuJoCo, Gym, FrankaKitchen, HumanoidBench), flow and rectified flow policies match or exceed diffusion- and Gaussian-based baselines, especially under one/few-step inference constraints (Lv et al., 15 Jun 2025, Zhou et al., 10 Apr 2026, Gao et al., 18 Mar 2026, Sochopoulos et al., 2 May 2025).

Ablation studies confirm the critical role of flow straightening, entropy regularization, and Q-guided selection in maintaining expressivity, performance, and efficient exploration (Zhou et al., 10 Apr 2026, Gao et al., 18 Mar 2026).

7. Extensions, Specializations, and Open Directions

Recent work extends rectified flow policies to:

Lie group (SE(3)) action spaces: SE(3)-equivariant networks enable trajectory-level policy learning robust to rotations and translations, with one-step geodesic transport matching complex manipulation trajectories (Wang et al., 20 Sep 2025).
Contrastive policy optimization: Deterministic ODE sampling is used for preference alignment via contrastive objectives, sidestepping the need for SDE-based policies and enabling higher-order ODE solvers (He et al., 21 Nov 2025).
Gradient rectification for high-order solvers: Correction filters (e.g., for Leapfrog integrators) project out spurious oscillatory adjoint modes from auto-diff, restoring training stability and consistency (Xu et al., 2023).

Areas for further development include adaptive ODE solver integration, continuous-time entropy estimation, generalization to hybrid stochastic-deterministic flows, and broader applications in real-world robot control and high-dimensional decision spaces.

References

(Choromanski et al., 2020, Liu, 2022, Sochopoulos et al., 2 May 2025, Lv et al., 15 Jun 2025, Wang et al., 20 Sep 2025, Zhang et al., 2024, Zhang et al., 28 May 2025, Sandoval et al., 2022, Zhou et al., 10 Apr 2026, Gao et al., 18 Mar 2026, Zhong et al., 3 Feb 2026, He et al., 21 Nov 2025, Xu et al., 2023)