Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation

Published 18 Jun 2026 in cs.RO and cs.AI | (2606.20135v1)

Abstract: Flow matching has emerged as a standard paradigm for robotic manipulation owing to its strong expressive power for modelling complex, multimodal action distributions, alongside similar approaches like diffusion policy. However, existing methods rely on discretized action chunks, making them brittle to demonstrations collected at heterogeneous control frequencies and prone to temporally inconsistent actions that degrade control stability. In this paper, we propose Frequency-Aware Flow Matching (FAFM), which outputs continuous, temporally consistent actions. To handle heterogeneous frequency input, we transform discrete action sequences into the frequency domain with the discrete cosine transform (DCT), perform flow matching over the resulting coefficients, and reconstruct continuous actions via cosine basis expansion. To generate temporally consistent actions, we regularize the first-order temporal derivative to promote smooth actions. This corresponds to a Sobolev-type constraint that suppresses high-frequency errors and discourages abrupt action changes. Our FAFM is simple, introduces no additional network parameters and applies to standalone flow-matching policies and vision-language action models. Across synthetic toy benchmark, obstacle avoidance, LapGym, and LIBERO, FAFM improves success rates, multimodal expressivity, motion smoothness, convergence speed, robustness to mechanical bias and mixed-frequency input. These gains are consistent when deployed on a real-world Franka robot. Code available at https://anonymous.4open.science/r/FAFM.

Summary

  • The paper introduces FAFM, employing DCT-based frequency-domain parameterization to resolve gradient conflicts and enable continuous action generation.
  • It integrates first-order temporal derivative regularization to enforce trajectory smoothness and mitigate high-frequency jitter in robotic tasks.
  • FAFM achieves superior robustness and high success rates under heterogeneous frequency inputs, outperforming baseline methods in diverse manipulation benchmarks.

Motivation and Problem Formulation

The paper introduces Frequency-Aware Flow Matching (FAFM) as a robust solution to action generation in robotic manipulation tasks, particularly targeting limitations inherent in existing flow-matching and diffusion policy paradigms. Discretization of action chunks in the time domain leads to two primary failures: (1) identifiability failure in training when combining demonstrations with heterogeneous control frequencies, resulting in conflicting gradients and physically infeasible target actions, and (2) lack of temporal dynamics constraints in inference, causing high-frequency jitter and degraded motion smoothness, especially problematic in soft-body manipulation scenarios. These challenges undermine both the multimodal expressivity and the temporal consistency required for reliable robotic action.

Methodological Innovations

FAFM leverages discrete cosine transform (DCT)-based frequency-domain parameterization to transform discrete action trajectories into frequency coefficient space. This allows the model to operate independently of the control frequency used at data collection, resolving gradient conflicts encountered during training from mixed-frequency demonstrations. Actions are reconstructed via cosine basis expansion, enabling continuous trajectory generation at arbitrary time resolutions.
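
The frequency-independence claim can be illustrated with a minimal sketch. It assumes a 1-D action channel on normalized time t in [0, 1], midpoint sampling, and a scaled DCT-II; the summary does not state the paper's exact normalization, and the names `dct_coeffs`, `reconstruct`, `demo_trajectory`, and `sample` are hypothetical.

```python
import math

def dct_coeffs(actions, num_coeffs):
    """Scaled DCT-II of a chunk sampled at midpoints t_n = (n + 0.5) / N.

    The scaling (1/N for k = 0, 2/N otherwise) is an assumption chosen so that
    the coefficients do not depend on the number of samples N.
    """
    n_samples = len(actions)
    coeffs = []
    for k in range(num_coeffs):
        scale = 1.0 / n_samples if k == 0 else 2.0 / n_samples
        total = sum(a * math.cos(math.pi * k * (n + 0.5) / n_samples)
                    for n, a in enumerate(actions))
        coeffs.append(scale * total)
    return coeffs

def reconstruct(coeffs, t):
    """Evaluate the cosine basis expansion at any normalized time t in [0, 1]."""
    return sum(c * math.cos(math.pi * k * t) for k, c in enumerate(coeffs))

def demo_trajectory(t):
    """Hypothetical smooth demonstration used only for this illustration."""
    return 0.3 + 1.0 * math.cos(math.pi * t) + 0.5 * math.cos(3.0 * math.pi * t)

def sample(n_samples):
    """Sample the demonstration at n_samples evenly spaced midpoints."""
    return [demo_trajectory((n + 0.5) / n_samples) for n in range(n_samples)]

# The same motion recorded at two different control frequencies.
coeffs_slow = dct_coeffs(sample(16), num_coeffs=8)
coeffs_fast = dct_coeffs(sample(32), num_coeffs=8)

# Both recordings map to the same coefficient vector (up to rounding),
# and the expansion can be queried at a time that lies on neither grid.
max_gap = max(abs(a - b) for a, b in zip(coeffs_slow, coeffs_fast))
off_grid_error = abs(reconstruct(coeffs_slow, 0.37) - demo_trajectory(0.37))
```

In this toy case the demonstration is band-limited to the retained coefficients, so the match is exact up to floating-point error; for real demonstrations the truncated coefficients would agree only approximately.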

To enforce smoothness, FAFM introduces first-order temporal derivative regularization, corresponding to an H^1 Sobolev objective. This penalizes errors with a weight that grows quadratically with frequency, suppressing abrupt action changes and directly mitigating high-frequency artifacts such as jitter. Because the derivative is computed analytically from the DCT coefficients, the supervision signal is exact and free of differencing noise, unlike conventional finite-difference approximations. The resulting loss is a weighted H^1-projection error, which acts as a spectral preconditioner and accelerates convergence for physically smooth trajectories.
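
A minimal sketch of both ideas follows, assuming the cosine expansion a(t) = sum_k c_k cos(pi k t) on normalized time t in [0, 1]. The per-mode weights come from orthogonality of the cosine basis on that interval; the function names and the `deriv_weight` parameter are hypothetical, and the paper's actual loss may be weighted differently.

```python
import math

def reconstruct(coeffs, t):
    """Cosine expansion evaluated at normalized time t in [0, 1]."""
    return sum(c * math.cos(math.pi * k * t) for k, c in enumerate(coeffs))

def analytic_velocity(coeffs, t):
    """Exact derivative of the expansion with respect to normalized time."""
    return sum(-c * math.pi * k * math.sin(math.pi * k * t)
               for k, c in enumerate(coeffs))

def h1_weighted_error(pred, target, deriv_weight=1.0):
    """Squared L2 error plus deriv_weight times the squared derivative error.

    For an error e(t) = sum_k e_k cos(pi k t), orthogonality on [0, 1] gives
    int e^2 = e_0^2 + 0.5 * sum_{k>=1} e_k^2 and
    int e'^2 = 0.5 * sum_{k>=1} (pi k)^2 e_k^2,
    so mode k is penalized with a weight that grows like k^2.
    """
    total = 0.0
    for k, (p, q) in enumerate(zip(pred, target)):
        l2_weight = 1.0 if k == 0 else 0.5
        deriv_term = 0.5 * deriv_weight * (math.pi * k) ** 2
        total += (l2_weight + deriv_term) * (p - q) ** 2
    return total

target = [0.3, 1.0, 0.0, 0.5]
low_freq_miss = [0.3, 1.1, 0.0, 0.5]   # error of 0.1 on mode k = 1
high_freq_miss = [0.3, 1.0, 0.0, 0.6]  # same error on mode k = 3

low_penalty = h1_weighted_error(low_freq_miss, target)
high_penalty = h1_weighted_error(high_freq_miss, target)

# Central finite difference, used here only to check the analytic velocity.
step = 1e-5
finite_diff = (reconstruct(target, 0.4 + step)
               - reconstruct(target, 0.4 - step)) / (2.0 * step)
velocity_gap = abs(analytic_velocity(target, 0.4) - finite_diff)
```

The same coefficient error costs more on the higher mode, which is the sense in which a derivative penalty discourages high-frequency error.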

FAFM is architecturally agnostic, requiring no additional network parameters, and can be incorporated as an action head in both standalone flow-matching policies and vision-language-action (VLA) models.
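
To make the action-head idea concrete, the sketch below builds one flow-matching training tuple in coefficient space and integrates a velocity field with Euler steps. It assumes the common linear interpolation path between Gaussian noise and the target; the summary does not say which probability path FAFM uses. The constant oracle velocity stands in for a trained network, and all names are hypothetical.

```python
import random

def flow_matching_pair(target_coeffs, rng):
    """Build one training tuple on the linear path x_tau = (1 - tau) z + tau c."""
    noise = [rng.gauss(0.0, 1.0) for _ in target_coeffs]
    tau = rng.random()
    x_tau = [(1.0 - tau) * z + tau * c for z, c in zip(noise, target_coeffs)]
    # Regression target for the network: the constant velocity c - z.
    velocity_target = [c - z for z, c in zip(noise, target_coeffs)]
    return noise, tau, x_tau, velocity_target

def euler_sample(velocity_fn, start, num_steps=10):
    """Integrate dx/dtau = velocity_fn(x, tau) from tau = 0 to tau = 1."""
    x = list(start)
    for step in range(num_steps):
        tau = step / num_steps
        v = velocity_fn(x, tau)
        x = [xi + vi / num_steps for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
target = [0.3, 1.0, 0.0, 0.5]  # DCT coefficients of one demonstration chunk
noise, tau, x_tau, v_target = flow_matching_pair(target, rng)

# Oracle stand-in for a trained model: always returns the true velocity.
sampled = euler_sample(lambda x, t: v_target, noise)
sample_gap = max(abs(s - c) for s, c in zip(sampled, target))
```

The sampled vector is a set of coefficients rather than a chunk of time steps, so it still has to be expanded in the cosine basis before execution. No extra parameters are needed because only the regression target changes.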

Empirical Analysis

Synthetic Toy Benchmark

FAFM is the only method in this benchmark that learns smooth, multimodal actions: it cleanly separates crossing sinusoidal modes while producing smooth trajectories. Competing methods exhibit mode-mixing or over-smoothing, or produce low-amplitude trajectories, revealing a failure to capture multimodality or temporal consistency.

Obstacle Avoidance

In environments with diverse valid paths, FAFM simultaneously achieves high success rates, superior motion smoothness, and enhanced solution diversity. Baselines such as SFP and MPD produce smooth motion but fail to represent multimodal path diversity, while FM, DP, and FreqPolicy achieve high solution diversity at the cost of temporal consistency, producing jerkier motion.

LapGym–Surgical Manipulation

FAFM exhibits superior performance on all metrics: success rate, motion smoothness (measured via log dimensionless jerk, LDLJ), and convergence speed across tasks involving rope threading, grasp-lift-touch, bimanual tissue manipulation, and loop ligating. The method outperforms both basis-function and latent-space parameterized diffusion policies, notably with no need for hand-crafted priors.
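
For readers unfamiliar with the smoothness metric, the sketch below computes one common velocity-based form of log dimensionless jerk, LDLJ = -ln(T^3 / v_peak^2 * integral of squared jerk), using finite differences on a 1-D position trace. The summary does not specify which LDLJ variant the paper uses, so this form, the function name `ldlj`, and the test trajectories are assumptions.

```python
import math

def ldlj(positions, dt):
    """Log dimensionless jerk of a 1-D position trace sampled every dt seconds.

    Values closer to zero (less negative) indicate smoother motion.
    """
    vel = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    acc = [(b - a) / dt for a, b in zip(vel, vel[1:])]
    jerk = [(b - a) / dt for a, b in zip(acc, acc[1:])]
    duration = dt * (len(positions) - 1)
    peak_speed = max(abs(v) for v in vel)
    jerk_integral = sum(j * j for j in jerk) * dt
    return -math.log(duration ** 3 / peak_speed ** 2 * jerk_integral)

dt = 0.01
times = [n * dt for n in range(101)]

# Minimum-jerk reach from 0 to 1, a standard smooth reference profile.
smooth = [10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5 for s in times]

# The same reach with a small alternating perturbation that mimics jitter.
jittery = [p + (0.002 if n % 2 == 0 else -0.002) for n, p in enumerate(smooth)]

smooth_score = ldlj(smooth, dt)
jittery_score = ldlj(jittery, dt)
```

Even a perturbation of a few thousandths of the reach distance lowers the score sharply, because jerk amplifies high-frequency content.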

Real-World Deployment

On a Franka robot, FAFM achieves a 100% success rate on pick-and-place while significantly improving motion smoothness compared to baselines. The baselines show jitter-induced task failures and robustness issues, which highlights FAFM’s practical reliability on real hardware.

VLA Models and Mixed-Frequency Data

FAFM, implemented as an action head for the π0 and π0.5 VLAs, maintains high motion smoothness and robustness against mechanical bias. Critically, experiments with mixed-frequency data show that baseline models collapse to a zero success rate, while FAFM retains a success rate above 90%, confirming its invariance to heterogeneous frequency input. Moreover, under constant mechanical bias, only the DC coefficient is perturbed and the shape-defining coefficients are left unchanged, which further improves robustness.
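
The bias argument can be checked numerically. The sketch below adds a constant offset to a chunk and compares scaled DCT-II coefficients before and after; the scaling, the helper name `dct_coeffs`, and the example chunk are assumptions made for illustration.

```python
import math

def dct_coeffs(actions, num_coeffs):
    """Scaled DCT-II of a chunk sampled at midpoints t_n = (n + 0.5) / N."""
    n_samples = len(actions)
    coeffs = []
    for k in range(num_coeffs):
        scale = 1.0 / n_samples if k == 0 else 2.0 / n_samples
        total = sum(a * math.cos(math.pi * k * (n + 0.5) / n_samples)
                    for n, a in enumerate(actions))
        coeffs.append(scale * total)
    return coeffs

# Hypothetical 20-step action chunk and a constant mechanical offset.
chunk = [math.sin(2.0 * math.pi * (n + 0.5) / 20) for n in range(20)]
bias = 0.05
biased_chunk = [a + bias for a in chunk]

clean = dct_coeffs(chunk, 6)
shifted = dct_coeffs(biased_chunk, 6)

# The offset lands entirely in the k = 0 (DC) coefficient.
dc_shift = shifted[0] - clean[0]
shape_shift = max(abs(s - c) for s, c in zip(shifted[1:], clean[1:]))
```

This holds because every cosine basis function with k >= 1 sums to zero over the midpoint grid, so a constant has no projection onto it.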

Theoretical Implications

FAFM formalizes action generation within a continuous Sobolev-regular output space, optimizing a loss function that strictly penalizes high-frequency error modes. The DCT-based parameterization provides uniform correction efficiency across dynamic orders, from velocity to higher derivatives, and acts as a built-in spectral preconditioner that accelerates convergence. The frequency-domain separation enables decoupling of spatial and temporal content, ensuring physical plausibility of recovered trajectories across demonstration frequencies.

A formal efficiency analysis (Appendix C) demonstrates an O(j^2) improvement in correction efficiency over standard chunk-space flow matching at every derivative order. The analytic velocity supervision is exact within the retained frequency subspace and immune to noise. FAFM is theoretically robust to mechanical bias, with only the mean component perturbed, while shape-defining coefficients remain stable.

Practical Implications and Future Directions

FAFM advances practical deployment of large-scale robot foundation models by enabling cross-platform training from heterogeneous datasets without manual frequency homogenization or hand-crafted priors. The method enhances reliability in soft-body manipulation and precision tasks, potentially expanding the scope of robotic surgery, fine-grained assembly, or deformable object handling.

Further research could explore extension to domains requiring high-frequency or impulsive actions, where DCT regularity may not suffice, e.g., percussive or ballistic interactions. Additionally, integration of higher-order dynamics via further Sobolev-type regularization or alternative spectral expansions may enrich the expressivity for broader manipulation tasks. The frequency-domain approach offers a promising route for generalization, data efficiency, and robust policy training in real-world robotics.

Conclusion

FAFM establishes a mathematically principled, frequency-aware flow-matching paradigm for continuous and temporally consistent robotic action generation (2606.20135). By conducting flow matching in frequency space and regularizing temporal derivatives, FAFM resolves gradient conflicts from mixed-frequency data, suppresses high-frequency artifacts, and accelerates convergence. Empirical results across simulation, benchmarks, and real robots demonstrate substantial gains in success rate, smoothness, expressivity, and robustness. The theoretical structure positions FAFM as a scalable foundation for next-generation multimodal robotic manipulation, with avenues for future expansion into tasks requiring complex dynamic behaviors.
