
SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing

Published 2 Apr 2026 in cs.CV | (2604.01715v1)

Abstract: Recent advances in flow-based generative models have enabled training-free, text-guided image editing by inverting an image into its latent noise and regenerating it under a new target conditional guidance. However, existing methods struggle to preserve source fidelity: higher-order solvers incur additional model inferences, truncated inversion constrains editability, and feature injection methods lack architectural transferability. To address these limitations, we propose SteerFlow, a model-agnostic editing framework with strong theoretical guarantees on source fidelity. In the forward process, we introduce an Amortized Fixed-Point Solver that implicitly straightens the forward trajectory by enforcing velocity consistency across consecutive timesteps, yielding a high-fidelity inverted latent. In the backward process, we introduce Trajectory Interpolation, which adaptively blends target-editing and source-reconstruction velocities to keep the editing trajectory anchored to the source. To further improve background preservation, we introduce an Adaptive Masking mechanism that spatially constrains the editing signal with concept-guided segmentation and source-target velocity differences. Extensive experiments on FLUX.1-dev and Stable Diffusion 3.5 Medium demonstrate that SteerFlow consistently achieves better editing quality than existing methods. Finally, we show that SteerFlow extends naturally to a complex multi-turn editing paradigm without accumulating drift.

Summary

  • The paper introduces an Amortized Fixed-Point (AFP) solver to significantly reduce structural drift by enforcing velocity consistency during inversion.
  • It presents a novel trajectory interpolation method that blends source and target velocities to guarantee spatial and semantic alignment throughout the editing process.
  • An adaptive masking strategy is integrated to localize edits effectively, ensuring robust preservation of source content while enabling complex transformations.

SteerFlow: Advancing Faithful Inversion-Based Image Editing with Rectified Flows

Introduction and Motivation

Recent advances in flow-based generative models, particularly Rectified Flow (RF), have significantly enhanced the fidelity and controllability of text-guided image synthesis. Inversion-based editing methods for these models enable the mapping of a source image into latent space, followed by controlled regeneration conditioned on a new target prompt. Despite progress, current methods exhibit critical shortcomings in source fidelity, including structural drift (over-editing or under-editing), poor edit localization, and limited adaptability across architectures due to heuristic constraints or computational overheads (e.g., higher-order ODE solvers, feature injection, truncated inversion).

SteerFlow (2604.01715) comprehensively analyzes the theoretical and practical limitations of existing inversion-based RF editing approaches, addressing them through a unified, architecture-agnostic paradigm. The framework combines an Amortized Fixed-Point (AFP) solver, a theoretically grounded trajectory-interpolation mechanism for the backward process, and an adaptive masking pipeline, jointly providing provable guarantees on structural and semantic alignment.

Figure 1: SteerFlow compared with established methods. Unconstrained generation induces severe over-editing and drift, while prior approaches (e.g., UniEdit) under-edit or depend on a specific architecture; SteerFlow balances editability and source preservation.

Methodological Contributions

Amortized Fixed-Point Solver for Inversion

The core objective for inversion in image editing is high invertibility: finding a latent that, when integrated back under the source condition, reconstructs the source image exactly. In standard discretizations (e.g., Euler or even higher-order solvers), local truncation errors at each step compound into a misalignment between the forward and backward ODE trajectories, which appears as structural drift in the edited result.
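The round-trip error of a single Euler step can be written out explicitly. The derivation below is a standard worked example, not taken from the paper; the symbols $v_\theta$ (the model's velocity field), $h$ (step size), and the Lipschitz constants $L_x$, $L_t$ are notation assumed here for illustration.

```latex
\begin{align*}
\text{forward (inversion):}\quad & x_{t+h} = x_t + h\, v_\theta(x_t, t) \\
\text{backward (reconstruction):}\quad & \hat{x}_t = x_{t+h} - h\, v_\theta(x_{t+h}, t+h) \\
\text{round-trip error:}\quad & \hat{x}_t - x_t = h\,\bigl[\, v_\theta(x_t, t) - v_\theta(x_{t+h}, t+h) \,\bigr] \\
\text{bound:}\quad & \lVert \hat{x}_t - x_t \rVert \le h^2 \bigl( L_x \lVert v_\theta(x_t, t) \rVert + L_t \bigr)
\end{align*}
```

The error vanishes exactly when the velocity is the same at both ends of the step, that is, when the trajectory is locally straight. Otherwise each step contributes an $O(h^2)$ mismatch, and these mismatches accumulate over the trajectory.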

The AFP solver, central to SteerFlow, enforces velocity consistency across consecutive timesteps by solving a fixed-point equation on the velocity field at each step, initialized with the Euler velocity and refined iteratively via contraction mappings. Importantly, iteration amortization is used: multiple refinement steps are employed only at the initial timestep, while single steps with warm-started velocities suffice for subsequent timesteps. This approach efficiently suppresses the forward-backward velocity mismatch with sublinear computational overhead relative to standard or higher-order fixed-point schemes.

Figure 2: Progressive improvement in image reconstruction using increasing AFP solver iterations, visually confirming the mitigation of inversion error.
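The amortization idea can be illustrated on a scalar toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: `velocity` is a hypothetical stand-in for the flow model, the fixed-point equation is assumed to be v = f(x + h·v, t_next) (so that a backward Euler step from the new point lands back on the old one), and `k_first` is an illustrative name for the number of refinement iterations at the first timestep.

```python
import math

def velocity(x, t):
    """Toy nonlinear velocity field standing in for the flow model (assumption)."""
    return math.sin(x) + t * x

def euler_invert(x0, ts):
    """Plain Euler inversion: each step uses the velocity at the current point."""
    x = x0
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity(x, t)
    return x

def afp_invert(x0, ts, k_first=5):
    """Fixed-point inversion with amortized iterations (illustrative sketch).

    At each step we look for v with v = velocity(x + h * v, t_next).
    Several iterations are spent only on the first step; later steps
    warm-start from the previous step's velocity and refine once.
    """
    x = x0
    v = velocity(x, ts[0])  # Euler velocity as the initial guess
    for i, (t, t_next) in enumerate(zip(ts[:-1], ts[1:])):
        h = t_next - t
        iters = k_first if i == 0 else 1
        for _ in range(iters):
            v = velocity(x + h * v, t_next)  # one fixed-point refinement
        x = x + h * v
    return x

def reconstruct(x1, ts):
    """Integrate back to the start, using the velocity at the current point."""
    x = x1
    for t, t_next in zip(reversed(ts[:-1]), reversed(ts[1:])):
        x = x - (t_next - t) * velocity(x, t_next)
    return x
```

Comparing `abs(reconstruct(euler_invert(x0, ts), ts) - x0)` with the same quantity for `afp_invert` on a uniform grid such as `ts = [i / 10 for i in range(11)]` shows how the fixed-point velocity reduces the round-trip error on this toy field, at the cost of one extra velocity evaluation per step after the first.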

Trajectory Interpolation and Source Anchoring for Backward Process

A major theoretical result is that even perfect inversion cannot guarantee editing locality if the editing trajectory diverges in latent space due to the altered target condition or high classifier-free guidance (CFG) weights. Previous works apply truncated inversion or feature injection heuristically, but these are brittle and lack theoretical assurances.

SteerFlow introduces Trajectory Interpolation, which adaptively interpolates between the cached source-reconstruction and target-editing velocities at each ODE step. The interpolation coefficient $\alpha_t$ is computed as the product of the spatial cosine similarity between the target and source velocities and a smoothly decaying schedule, ensuring structural safety in early steps and gradually allowing larger semantic deviations. This principled blend yields a strict upper bound (provably tighter than Euler) on the total editing error, parameterized by $\alpha_t$.
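A minimal sketch of such a coefficient follows, under several assumptions that the summary does not confirm: the cosine similarity is taken over one flattened vector rather than per spatial location, negative similarity is clamped to zero, the schedule is a hypothetical `(1 - progress) ** gamma`, and $\alpha_t$ is taken to weight the source velocity. All function names are illustrative.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two flattened velocity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def decay_schedule(step, num_steps, gamma=2.0):
    """Hypothetical smooth decay from 1 (first step) to 0 (last step)."""
    progress = step / max(num_steps - 1, 1)
    return (1.0 - progress) ** gamma

def interpolation_coefficient(v_src, v_tgt, step, num_steps, gamma=2.0):
    """alpha_t = similarity * schedule; clamping at zero is an assumption."""
    sim = max(0.0, cosine_similarity(v_src, v_tgt))
    return sim * decay_schedule(step, num_steps, gamma)

def blend(v_src, v_tgt, alpha):
    """Blend velocities; alpha weights the source (assumed direction)."""
    return [alpha * s + (1.0 - alpha) * t for s, t in zip(v_src, v_tgt)]
```

Under these assumptions, early steps with well-aligned velocities stay close to the source-reconstruction velocity, while late steps (where the schedule has decayed) follow the target-editing velocity.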

Adaptive Masking for Locality Control

Generic generative models insufficiently disentangle edit regions from context or background, causing editing leakage. SteerFlow leverages concept-region segmentation via SAM3 to provide a base spatial mask, which is then refined at each step using the magnitude and structure of the instantaneous velocity difference between target and source. This adaptive mask is fused with the interpolation mechanism, enforcing both temporal (via $\alpha_t$) and spatial (via mask) gating of velocity deviations. This pipeline robustly localizes edits while accommodating necessary structural changes for complex transformations.

Figure 3: Comparative visualization of masking strategies. Adaptive Masking endows SteerFlow with robust editability without background leakage, in contrast to static and no-mask baselines.
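The combined temporal and spatial gating can be sketched on a flattened list of per-location velocities. This is an illustrative simplification, not the paper's formulation: the refinement rule (keep a base-mask location only where the normalized velocity difference exceeds a hypothetical `threshold`) and the fusion rule `v_src + mask * (1 - alpha) * (v_tgt - v_src)` are both assumptions, and the base mask is supplied directly instead of coming from a segmentation model.

```python
def refine_mask(base_mask, v_src, v_tgt, threshold=0.2):
    """Keep base-mask locations only where the velocity difference is large.

    The difference is normalized by its peak value; locations below the
    (hypothetical) threshold are switched off for this step.
    """
    diff = [abs(t - s) for s, t in zip(v_src, v_tgt)]
    peak = max(diff) or 1.0  # avoid division by zero when velocities agree
    return [m if d / peak >= threshold else 0.0 for m, d in zip(base_mask, diff)]

def gated_velocity(v_src, v_tgt, mask, alpha):
    """Source velocity plus a deviation gated in time (alpha) and space (mask)."""
    return [s + m * (1.0 - alpha) * (t - s) for s, t, m in zip(v_src, v_tgt, mask)]

# Example: four locations, the first two inside the base edit region.
base = [1.0, 1.0, 0.0, 0.0]
v_source = [0.0, 0.0, 0.0, 0.0]
v_target = [1.0, 0.05, 1.0, 0.0]
step_mask = refine_mask(base, v_source, v_target)
v_step = gated_velocity(v_source, v_target, step_mask, alpha=0.25)
```

In this example only the first location both lies in the base region and shows a large velocity difference, so it alone receives part of the target deviation; every other location follows the source velocity unchanged.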

Experimental Results

SteerFlow is rigorously evaluated on FLUX.1-dev and Stable Diffusion 3.5-Medium, benchmarked against both inversion-based (e.g., UniEdit, FireFlow, RF-Solver) and inversion-free (e.g., FlowEdit, FlowAlign) baselines on PIE-Bench across multiple quality, fidelity, and preservation metrics.

Key findings:

  • On FLUX.1-dev, SteerFlow achieves leading rank in source preservation (DinoDist, LPIPS, SSIM) while maintaining competitive editability (CLIP, ImageReward).
  • On Stable Diffusion 3.5-Medium, SteerFlow is the only method to consistently rank in the top two across both source fidelity and editability, regardless of architecture.
  • The ablation between SteerFlow with and without adaptive masking confirms the decisive contribution of spatial locality mechanisms.

Figure 4: Qualitative results for various edit types, highlighting both fine-grained attribute change and robust structural consistency compared to competitive baselines.

Figure 5: On Stable Diffusion 3.5-Medium, SteerFlow achieves faithful target alignment and superior pose preservation where state-of-the-art editing baselines struggle.

Multi-Turn Editing

SteerFlow generalizes seamlessly to sequential, compositional multi-turn editing. By incrementally anchoring each turn’s source to the previous turn’s edited trajectory and updating the base mask as needed, it arrests the cumulative error commonly observed in other workflows.

Figure 6: Multi-turn image editing pipeline—object replacement, attribute modification, and object addition—demonstrates SteerFlow’s robustness against drift across successive edits.

Figure 7: Diverse, consistent outcomes in compositional multi-turn editing scenarios on both FLUX.1-dev and Stable Diffusion 3.5-Medium.

Theoretical Analysis and Ablation

SteerFlow derives strict upper bounds on inversion and editing errors under Lipschitz and curvature assumptions for the underlying velocity field. The analytical framework elucidates the suboptimality of naive higher-order solvers and motivates the design of amortized iteration and adaptive interpolation.

Detailed ablation demonstrates:

  • The AFP solver achieves monotonic improvement in inversion fidelity as warm-started iteration count increases, unlike naive fixed-point solvers that often exhibit degraded performance for large $K$ due to moving target effects.
  • The trajectory interpolation decay exponent $\gamma$ and CFG coefficient $w$ offer tunable control over the editability–preservation trade-off, with stronger source anchoring yielding smaller drift but reduced edit magnitude.
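The two knobs in the last bullet can be illustrated with a short sketch. The classifier-free guidance combination below is the standard formula; the decay form `(1 - progress) ** gamma` is a hypothetical schedule chosen for illustration, since the summary does not give the exact one.

```python
def cfg_velocity(v_uncond, v_cond, w):
    """Standard classifier-free guidance: extrapolate from unconditional to conditional."""
    return [u + w * (c - u) for u, c in zip(v_uncond, v_cond)]

def source_weight(progress, gamma):
    """Hypothetical source-anchoring weight at normalized progress in [0, 1]."""
    return (1.0 - progress) ** gamma

# Larger w pushes the target velocity further from the unconditional one.
v_mild = cfg_velocity([0.0, 1.0], [1.0, 1.0], w=1.0)
v_strong = cfg_velocity([0.0, 1.0], [1.0, 1.0], w=3.0)

# Under this schedule, a larger gamma releases the source anchor sooner.
weight_slow = source_weight(0.5, gamma=1.0)
weight_fast = source_weight(0.5, gamma=2.0)
```

Under these assumptions, raising $w$ enlarges the deviation that the interpolation has to rein in, while raising $\gamma$ shortens the portion of the trajectory that stays anchored to the source.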

Limitations and Future Directions

SteerFlow’s adaptive masking, while effective, may inhibit extensive structural transformations such as large pose changes or background alterations. Tuning of the mask adaptation hyperparameters (and, to a lesser extent, the decay and guidance coefficients) remains necessary to optimize for edit type and model. Future developments could integrate learned spatial adaptation policies or richer region decompositions, and could improve disentanglement in the underlying flow models for finer-grained control.

Conclusion

SteerFlow presents a formally principled, highly effective framework for inversion-based image editing in rectified flow models, providing theoretical guarantees on source preservation while attaining strong semantic alignment and spatial locality. The methodology’s architecture-agnostic design, superior empirical performance, and applicability to complex multi-turn scenarios establish SteerFlow as a reference point for future controlled editing systems in generative modeling.

Figure 8: SteerFlow editing framework: Amortized fixed-point inversion, trajectory-interpolating backward editing, and adaptive masking are synergistically integrated for robust, faithful image editing.

References

  • Dao, T., Wang, Z., Pham, K.T., Chen, L. "SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing" (2604.01715)
  • Related works including UniEdit [jiao2026unieditflow], FlowEdit [kulikov2025flowedit], FireFlow [deng2024fireflowfastinversionrectified], and SAM3 [carion2025sam3segmentconcepts].
