GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models

Published 27 Apr 2026 in cs.LG | (2604.24238v1)

Abstract: Diffusion models are a leading paradigm for data generation, but training-free editing typically re-runs the full denoising trajectory for every edit strength, making iterative refinement expensive. To address this issue, we instead edit near the data manifold, where small local updates can replace repeated re-synthesis. To enable this, we estimate a local manifold tangent space directly from perturbed samples and prove that this sample-based estimator closely approximates the true tangent. Building on this guarantee, we devise a Jacobian-free algorithm that constructs a tangent frame via small perturbations to the initial noise and alternates small tangent moves with diffusion-based projections. Updates within this frame follow principled on-manifold directions while suppressing off-manifold drift, enabling fine-grained edits without full re-diffusion or additional training. Edit strength is controlled by the number of steps for rapid, continuous adjustments that preserve fidelity and plug into existing samplers. Empirically, the resulting tangent directions yield smooth, semantic unsupervised traversals and effective CLIP-guided optimization, demonstrating practical interactive continuous editing.


Authors (6)

Summary

The paper presents GeoEdit, a training-free method for on-manifold editing that estimates the local tangent space via PCA of secants between samples generated from perturbed noise vectors.
It enables fine-grained, monotonic edits with real-time control, reducing the need for costly re-diffusion and complex Jacobian computations.
Empirical results on CelebA-HQ and with latent models such as Stable Diffusion show competitive structure preservation and semantic alignment with lower computational latency.

GeoEdit: Training-Free On-Manifold Editing in Diffusion Models

Problem Statement and Motivation

Diffusion models have enabled high-fidelity, scalable image synthesis and semantically rich creative manipulation. Training-free editing is particularly attractive because it avoids retraining and allows targeted, real-time modifications at inference time. However, existing training-free methods typically re-run the full denoising trajectory for each edit strength, which incurs high computational cost and latency. Furthermore, edit strength is commonly entangled with the stochastic sampling path, leading to non-monotonic responses and loss of fine control, especially when edits start from early timesteps or use new guidance schedules.

The GeoEdit framework addresses these deficiencies by leveraging the geometric structure of data manifolds captured by diffusion models. It introduces an algorithm capable of fast, continuous, training-free on-manifold editing. GeoEdit estimates the local tangent space near a generated sample, enabling edits that remain close to the data manifold and preserve semantic content without re-diffusing from noise or requiring expensive Jacobian evaluations.

Theory: Tangent Space Estimation Without Jacobians

GeoEdit operates under the manifold hypothesis, where the data distribution is supported on or near a low-dimensional, smooth manifold. It further assumes that the denoising process of a diffusion model moves samples onto or near this manifold, concentrating probability mass in a tubular neighborhood around it.

For editing, tangent directions must be computed to steer updates along the manifold. Conventional approaches estimate these tangents via local linearization of the denoiser or score network, which involves costly Jacobian or Jacobian–vector product evaluations, particularly with large U-Nets and at high resolution. GeoEdit circumvents these requirements by proposing a theoretically supported, sample-based tangent-space estimator. The procedure consists of generating an ensemble of perturbed noise vectors, mapping these through the model's probability flow, and applying PCA to their secants to obtain an orthonormal tangent frame.
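The estimator described above can be sketched in a few lines of NumPy. This is a minimal toy under stated assumptions, not the authors' code: `toy_flow_map` is a hypothetical stand-in for the probability-flow map (here it simply projects noise onto the unit sphere in R^3), and PCA is taken as the SVD of the uncentered secants anchored at x0, which may differ from the paper's exact choice.

```python
import numpy as np

def toy_flow_map(z):
    """Hypothetical stand-in for the probability-flow map (noise -> sample).
    The toy 'data manifold' is the unit sphere S^2 in R^3, so the map just
    normalizes each noise vector onto the sphere."""
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def estimate_tangent_frame(flow_map, z0, k, n_samples=32, sigma=1e-2, seed=0):
    """Secant-PCA tangent estimate at x0 = flow_map(z0).

    1. Perturb the initial noise: z_i = z0 + sigma * eps_i.
    2. Map each z_i through flow_map and form secants s_i = x_i - x0.
    3. Keep the top-k right singular vectors of the stacked secants.
    No Jacobian or Jacobian-vector product of flow_map is needed."""
    rng = np.random.default_rng(seed)
    x0 = flow_map(z0)
    z = z0 + sigma * rng.standard_normal((n_samples, z0.size))
    secants = flow_map(z) - x0                      # shape (n_samples, d)
    _, _, vt = np.linalg.svd(secants, full_matrices=False)
    return x0, vt[:k].T                             # x0: (d,), frame: (d, k) with orthonormal columns

def projector_distance(U, V):
    """Spectral-norm distance between the orthogonal projectors onto span(U) and span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T, ord=2)

z0 = np.array([0.3, -0.5, 0.8])
x0, U_hat = estimate_tangent_frame(toy_flow_map, z0, k=2)
# True tangent plane of the sphere at x0: orthonormal basis of the complement of x0
U_true = np.linalg.svd(x0[None, :])[2][1:].T
print(projector_distance(U_hat, U_true))
```

On this toy the printed distance should be small, because for a small `sigma` the secants lie almost entirely in the tangent plane.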

GeoEdit provides a non-asymptotic bound on the deviation between the true tangent space and the empirical subspace spanned by $k$ ambient secants, with error quantifiable in terms of tube radius, curvature, and perturbation scale (stated as a theorem in the paper). This result formally justifies using finite ambient secants as proxies for the intrinsic tangent space.
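The theorem itself is not reproduced here. As a rough numerical illustration of its qualitative message, the toy below (all names hypothetical) measures how far the top secant direction strays from the true tangent of a parabola y = c·x² at its vertex, where `c` controls curvature and `sigma` the perturbation scale; tube radius is not modeled. Under these assumptions, the average angle error should grow with both `c` and `sigma`.

```python
import numpy as np

def curve_map(z, c):
    """Hypothetical toy flow map: sends noise z = (z1, z2) to (z1, c*z1**2),
    a point on the parabola y = c*x**2 (curvature 2c at the vertex)."""
    x = z[..., 0]
    return np.stack([x, c * x**2], axis=-1)

def mean_tangent_angle_error(c, sigma, n_samples=16, trials=50, seed=0):
    """Average angle (radians) between the top secant direction at the vertex
    and the true tangent (1, 0), over repeated noise perturbations."""
    rng = np.random.default_rng(seed)
    z0 = np.zeros(2)                      # maps to the vertex, whose tangent is the x-axis
    x0 = curve_map(z0, c)
    errs = []
    for _ in range(trials):
        z = z0 + sigma * rng.standard_normal((n_samples, 2))
        secants = curve_map(z, c) - x0
        u = np.linalg.svd(secants, full_matrices=False)[2][0]   # top principal direction
        errs.append(np.arcsin(min(1.0, abs(u[1]))))            # |sin(angle to x-axis)| = |u_y|
    return float(np.mean(errs))

for c in (0.5, 2.0):
    for sigma in (1e-3, 1e-2, 1e-1):
        print(f"c={c:<4} sigma={sigma:<6} mean angle error={mean_tangent_angle_error(c, sigma):.2e}")
```

Fixing the seed reuses the same noise draws across settings, so differences between rows reflect curvature and perturbation scale rather than sampling luck.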

Figure 1: Comparison of tangent lines estimated by Jacobian-based and sample-based approaches, with sample-based GeoEdit tangents more closely aligning with ground truth.

Figure 1: Empirical rank ratio on CIFAR-10; the intrinsic dimension decreases along denoising, indicating low-rank manifold structure near the generation endpoint.

Algorithm: On-Manifold Editing Pipeline

GeoEdit plugs into existing guided diffusion samplers. Its editing pipeline comprises three main steps:

Local Tangent Estimation: Small perturbations in latent space are mapped near the manifold, secants are computed, and PCA yields the local tangent frame.
Frame Transport: Editing proceeds by projecting desired update directions onto the tangent space, transporting the ensemble along these directions without explicit Jacobian computation.
Manifold Retraction: After each move, a short noising-denoising operation retracts the ensemble and sample back into the tube, preventing off-manifold drift.

Edit strength is controlled via step count rather than stochastic path manipulation, enabling rapid, continuous, and interactive edits.
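The three-step loop can be illustrated with a toy of the same kind. Everything here is a hypothetical stand-in: `toy_denoise` replaces a real short denoising pass with exact projection onto the unit sphere, `retract` mimics noising-denoising by adding noise of scale `tau` before that projection, and the edit direction `v` is arbitrary. The sketch shows that edit strength is set only by `n_steps`, so a longer edit continues a shorter one instead of re-running a sampler.

```python
import numpy as np

def toy_denoise(X):
    """Hypothetical stand-in for a short denoising pass: exact projection onto
    the toy data manifold, the unit sphere S^2 in R^3."""
    return X / np.linalg.norm(X, axis=-1, keepdims=True)

def retract(X, tau, rng):
    """Toy noising-denoising retraction: add Gaussian noise of scale tau, then denoise."""
    return toy_denoise(X + tau * rng.standard_normal(np.shape(X)))

def edit_by_steps(z0, direction, n_steps, k=2, n_ens=16, sigma=0.05, eta=0.05, tau=1e-3, seed=0):
    """Alternate (1) secant-PCA frame estimation, (2) a projected tangent move of the
    sample and its ensemble, and (3) retraction. n_steps sets the edit strength."""
    rng = np.random.default_rng(seed)
    x = toy_denoise(z0)                                                    # generated sample
    ens = toy_denoise(z0 + sigma * rng.standard_normal((n_ens, z0.size)))  # perturbed-noise ensemble
    for _ in range(n_steps):
        U = np.linalg.svd(ens - x, full_matrices=False)[2][:k].T   # (1) tangent frame, shape (d, k)
        d = eta * U @ (U.T @ direction)                            # (2) project the edit direction
        x, ens = x + d, ens + d                                    #     and transport both together
        x = retract(x, tau, rng)                                   # (3) retract the sample
        ens = retract(ens, tau, rng)                               #     and the ensemble
    return x

z0 = np.array([0.0, 0.6, 0.8])
v = np.array([1.0, 0.0, 0.0])          # hypothetical edit direction in ambient space
for n in (0, 5, 10, 20):
    x = edit_by_steps(z0, v, n)
    print(n, round(float(x @ v), 3), round(float(np.linalg.norm(x)), 6))
```

Because the random draws are consumed in the same order for every `n_steps`, a run with more steps extends a run with fewer, so on this toy the alignment `x @ v` should grow with the step count while the sample stays on the sphere.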

Empirical Evaluation

GeoEdit is evaluated on several real-image diffusion models, including models trained on CelebA-HQ and LSUN-church as well as Stable Diffusion. Unsupervised edits along estimated tangent directions yield smooth, semantic traversals. The method demonstrates:

Fine-grained, monotonic morphs: Traversal along principal tangent directions induces interpretable transformations (e.g., pose, illumination, background), distinct for each basis vector.

Figure 2: Editing along the principal direction $\widehat{u}_1$ in the estimated tangent space.

CLIP-guided optimization: The edit algorithm supports projected gradient ascent for text-based guidance. CLIP cosine similarity serves as the differentiable objective, with updates confined to the tangent space.

Figure 3: CLIP-guided editing for prompt "a male" on CelebA-HQ, showing semantic alignment and preservation of identity.
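Projected gradient ascent with a CLIP-like objective can be sketched as follows. As an assumption for illustration, cosine similarity to a random vector `t` stands in for the CLIP score, and the toy manifold is a 2-sphere embedded in a 3-D subspace of R^10, so an unconstrained gradient would push samples off it. `B`, `denoise`, `score_and_grad`, and `guided_edit` are hypothetical names, not the paper's API.

```python
import numpy as np

D = 10
_rng = np.random.default_rng(0)
B = np.linalg.qr(_rng.standard_normal((D, 3)))[0]   # hypothetical: manifold = {B s : |s| = 1} in R^10

def denoise(X):
    """Toy exact denoiser/retraction: nearest point on the embedded unit sphere."""
    S = X @ B
    return (S / np.linalg.norm(S, axis=-1, keepdims=True)) @ B.T

def score_and_grad(x, t):
    """Cosine similarity cos(x, t) and its gradient in x (stand-in for a CLIP score)."""
    nx, nt = np.linalg.norm(x), np.linalg.norm(t)
    c = float(x @ t) / (nx * nt)
    return c, t / (nx * nt) - c * x / nx**2

def guided_edit(x, t, n_steps=100, k=2, n_ens=16, sigma=0.05, eta=0.2, seed=1):
    """Projected gradient ascent: the gradient is projected onto a secant-PCA tangent
    frame, applied to the sample and its ensemble, and both are then retracted."""
    rng = np.random.default_rng(seed)
    ens = denoise(x + sigma * rng.standard_normal((n_ens, x.size)))   # local ensemble on the manifold
    history = []
    for _ in range(n_steps):
        c, g = score_and_grad(x, t)
        history.append(c)
        U = np.linalg.svd(ens - x, full_matrices=False)[2][:k].T      # tangent frame, shape (D, k)
        d = eta * U @ (U.T @ g)                                       # gradient confined to the frame
        x, ens = denoise(x + d), denoise(ens + d)                     # move, then retract
    history.append(score_and_grad(x, t)[0])
    return x, history

x0 = B[:, 0]                                         # a point on the toy manifold
t = np.random.default_rng(2).standard_normal(D)      # hypothetical "text embedding" target
x_edit, hist = guided_edit(x0, t)
best = np.linalg.norm(B.T @ t) / np.linalg.norm(t)   # highest cosine reachable on the manifold
print(f"start {hist[0]:.3f} -> end {hist[-1]:.3f} (on-manifold optimum {best:.3f})")
```

On this toy the score should climb toward, but never exceed, the best cosine attainable on the manifold, while every iterate remains on it.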

Edits in latent diffusion models: GeoEdit operates effectively within high-dimensional latent spaces (e.g., Stable Diffusion), capturing meaningful local semantic directions.

Figure 4: Continuous edits along the first principal direction in SDXL-Lightning; the top row shows the change for prompt "cat", the bottom row for "dragon".

Ablation Studies

GeoEdit's robustness and design choices are characterized through systematic ablations:

Tangent projection: Disabling tangent projection reduces semantic fidelity, as update directions drift off the manifold.
Edit step size: Large step sizes eventually exceed the correction radius of noising-denoising retraction, failing to return samples to the data manifold.
Local subspace dimension: Edit quality is stable across different PCA dimensions $k$, although the estimated principal direction may shift.

Figure 5: CLIP-guided editing without GeoEdit—minimal change and semantic misalignment due to absence of tangent projection.
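The tangent-projection and step-size ablations can be illustrated with a one-step drift measurement on the unit sphere. This is a toy, not the paper's ablation: the gradient `g` is arbitrary, and the exact tangent projector at `x` is used instead of the secant estimate to isolate the effect of projection. A raw step leaves the manifold at first order in `eta`, whereas a projected step drifts only at second order; that drift still grows quickly enough that large steps can outrun a retraction with a limited correction radius.

```python
import numpy as np

x = np.array([0.0, 0.6, 0.8])          # point on the toy manifold (unit sphere S^2)
g = np.array([1.0, 1.0, 1.0])          # hypothetical ambient gradient of an edit objective
P = np.eye(3) - np.outer(x, x)         # exact tangent projector at x (toy shortcut)

def step_drift(eta, project):
    """Distance from the unit sphere after one step of size eta, before any retraction."""
    step = eta * (P @ g if project else g)
    return abs(np.linalg.norm(x + step) - 1.0)

for eta in (0.01, 0.1, 0.5):
    print(f"eta={eta:<5} raw drift={step_drift(eta, False):.2e}  projected drift={step_drift(eta, True):.2e}")
```

For the projected step, |x + eta·Pg|² = 1 + eta²·|Pg|², so the drift is roughly eta²·|Pg|²/2; the raw step adds a radial term proportional to eta·(x·g).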

Quantitative Analysis

GeoEdit achieves competitive metrics on CelebA-HQ, matching or exceeding the state of the art in structure preservation (SSIM) and perceptual similarity (LPIPS) while incurring only a minor increase in FID. It is also computationally efficient: per-edit runtime is an order of magnitude lower than that of comparable methods such as LOCO-Edit.

Figure 6: Evolution of the CLIP score during manifold-constrained optimization, showing consistent improvement in semantic alignment.

Practical and Theoretical Implications

GeoEdit's approach to editing—via on-manifold tangent frames and sample-based estimation—offers significant practical benefits:

Interactive continuous control: Decoupling edit strength from full re-diffusion yields low-latency, fine-grained manipulation and enables real-time image editing applications.
Optimization compatibility: Principal tangent directions allow differentiable objectives (e.g., CLIP) to be plugged in for guided edits.
Training-free deployment: The algorithm avoids retraining or auxiliary models, making it scalable and easy to integrate into existing pipelines.

Theoretically, GeoEdit strengthens the case for manifold-aware generative modeling and editing. By bounding tangent-space estimation errors and linking geometric analysis to efficient editing algorithms, it sets a precedent for future developments in both generative modeling and geometry-based optimization in high-dimensional settings. GeoEdit could also be extended to more complex generative tasks such as shape synthesis and higher-order representation learning.

Conclusion

GeoEdit introduces a theoretically rigorous, efficient, and training-free framework for on-manifold editing in diffusion models by leveraging sample-based local tangent estimation. Its pipeline enables smooth, semantic edits with controllable strength, high fidelity, and computational efficiency. The results support the manifold hypothesis in generative modeling and open avenues for manifold-constrained optimization, task-aligned objective design, and geometric editing in advanced AI systems.

(2604.24238)
