Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance (2401.02126v1)

Published 4 Jan 2024 in cs.CV

Abstract: Existing text-to-image editing methods tend to excel at either rigid or non-rigid editing but struggle when combining both, producing outputs that are misaligned with the provided text prompts. In addition, integrating reference images for control remains challenging. To address these issues, we present a versatile image editing framework capable of executing both rigid and non-rigid edits, guided by either textual prompts or reference images. We leverage a dual-path injection scheme to handle diverse editing scenarios and introduce an integrated self-attention mechanism for fusion of appearance and structural information. To mitigate potential visual artifacts, we further employ latent fusion techniques to adjust intermediate latents. Compared to previous work, our approach represents a significant advance in achieving precise and versatile image editing. Comprehensive experiments validate the efficacy of our method, showcasing competitive or superior results in text-based editing and appearance transfer tasks, encompassing both rigid and non-rigid settings.
