Optimisation-Based Multi-Modal Semantic Image Editing (2311.16882v1)

Published 28 Nov 2023 in cs.CV, cs.CL, and cs.LG

Abstract: Image editing affords increased control over the aesthetics and content of generated images. Pre-existing works focus predominantly on text-based instructions to achieve desired image modifications, which limits edit precision and accuracy. In this work, we propose an inference-time editing optimisation designed to extend beyond textual edits and accommodate multiple editing instruction types (e.g. spatial layout-based instructions such as pose, scribbles and edge maps). We propose to disentangle the editing task into two competing subtasks, successful local image modification and global content consistency preservation, each guided by a dedicated loss function. Because the influence of each loss function can be adjusted, we obtain a flexible editing solution that can be tuned to user preferences. We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits through both qualitative and quantitative experiments.
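The abstract frames editing as a trade-off between two weighted losses. The toy sketch below is not the authors' method. It only illustrates how adjusting two loss weights shifts the result between "apply the edit" and "stay close to the original", under loudly simplified assumptions. The "image" is a flat list of pixel values. The local edit loss is a masked squared error to a hypothetical target. The global consistency loss is a squared error to the source over all pixels. The names `edit_loss`, `consistency_loss`, `optimise_edit`, `w_edit` and `w_cons` are hypothetical, and the abstract does not specify the paper's actual losses or optimisation variables.

```python
# Toy illustration (hypothetical, not the paper's implementation):
# balance a local edit loss against a global consistency loss
# with user-adjustable weights, optimised by plain gradient descent.

def edit_loss(x, target, mask):
    # Squared error to the edit target, counted only where mask == 1.
    return sum(m * (xi - ti) ** 2 for xi, ti, m in zip(x, target, mask))

def consistency_loss(x, source):
    # Squared error to the source image over all pixels.
    return sum((xi - si) ** 2 for xi, si in zip(x, source))

def total_loss(x, source, target, mask, w_edit, w_cons):
    # Weighted sum of the two competing objectives.
    return w_edit * edit_loss(x, target, mask) + w_cons * consistency_loss(x, source)

def optimise_edit(source, target, mask, w_edit=1.0, w_cons=1.0, lr=0.1, steps=500):
    # Start from the unedited image and descend the weighted loss.
    # Converges when lr < 1 / (w_edit + w_cons).
    x = list(source)
    for _ in range(steps):
        grad = [2 * w_edit * m * (xi - ti) + 2 * w_cons * (xi - si)
                for xi, si, ti, m in zip(x, source, target, mask)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

if __name__ == "__main__":
    src, tgt, msk = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1, 1, 0]
    print(optimise_edit(src, tgt, msk, w_edit=3.0, w_cons=1.0))
```

With `w_edit=3.0` and `w_cons=1.0`, each masked pixel converges to the weighted compromise (3·1 + 1·0) / 4 = 0.75, while the unmasked pixel stays at its source value. Raising `w_cons` pulls the edited region back toward the original, which is the kind of user-controlled trade-off the abstract describes.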

Authors (6)
  1. Bowen Li (166 papers)
  2. Yongxin Yang (73 papers)
  3. Steven McDonagh (43 papers)
  4. Shifeng Zhang (46 papers)
  5. Petru-Daniel Tudosiu (18 papers)
  6. Sarah Parisot (30 papers)