SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects (2004.04977v2)

Published 10 Apr 2020 in cs.CV, cs.GR, cs.LG, and eess.IV

Abstract: Recent advances in image generation gave rise to powerful tools for semantic image editing. However, existing approaches can either operate on a single image or require an abundance of additional information. They are not capable of handling the complete set of editing operations, that is addition, manipulation or removal of semantic concepts. To address these limitations, we propose SESAME, a novel generator-discriminator pair for Semantic Editing of Scenes by Adding, Manipulating or Erasing objects. In our setup, the user provides the semantic labels of the areas to be edited and the generator synthesizes the corresponding pixels. In contrast to previous methods that employ a discriminator that trivially concatenates semantics and image as an input, the SESAME discriminator is composed of two input streams that independently process the image and its semantics, using the latter to manipulate the results of the former. We evaluate our model on a diverse set of datasets and report state-of-the-art performance on two tasks: (a) image manipulation and (b) image generation conditioned on semantic labels.

Authors (5)
  1. Evangelos Ntavelis (7 papers)
  2. Andrés Romero (14 papers)
  3. Iason Kastanis (4 papers)
  4. Luc Van Gool (570 papers)
  5. Radu Timofte (299 papers)
Citations (72)

Summary

  • The paper introduces SESAME, a novel generator-discriminator architecture leveraging GANs with a SPADE-integrated generator and a two-stream discriminator for advanced semantic image editing.
  • Experiments demonstrate SESAME achieves state-of-the-art results in object addition/removal and layout-to-image generation across diverse datasets, showing improved realism and semantic alignment.
  • SESAME enables efficient semantic image manipulation with reduced data needs, providing significant implications for applications in media and industry and paving the way for future context-aware synthesis.

Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects (SESAME)

The paper introduces SESAME, an architecture for semantic image editing built on a generator-discriminator framework. Its primary innovation is the ability to perform the full range of editing operations, namely adding, manipulating, or erasing objects, without requiring semantic information for the entire image: only the semantic labels of the regions to be edited are needed.
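To make the conditioning concrete, here is a minimal sketch of how a generator input for local editing might be assembled: the edit region is erased from the image and a one-hot semantic map is supplied only inside that region. The function name and exact channel layout are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def build_editing_input(image, edit_mask, region_labels, num_classes):
    """Assemble a generator input for local semantic editing (illustrative).

    image:         float array (3, H, W), the original RGB image
    edit_mask:     bool array  (H, W), True inside the region to be edited
    region_labels: int array   (H, W), semantic class ids (read only inside the mask)
    num_classes:   number of semantic classes
    """
    # Erase pixels in the edit region; the generator must synthesize them.
    masked_image = image * (~edit_mask)[None, :, :]

    # One-hot semantic map, defined only inside the edit region.
    semantics = np.zeros((num_classes,) + edit_mask.shape, dtype=np.float32)
    ys, xs = np.nonzero(edit_mask)
    semantics[region_labels[ys, xs], ys, xs] = 1.0

    # Stack image channels, semantics, and the mask itself as conditioning.
    return np.concatenate(
        [masked_image, semantics, edit_mask[None].astype(np.float32)]
    )
```

Note that the user never has to annotate pixels outside the mask, which is exactly the reduction in required semantic information the paper emphasizes.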

Technical Overview

SESAME addresses the shortcomings of existing image editing techniques, which typically require extensive semantic annotations or support only a single type of operation. It leverages Generative Adversarial Networks (GANs) to broaden the versatility and effectiveness of semantic image manipulation.

The generator follows an encoder-decoder architecture, with SPatially-Adaptive DEnormalization (SPADE) layers integrated into the decoder. These layers let the generator condition on the semantic labels provided for the edited regions, improving the coherence and fidelity of the generated images.
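A minimal sketch of the SPADE modulation step helps illustrate the mechanism: features are normalized, then denormalized with a spatially-varying scale and shift predicted from the semantic layout. For simplicity this sketch uses a 1x1 linear projection of the one-hot segmentation; the actual SPADE formulation predicts the parameters with small convolutional networks.

```python
import numpy as np

def spade_modulate(x, seg, w_gamma, w_beta, eps=1e-5):
    """Illustrative SPADE step: normalize features, then denormalize with
    spatially-varying scale/shift predicted from the segmentation map.

    x:       (C, H, W) feature map
    seg:     (S, H, W) one-hot semantic map (resized to match x)
    w_gamma: (C, S) 1x1-conv weights producing the scale map
    w_beta:  (C, S) 1x1-conv weights producing the shift map
    """
    # Per-channel normalization over the spatial dimensions.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normed = (x - mean) / (std + eps)

    # Spatially-varying modulation parameters from the semantic layout.
    gamma = np.einsum('cs,shw->chw', w_gamma, seg)
    beta = np.einsum('cs,shw->chw', w_beta, seg)
    return normed * (1.0 + gamma) + beta
```

Because gamma and beta vary per pixel according to the semantic class, the layer can inject layout information that plain normalization would otherwise wash out.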

The SESAME discriminator, by contrast, uses a two-stream design that processes the image and its semantics independently, with the semantics stream manipulating the features of the image stream. This configuration sharpens the discriminator's assessment of both realism and semantic alignment of the edits, unlike prior methods that simply concatenate the semantic labels and the image at the input.
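The two-stream idea can be sketched as follows. This is a toy, assumed form of the fusion (here, a tanh gate applied by the semantics stream to the image stream), not the paper's exact operator; the point is that the streams are processed separately before the semantics modulate the image features.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling over the spatial dims of a (C, H, W) array."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def two_stream_score(image_feat, sem_feat):
    """Illustrative two-stream discriminator head: the image and semantic
    streams are processed independently, then the semantic stream modulates
    the image stream before a realism score is pooled out."""
    img = avg_pool2(image_feat)   # image stream
    sem = avg_pool2(sem_feat)     # semantics stream (same resolution)
    # Semantics gate the image features (one simple form of "manipulation").
    fused = img * (1.0 + np.tanh(sem))
    return fused.mean()           # scalar realism/consistency score
```

Compared with input concatenation, this structure prevents the semantics from being treated as just extra image channels and instead lets them steer how the image evidence is scored.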

Experimental Validation

Through a series of experiments, SESAME exhibits state-of-the-art performance on semantic image manipulation and layout-to-image generation. Validation on diverse datasets, including Cityscapes and ADE20K, demonstrates SESAME's capability to generate high-quality edits across varied contexts.

  1. Object Addition and Removal: SESAME surpasses previous models in both content addition and removal tasks as evidenced by superior mIoU scores and reduced FID, indicating better alignment with ground truth semantics and enhanced image realism.
  2. Free-form Semantic Editing: SESAME provides an interface allowing users to selectively paint semantic labels over images for editing, offering unprecedented control over image modifications.
  3. Layout-to-Image Generation: By employing the SESAME discriminator alongside established SPADE generator settings, significant improvements in generating realistic images from semantic outlines were observed.
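The mIoU metric cited above is the standard mean Intersection-over-Union between a segmentation of the generated image and the target semantics; a short reference implementation:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target.

    pred, target: integer class-id maps of identical shape.
    """
    ious = []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

Higher mIoU indicates the generated pixels respect the requested semantic layout, complementing FID, which measures distributional realism.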

Implications and Future Directions

The introduction of SESAME into semantic editing holds substantial implications for applications ranging from media production to industrial imaging, where precise image manipulation is critical. By enabling editing under purely local semantic constraints, SESAME simplifies workflows and reduces the cost of semantic data collection, advancing the utility and integration of AI in image editing domains.

Potential future research could explore expanding SESAME's application to different conditional generation scenarios, such as integrating scene graphs alongside the semantic layouts. Such developments could further bridge gaps in interactive and context-aware image synthesis technologies.

SESAME represents a significant step forward in semantic image editing, providing methods that highlight efficiency and flexibility. Its contributions pave the way for more refined and comprehensive approaches to automated image manipulation within computer vision research.
