- The paper introduces a two-stage GAN that integrates semantic maps into coarse generation and refines the output with a multi-channel attention mechanism.
- The methodology addresses extreme viewpoint variations by ensuring structural consistency through cascaded semantic guidance.
- Experimental results on Dayton, CVUSA, and Ego2Top show improved SSIM, PSNR, and accuracy over state-of-the-art models.
Multi-Channel Attention Selection GAN for Cross-View Image Translation
The paper "Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation" addresses the cross-view image synthesis problem: generating an image of a scene from a drastically different viewpoint, which is challenging because of severe deformations and changes in scene structure. The authors propose the Multi-Channel Attention Selection GAN (SelectionGAN), which leverages semantic information to guide generation across viewpoints.
Proposed Methodology
The SelectionGAN framework employs a two-stage generation process:
- Stage I: Semantic-Guided Generation. The first stage uses a cycled semantic-guided generation network that takes a conditioning image and a target semantic map and produces an initial coarse result. Feeding semantic maps into both the inputs and outputs of the generator provides strong supervision, and the cycled generation process enforces structural consistency.
- Stage II: Multi-Channel Attention Refinement. The second stage refines the coarse result with a multi-channel attention selection mechanism. The module generates multiple diverse intermediate outputs and learns attention maps that spatially select among them to synthesize a more detailed final image (a minimal sketch of this selection step appears after this list). The attention maps are also used to derive uncertainty maps that weight the pixel loss, making optimization more robust.
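To make the selection mechanism concrete, below is a minimal PyTorch sketch of the multi-channel attention selection step: the network proposes several candidate images together with per-pixel attention maps, and the final output is the attention-weighted sum of the candidates. The module name, channel counts, and single-convolution heads are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of multi-channel attention selection (Stage II idea).
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelAttentionSelection(nn.Module):
    def __init__(self, feat_channels=64, num_candidates=10):
        super().__init__()
        self.num_candidates = num_candidates
        # Head producing N candidate RGB images from shared features.
        self.candidate_head = nn.Conv2d(feat_channels, 3 * num_candidates,
                                        kernel_size=3, padding=1)
        # Head producing N single-channel attention maps.
        self.attention_head = nn.Conv2d(feat_channels, num_candidates,
                                        kernel_size=1)

    def forward(self, features):
        b, _, h, w = features.shape
        # N candidate images, each with values in [-1, 1].
        candidates = torch.tanh(self.candidate_head(features))
        candidates = candidates.view(b, self.num_candidates, 3, h, w)
        # Softmax across candidates: per-pixel selection weights that sum to 1.
        attention = F.softmax(self.attention_head(features), dim=1)
        attention = attention.view(b, self.num_candidates, 1, h, w)
        # Final image is the attention-weighted sum of the candidates.
        return (attention * candidates).sum(dim=1)

# Usage with dummy Stage-I features of shape (batch, channels, H, W):
feats = torch.randn(1, 64, 256, 256)
out = MultiChannelAttentionSelection()(feats)  # -> (1, 3, 256, 256)
```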
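The uncertainty-guided pixel loss can be sketched in the same spirit. The paper derives uncertainty maps and uses them to weight the pixel loss; the specific exp(-u)-weighted L1 form below (in the style of Kendall and Gal, 2017) is an assumed stand-in rather than the authors' exact formulation.

```python
# Hedged sketch of an uncertainty-weighted pixel loss; the exact weighting
# in SelectionGAN may differ from this common formulation.
import torch

def uncertainty_weighted_l1(pred, target, log_uncertainty):
    """Down-weights the L1 error where predicted uncertainty is high;
    the +u term keeps the network from claiming infinite uncertainty."""
    l1 = torch.abs(pred - target)
    return (torch.exp(-log_uncertainty) * l1 + log_uncertainty).mean()

# Usage with a single-channel uncertainty map broadcast over RGB:
pred = torch.randn(1, 3, 64, 64)
target = torch.randn(1, 3, 64, 64)
log_u = torch.zeros(1, 1, 64, 64)
loss = uncertainty_weighted_l1(pred, target, log_u)
```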
Experimental Results
Evaluations on the Dayton, CVUSA, and Ego2Top datasets demonstrate the efficacy of SelectionGAN. The method outperforms state-of-the-art models such as Pix2pix, X-Fork, and X-Seq on SSIM, PSNR, and accuracy metrics, and the coarse-to-fine cascade proves particularly effective at handling complex scene structures.
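To make the reported metrics concrete, the snippet below shows how SSIM and PSNR can be computed for a generated/ground-truth pair with scikit-image. The image shapes and random placeholder data are illustrative; this is not the paper's exact evaluation protocol.

```python
# Illustrative SSIM/PSNR computation with scikit-image (>= 0.19 for
# the channel_axis argument). Placeholder data, not the benchmark setup.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate_pair(generated, reference):
    """Both images are HxWx3 uint8 arrays; returns (ssim, psnr)."""
    ssim = structural_similarity(generated, reference, channel_axis=-1)
    psnr = peak_signal_noise_ratio(reference, generated)
    return ssim, psnr

# Example with random placeholder images:
ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
gen = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(evaluate_pair(gen, ref))
```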
Implications and Future Directions
SelectionGAN demonstrates how semantic maps and attention mechanisms can be combined to tackle the inherent challenges of cross-view image translation. The multi-channel approach lets the model capture a richer set of scene details, which could inform research on related scene-understanding tasks.
The methodology highlights pathways for incorporating semantic information more effectively into image synthesis, with possible applications in virtual reality and autonomous navigation. Future work might improve the accuracy of the semantic maps and explore unsupervised or weakly supervised settings, broadening the applicability of cross-view translation models.
Overall, the paper makes a compelling contribution to the field of image translation by proposing a structured approach that systematically addresses the difficulties of generating images from widely disparate viewpoints. The insights garnered could enhance the development of robust, generalizable models in computer vision.