- The paper introduces AD-GAN, which decomposes person attributes into independent latent codes, enabling precise and controllable image synthesis.
- It employs dual encoding pathways and cascaded style blocks to capture detailed pose and texture information, enhancing overall synthesis quality.
- Experimental results validate the model's superior performance in pose transfer and attribute manipulation through metrics such as IS and SSIM, together with user studies.
Controllable Person Image Synthesis with Attribute-Decomposed GAN
The paper presents a generative model for person image synthesis, termed Attribute-Decomposed GAN (AD-GAN). The model introduces a structured approach to synthesizing realistic person images whose human attributes are specified via multiple source inputs. AD-GAN decomposes human attributes, such as pose, headgear, upper clothing, and lower apparel, into independent codes within a latent space. This decomposition allows for flexible and precise control of attributes through style mixing and interpolation within explicit style representations.
Methodological Contributions
The core methodology involves a newly proposed architecture featuring dual encoding pathways connected by style blocks:
- Dual Encoding Pathways: The architecture diverges into two distinct encoding pathways: one for pose encoding, where keypoint-based 2D skeletons represent the target pose as codes, and one for decomposed component encoding of the source person. This bifurcation breaks the complex generation mapping into manageable subtasks.
- Decomposed Component Encoding: The model employs an off-the-shelf human parser to extract component layouts, which are fed into a shared global texture encoder. This strategy supports the synthesis of realistic person images and enables automatic separation of unannotated attributes.
- Texture Style Transfer: The cascaded style blocks incorporate a fusion module that injects the source person's texture styles into the pose code, leveraging adaptive instance normalization (AdaIN) to handle style transfer across disparate person images.
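The decomposed component encoding above can be sketched as a mask-and-encode loop. Note that `encode_components` and its `encoder` argument are illustrative stand-ins for this summary, not the paper's released code:

```python
import numpy as np

def encode_components(image, component_masks, encoder):
    """Encode each semantic component with one shared texture encoder and
    concatenate the per-component codes into a full style code.

    image           -- (H, W, 3) source person image
    component_masks -- list of (H, W) binary masks from a human parser
    encoder         -- shared texture encoder (any callable returning a 1-D code)
    """
    codes = []
    for mask in component_masks:          # e.g. hair, upper clothes, pants
        masked = image * mask[..., None]  # isolate one component's pixels
        codes.append(encoder(masked))     # same encoder weights for every component
    return np.concatenate(codes)          # full style code fed to the style blocks
```

Sharing the encoder across components is what lets unannotated attributes separate automatically: each component is mapped into a common texture code space.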
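The AdaIN operation inside the style blocks can be written in a few lines. This is a minimal NumPy sketch of the standard AdaIN formula (normalize the content features, then rescale them with the style statistics), not the paper's exact fusion module:

```python
import numpy as np

def adaptive_instance_norm(content, style, eps=1e-5):
    """AdaIN: align per-channel mean/std of `content` to those of `style`.

    content, style -- feature maps of shape (C, H, W)
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    # Whiten the content statistics, then re-color with the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean
```

Because only channel-wise statistics are transferred, the spatial layout of the content (here, the pose code) is preserved while the texture style is swapped.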
Experimental Analysis
The experimental results showcase AD-GAN's superior performance over existing state-of-the-art methods on pose transfer and on a novel task of component attribute transfer. Notably, the model synthesizes highly plausible person images under broad attribute manipulations. The results are quantitatively validated with a suite of metrics, including Inception Score (IS), Structural Similarity (SSIM), and the Contextual (CX) score, complemented by qualitative user studies measuring the naturalness of the synthesized images.
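Of the metrics above, SSIM is the most direct to illustrate. The sketch below computes a simplified single-window SSIM over whole images in [0, 1]; the standard metric averages this same statistic over local sliding windows:

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified SSIM computed over the whole image as one window.

    x, y -- grayscale images in [0, 1], same shape
    c1, c2 -- stabilizing constants from the original SSIM paper
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    luminance_contrast = (2 * mx * my + c1) * (2 * cov + c2)
    normalizer = (mx**2 + my**2 + c1) * (vx + vy + c2)
    return luminance_contrast / normalizer
```

Identical images score exactly 1.0, and structurally unrelated images score near 0, which is why SSIM complements IS: it measures fidelity to a reference rather than distributional realism.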
Implications and Future Research
The implications of this research are multifaceted. Practically, the AD-GAN can be applied in domains such as movie making, virtual fashion try-ons, and enhanced person re-identification systems. Theoretically, this work advances the field by demonstrating a pathway to improved attribute disentanglement in a generative context.
Future research could expand this method's application to other image synthesis domains, potentially exploring real-time attribute modification in video sequences or integration with reinforcement learning frameworks to further enhance image realism and coherence under dynamic conditions. Moreover, the strategy of decomposing a complex task into independent components could extend beyond image synthesis to other AI fields that require high-dimensional data manipulation.
This paper represents a significant stride in controllable person image synthesis, opening pathways for further exploration and refinement in attribute-guided generative models.