- The paper introduces a bipartite graph reasoning module that captures long-range relations to effectively manage pose deformations.
- It leverages interactive and attention-based modules to enhance feature representations and refine image synthesis.
- Extensive evaluations on Market-1501, DeepFashion, and Radboud Faces show improved SSIM and IS over previous methods.
Overview of "Bipartite Graph Reasoning GANs for Person Pose and Facial Image Synthesis"
The paper presents Bipartite Graph Reasoning GANs (BiGraphGAN), a framework for person pose and facial image synthesis. Its key idea is to use bipartite graph reasoning to model the relationships required to translate an image from a source pose or expression to a target one.
BiGraphGAN Architecture
The core of BiGraphGAN is its unique ability to handle pose deformation by reasoning long-range cross relations through a bipartite graph structure. This is achieved through:
- Bipartite Graph Reasoning (BGR) Block: This component models the long-range cross relations between the source and target poses in a bipartite graph. Using Graph Convolutional Networks (GCNs), the BGR block captures these relations and mitigates the difficulty of large pose deformations.
- Interaction-and-Aggregation (IA) Block: This block enhances the feature representations of both person shape and appearance. It employs an interactive method to update these features, facilitating a more coherent synthesis of the target pose or expression.
- Attention-Based Image Fusion (AIF) Module: Integrated to refine the final image result, this module selectively combines information from the input and generated intermediate images, thus improving generation results.
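The BGR block's cross-graph reasoning can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's implementation: the function and weight names (`bgr_block`, `w_cross`, `w_self`) are hypothetical, and the real model projects convolutional feature maps to graph nodes with learned projections before reasoning. The sketch keeps only the core idea: each branch's nodes attend to the other branch over a bipartite affinity matrix, then update via a GCN-style weighted aggregation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bgr_block(feat_a, feat_b, w_cross, w_self):
    """One round of bipartite graph reasoning between two node sets.

    feat_a: (Na, C) nodes from the source-pose branch
    feat_b: (Nb, C) nodes from the target-pose branch
    w_cross, w_self: (C, C) GCN weight matrices (hypothetical names)
    """
    # Bipartite adjacency: affinity from each node in one branch
    # to every node in the other branch.
    adj_ab = softmax(feat_a @ feat_b.T, axis=-1)   # (Na, Nb)
    adj_ba = softmax(feat_b @ feat_a.T, axis=-1)   # (Nb, Na)
    # GCN-style update: aggregate cross-branch messages, add a
    # self term, then apply a ReLU nonlinearity.
    new_a = np.maximum(0.0, adj_ab @ feat_b @ w_cross + feat_a @ w_self)
    new_b = np.maximum(0.0, adj_ba @ feat_a @ w_cross + feat_b @ w_self)
    return new_a, new_b

# Usage with toy node features.
rng = np.random.default_rng(0)
fa = rng.standard_normal((6, 8))
fb = rng.standard_normal((5, 8))
wc = 0.1 * rng.standard_normal((8, 8))
ws = 0.1 * rng.standard_normal((8, 8))
na, nb = bgr_block(fa, fb, wc, ws)
```

Because the adjacency is recomputed from the features themselves, the "graph" is data-dependent, which is what lets the block relate spatially distant source and target regions.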
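The AIF module's selective combination of input and generated images can be sketched as a per-pixel soft gate. This is a minimal illustration under the assumption that the attention map acts as a sigmoid-valued mask; the names (`aif_fuse`, `attn_logits`) are illustrative, and in the actual model the attention map is predicted by a learned layer rather than supplied directly.

```python
import numpy as np

def aif_fuse(source_img, generated_img, attn_logits):
    """Attention-based image fusion: a per-pixel sigmoid gate mixes
    the original input with the generated intermediate image.
    Where the gate is near 1 the generated content dominates;
    near 0, the source image is kept."""
    gate = 1.0 / (1.0 + np.exp(-attn_logits))  # values in (0, 1)
    return gate * generated_img + (1.0 - gate) * source_img

# Usage: strongly positive logits favour the generated image.
src = np.zeros(4)
gen = np.ones(4)
fused = aif_fuse(src, gen, np.full(4, 10.0))
```

Blending rather than hard selection lets gradients flow to both paths during training, which is the usual motivation for this style of fusion.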
Part-Aware Enhancements
The extension BiGraphGAN++ introduces a Part-aware Bipartite Graph Reasoning (PBGR) block, which decomposes the global transformation into local transformations of body and face parts. This yields a more detailed mapping of semantic changes and is particularly beneficial for localized features.
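The part-aware decomposition can be sketched by restricting cross-graph attention to nodes within the same part. This is a simplified, self-contained illustration: the function name, the residual update, and the particular part grouping are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pbgr_reason(src_nodes, tgt_nodes, parts):
    """Part-aware reasoning: cross-graph attention is computed only
    among nodes of the same body/face part, so each part's
    transformation is modelled locally. `parts` maps a part name to
    its node indices (the grouping here is illustrative)."""
    out = np.zeros_like(tgt_nodes)
    for idx in parts.values():
        s, t = src_nodes[idx], tgt_nodes[idx]
        attn = softmax(t @ s.T, axis=-1)  # target attends to source, per part
        out[idx] = t + attn @ s           # residual local aggregation
    return out

# Usage with a toy 10-node skeleton split into three parts.
rng = np.random.default_rng(1)
src = rng.standard_normal((10, 4))
tgt = rng.standard_normal((10, 4))
parts = {"head": [0, 1], "torso": [2, 3, 4, 5], "legs": [6, 7, 8, 9]}
out = pbgr_reason(src, tgt, parts)
```

A useful property of this decomposition is locality: perturbing a leg node cannot change the head's output, which is the sense in which the global transformation is split into independent local ones.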
Evaluation and Results
The evaluation of BiGraphGAN across the challenging datasets Market-1501, DeepFashion, and Radboud Faces demonstrates the efficacy of the proposed approach. The authors report substantial improvements in standard metrics such as SSIM and IS over existing methods like PG2 and Deformable GANs. Notably, the proposed framework achieves superior visual realism and shape consistency, indicating its robustness and adaptability to variations in source and target inputs.
Implications and Future Directions
The success of BiGraphGAN and its extension BiGraphGAN++ lies in their ability to reason about complex spatial relationships via a graph-based approach, setting a precedent for future work in pose and expression synthesis. This paradigm highlights the potential of GCNs beyond traditional relational reasoning and offers new insight into architecture design for Generative Adversarial Networks.
Looking ahead, graph-based reasoning frameworks of this kind could extend to other domains where spatial or relational transformations play a crucial role, advancing the state of the art in synthesis and generation across modalities. Continued work in this direction could yield more refined models capable of high-fidelity image synthesis, enriching the tools available for creative and practical AI applications.