
MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing (2010.16417v1)

Published 30 Oct 2020 in cs.CV

Abstract: Despite the recent success of face image generation with GANs, conditional hair editing remains challenging due to the under-explored complexity of its geometry and appearance. In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation. To provide user control over every major hair visual factor, we explicitly disentangle hair into four orthogonal attributes, including shape, structure, appearance, and background. For each of them, we design a corresponding condition module to represent, process, and convert user inputs, and modulate the image generation pipeline in ways that respect the natures of different visual attributes. All these condition modules are integrated with the backbone generator to form the final end-to-end network, which allows fully-conditioned hair generation from multiple user inputs. Upon it, we also build an interactive portrait hair editing system that enables straightforward manipulation of hair by projecting intuitive and high-level user inputs such as painted masks, guiding strokes, or reference photos to well-defined condition representations. Through extensive experiments and evaluations, we demonstrate the superiority of our method regarding both result quality and user controllability. The code is available at https://github.com/tzt101/MichiGAN.

Citations (40)

Summary

  • The paper introduces MichiGAN, a conditioned GAN framework that enables user-controlled, high-fidelity hair synthesis in portrait images.
  • It decomposes hair attributes into shape, structure, appearance, and background using dedicated condition modules and novel feature extraction methods.
  • Evaluations demonstrate MichiGAN's superiority with lower FID scores and high realism as confirmed by quantitative metrics and user studies.

An Overview of MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

The paper "MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing" introduces a framework for interactive, conditional hair editing in portrait images. The authors observe that hair manipulation is difficult because of hair's intricate geometry and diverse appearance, and they address this difficulty through an innovative use of Generative Adversarial Networks (GANs). The proposed solution, MichiGAN, provides precise, user-controlled editing of hair in portraits by decomposing hair into four distinct attributes: shape, structure, appearance, and background.

Key Contributions

MichiGAN's primary innovation lies in its approach to condition the image generation process on multiple user inputs across four orthogonal attributes:

  1. Shape, Structure, and Background Modules: Each of these attributes is handled by a dedicated condition module. Shape is represented as a binary mask; structure is encoded as a dense orientation map, which gives detailed control over the flow of hair strands; and the background is preserved by a parallel encoder that progressively injects its features into the generator.
  2. Appearance Module: Appearance is treated as a style-transfer problem. A novel mask-transformed feature extraction method pulls stylistic features from a reference image while discarding spatially varying information, so the synthesized hair adapts appropriately to a variety of depicted scenes or user-specified look preferences.
  3. End-to-End Conditional Network: The integration of all condition modules into MichiGAN's backbone generator allows complete control over the hair generation process. This comprehensive design ensures that users can manipulate hair precisely and flexibly through an interactive editing system based on intuitive user inputs such as painted masks and reference photos.
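Condition modules of this kind typically modulate the generator's intermediate features with spatially varying scale and shift parameters derived from the conditions (the SPADE-style denormalization the paper builds on). The toy sketch below is purely illustrative: it uses plain Python lists instead of convolutional feature maps, and the function name `spatial_denorm` and its shapes are assumptions, not the paper's actual code.

```python
import math

def spatial_denorm(features, gamma, beta):
    """SPADE-style modulation sketch: normalize each channel of `features`,
    then scale/shift per-pixel with condition-derived gamma/beta.
    All arguments are [channel][pixel] lists of floats (toy shapes); in the
    real network gamma/beta come from small conv nets over the condition maps.
    """
    out = []
    for c, channel in enumerate(features):
        n = len(channel)
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n
        std = math.sqrt(var + 1e-5)
        out.append([gamma[c][i] * ((v - mean) / std) + beta[c][i]
                    for i, v in enumerate(channel)])
    return out

# One channel, four "pixels"; identity gamma/beta just normalizes the channel.
feats = [[1.0, 2.0, 3.0, 4.0]]
gamma = [[1.0, 1.0, 1.0, 1.0]]
beta  = [[0.0, 0.0, 0.0, 0.0]]
print(spatial_denorm(feats, gamma, beta))
```

Because gamma and beta vary per pixel, the shape and structure conditions can steer different spatial regions of the generated hair independently, which is what makes region-local control possible.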

Technical Evaluation

The authors evaluate MichiGAN quantitatively using Fréchet Inception Distance (FID) and qualitatively through visual comparisons and user studies against state-of-the-art methods such as pix2pixHD, SPADE, MaskGAN, and SC-FEGAN. MichiGAN achieves a lower FID score, indicating superior visual quality in the generated images. These findings are corroborated by user studies: MichiGAN-generated portraits are rated highly realistic, with a fooling rate comparable to that of real photographs.
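For intuition, FID is the Fréchet distance between Gaussians fitted to Inception features of real and generated images. In one dimension (an illustrative simplification; the real metric uses multivariate Gaussians over Inception activations and a matrix square root) the formula collapses to a difference of means plus a difference of standard deviations:

```python
import statistics

def fid_1d(xs, ys):
    """Fréchet distance between two univariate Gaussians fitted to the
    samples: (mu_x - mu_y)^2 + (sd_x - sd_y)^2. The full FID applies the
    analogous formula to multivariate Gaussians fitted to Inception
    features; lower values mean the two distributions are closer."""
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2

# Identical sample sets give zero distance; a shifted set does not.
print(fid_1d([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))  # → 0.0
print(fid_1d([0.0, 1.0, 2.0], [5.0, 6.0, 7.0]))  # → 25.0
```

This makes clear why a lower FID signals generated images whose feature statistics better match those of real photographs.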

Additionally, the authors conduct extensive experiments to illustrate MichiGAN’s capacity for individual and joint manipulation of attributes. This flexibility highlights the method's success in disentangling various hair properties and enabling realistic, custom edits tailored to user requirements.

Implications and Future Work

MichiGAN is a significant step forward in interactive hair manipulation in portrait editing, providing both theoretical and practical advancements in styled, conditioned image editing. Implications of such work are particularly relevant for fields like digital art, virtual reality, and personalized media content creation.

Future extensions of this research could focus on optimizing how the conditional network adapts to novel user inputs and manages large-scale deployment, possibly improving computational efficiency without undermining result fidelity. Moreover, enhancements in handling extreme structural changes and boundary conditions may mitigate some of the current limitations highlighted by the authors. Addressing these challenges would push the boundaries of real-time, high-fidelity generative models significantly forward.

Ultimately, MichiGAN presents a refined and robust framework for hair image synthesis, impacting the landscape of GAN-based conditional image generation methodologies. As the field progresses, contributions like MichiGAN will lay the groundwork for even more interactive, user-driven advancements in the generative modeling domain.
