
Unsupervised Person Image Synthesis in Arbitrary Poses (1809.10280v1)

Published 27 Sep 2018 in cs.CV

Abstract: We present a novel approach for synthesizing photo-realistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders the image of the same person under the new pose, synthesizing novel views of the parts visible in the input image and hallucinating those that are not seen. This problem has recently been addressed in a supervised manner, i.e., during training the ground truth images under the new poses are given to the network. We go beyond these approaches by proposing a fully unsupervised strategy. We tackle this challenging scenario by splitting the problem into two principal subtasks. First, we consider a pose conditioned bidirectional generator that maps back the initially rendered image to the original pose, hence being directly comparable to the input image without the need to resort to any training image. Second, we devise a novel loss function that incorporates content and style terms, and aims at producing images of high perceptual quality. Extensive experiments conducted on the DeepFashion dataset demonstrate that the images rendered by our model are very close in appearance to those obtained by fully supervised approaches.

Citations (165)

Summary

  • The paper introduces an unsupervised GAN framework capable of synthesizing person images in arbitrary poses from a single input image, reducing reliance on extensive labeled datasets.
  • The proposed architecture includes a novel identity-preserving loss that combines content and style components to maintain the individual's appearance and high-frequency details across different poses.
  • Evaluations on the DeepFashion dataset demonstrate that the unsupervised method achieves competitive performance metrics (SSIM, IS) compared to supervised approaches, enabling applications like virtual try-on systems.

Unsupervised Person Image Synthesis in Arbitrary Poses

Generating images of a person in new poses from a single input image is an inherently ill-posed task: body parts not visible in the input must be hallucinated. The paper "Unsupervised Person Image Synthesis in Arbitrary Poses" addresses this challenge with a generative adversarial network (GAN) trained without paired supervision. The authors focus on overcoming the main constraint of supervised image-synthesis techniques, which require training pairs showing the same person in different poses. By adopting an unsupervised strategy, the framework can instead be trained on large collections of unpaired data.
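The unsupervised training signal can be sketched as a cycle-reconstruction objective: the generator renders the input under the target pose, maps the result back to the original pose, and the reconstruction is compared directly to the input, so no ground-truth image of the new pose is needed. The sketch below is a minimal numpy illustration; `generator` is a hypothetical stand-in for the paper's network, not its actual architecture.

```python
import numpy as np

def generator(image, pose):
    # Hypothetical stand-in for the pose-conditioned generator network;
    # it blends the image with a pose statistic just so the sketch runs.
    return 0.9 * image + 0.1 * pose.mean()

def cycle_reconstruction_loss(image, pose_original, pose_target):
    """L1 loss between the input and its back-rendered reconstruction.

    The image rendered under pose_target is never compared to any
    ground-truth photo; only the mapping back to pose_original is,
    which is what removes the need for paired training data.
    """
    rendered = generator(image, pose_target)            # I -> I' (new pose)
    reconstructed = generator(rendered, pose_original)  # I' -> I_hat (back)
    return np.abs(image - reconstructed).mean()

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
pose_a = rng.random((18, 2))  # 18 2D skeleton joints
pose_b = rng.random((18, 2))
loss = cycle_reconstruction_loss(img, pose_a, pose_b)
print(float(loss))
```

In the actual model the reconstruction is further scored by the identity loss and the adversarial discriminator rather than by a plain L1 term alone.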

The core architecture of the proposed GAN consists of a generator and a discriminator, augmented with a pose detector and a tailored loss framework. The pose-conditioned bidirectional generator first renders the person under the target pose and then maps the result back to the original pose, so the reconstruction can be compared directly with the input image; no paired images of the same person in different poses are required for training. The discriminator, following the PatchGAN approach, enforces the photorealistic quality of the synthetic images. In contrast to previous works, this approach integrates a novel identity-preserving loss designed to maintain the individual's appearance, including high-frequency details like clothing texture. This identity loss combines a content term, derived from semantic feature similarity, and a style term that enforces texture consistency around the body joints.
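The content-plus-style structure of the identity loss can be illustrated with Gram matrices, the standard device for comparing texture statistics. This is a hedged sketch: the paper extracts features from network layers and computes the style term on patches around body joints, whereas this simplified version operates on whole feature maps, and the weights are illustrative.

```python
import numpy as np

def gram_matrix(features):
    # features: (H, W, C) feature map. The Gram matrix captures channel
    # co-activation statistics (texture) independent of spatial layout.
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat / (h * w)

def identity_loss(feat_in, feat_out, content_weight=1.0, style_weight=10.0):
    """Sketch of a content + style identity-preserving loss.

    feat_in / feat_out are feature maps of the input and rendered images;
    the relative weights here are assumptions, not the paper's values.
    """
    content = np.mean((feat_in - feat_out) ** 2)  # semantic similarity
    style = np.mean((gram_matrix(feat_in) - gram_matrix(feat_out)) ** 2)
    return content_weight * content + style_weight * style

rng = np.random.default_rng(1)
f_in = rng.random((16, 16, 8))
f_out = f_in + 0.05 * rng.random((16, 16, 8))  # slightly perturbed rendering
print(float(identity_loss(f_in, f_out)))
```

Restricting the style term to regions around joints, as the paper does, focuses the texture constraint on the person rather than the background.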

Comprehensive evaluations carried out on the DeepFashion dataset demonstrate the viability and effectiveness of this unsupervised approach. The performance metrics—Structural Similarity Index (SSIM) and Inception Score (IS)—suggest that the unsupervised method achieves competitive results relative to supervised models, indicating near parity in perceptual quality and semantic fidelity in many cases.
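For reference, SSIM compares two images through their local means, variances, and covariance. The snippet below is a simplified single-window variant that illustrates the formula on whole grayscale images; the standard metric used for evaluation averages the same expression over sliding local windows.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over whole grayscale images (simplified:
    the standard metric averages this expression over local windows)."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the
    c2 = (0.03 * data_range) ** 2  # original SSIM definition
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(2)
a = rng.random((32, 32))
print(round(global_ssim(a, a), 4))  # identical images score 1.0
```

An SSIM of 1.0 indicates identical images; unrelated images score much lower, which is why the metric is a useful proxy for structural fidelity to the ground truth.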

The implications of this research are notable, particularly in fields like fashion and media, where it could enable virtual try-on systems, interactive animations, and other applications. The unsupervised nature of the model reduces dependence on extensive labeled datasets, broadening its applicability to diverse datasets and possibly to domains beyond human images. Future work suggested by the authors includes handling complex backgrounds more effectively and integrating geometry-aware loss terms to address current limitations and improve the realism of generated images.

This research expands the capabilities of GAN-based image synthesis, particularly in the context of unsupervised learning, offering a significant methodological step that may inspire further developments in AI-driven image generation and transformation.