Spatially Controllable Image Synthesis with Internal Representation Collaging (1811.10153v2)

Published 26 Nov 2018 in cs.CV and cs.LG

Abstract: We present a novel CNN-based image editing strategy that allows the user to change the semantic information of an image over an arbitrary region by manipulating the feature-space representation of the image in a trained GAN model. We present two variants of our strategy: (1) spatial conditional batch normalization (sCBN), a type of conditional batch normalization with user-specifiable spatial weight maps, and (2) feature blending, a method of directly modifying the intermediate features. Our methods can be used to edit both artificial and real images, and both can be used with any GAN that has conditional normalization layers. We demonstrate the power of our method through experiments on various types of GANs trained on different datasets. Code is available at https://github.com/pfnet-research/neural-collage.

Citations (41)

Summary

  • The paper introduces spatial conditional batch normalization (sCBN) to enable precise local semantic manipulations without retraining the network.
  • It presents a feature blending technique that directly modifies intermediate GAN representations to adjust complex attributes like facial expressions.
  • Experimental evaluations on SNGAN, BigGAN, and StyleGAN models demonstrate photorealistic outputs and more faithful local image transformations than prior image translation methods.

Spatially Controllable Image Synthesis with Internal Representation Collaging

This paper introduces a method for image editing based on manipulating the internal representations of a trained deep generative network. The authors present a CNN-based framework for altering the semantic content of an image over user-specified regions of a trained generative adversarial network (GAN).

Key Contributions

The paper makes two primary contributions to the field of image synthesis:

  1. Spatial Conditional Batch Normalization (sCBN): This variant of conditional batch normalization lets the user supply spatial weight maps that control the class-conditional semantics of specific image regions. It enables label collaging, i.e., locally replacing the semantic class of a region, without retraining the network, and yields contextually and visually coherent transformations (a sketch follows this list).
  2. Feature Blending: This technique directly modifies intermediate features in the GAN, synthesizing complex image content by blending in the features of reference images over a masked region. It supports intricate modifications, such as altering the posture or facial expression of a synthesized face, without requiring an explicit model of those attributes (see the second sketch below).
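
A minimal PyTorch sketch of the sCBN idea follows. It assumes a class-conditional generator whose batch-normalization layers hold one (gamma, beta) pair per class; the module name, the einsum-based mixing, and the convention that the weight map sums to one over classes at each pixel are illustrative assumptions, not the authors' implementation from the linked repository.

```python
import torch
import torch.nn as nn

class SpatialConditionalBatchNorm2d(nn.Module):
    """Illustrative sketch of spatial conditional batch normalization (sCBN).

    Standard conditional batch norm applies one class-specific (gamma, beta)
    pair to the whole feature map; sCBN instead mixes the per-class
    parameters at every spatial location according to a user-supplied
    weight map, so different regions can follow different class conditions.
    """

    def __init__(self, num_features, num_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # One (gamma, beta) pair per class, as in conditional batch norm.
        self.gamma = nn.Embedding(num_classes, num_features)
        self.beta = nn.Embedding(num_classes, num_features)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, weight_map):
        # x: (N, C, H, W) features; weight_map: (N, K, H, W) non-negative
        # per-class weights summing to 1 over the K class channels.
        # The user's map is assumed to be resized to this layer's (H, W).
        h = self.bn(x)
        # gamma_map[n, c, y, x] = sum_k weight_map[n, k, y, x] * gamma[k, c]
        gamma_map = torch.einsum('nkhw,kc->nchw', weight_map, self.gamma.weight)
        beta_map = torch.einsum('nkhw,kc->nchw', weight_map, self.beta.weight)
        return gamma_map * h + beta_map
```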
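
Feature blending admits an even shorter sketch. The function below, whose name and signature are illustrative rather than taken from the paper's code, mixes the intermediate features of a source and a reference image inside a user-drawn mask; at generation time the generator is run layer by layer and the blend is applied at one or more chosen layers.

```python
import torch
import torch.nn.functional as F

def blend_features(feat_src, feat_ref, mask, alpha=1.0):
    """Blend a reference image's intermediate GAN features into the source.

    feat_src, feat_ref: (N, C, H, W) feature maps from the same layer of
    the generator, computed for the source and reference latents.
    mask: (N, 1, h, w) user-drawn map in [0, 1] at image resolution;
    it is resized to the feature resolution of the current layer.
    alpha: blending strength inside the masked region.
    """
    m = F.interpolate(mask, size=feat_src.shape[-2:], mode='bilinear',
                      align_corners=False)
    return (1 - alpha * m) * feat_src + alpha * m * feat_ref

# Illustrative use inside a layer-wise generator loop:
#   for i, block in enumerate(generator.blocks):
#       h_src, h_ref = block(h_src), block(h_ref)
#       if i == blend_layer:
#           h_src = blend_features(h_src, h_ref, mask)
```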

Experimental Evaluation

The methods were rigorously evaluated using various GAN architectures, including SNGAN, BigGAN, and StyleGAN, trained on datasets such as ImageNet and FFHQ. The results demonstrate the capability of producing photorealistic images with significant control over semantic manipulations, such as changing animal breeds or human facial expressions.

Numerical Results

Quantitative assessments, including classification accuracy tests and human perceptual studies, were conducted to validate the fidelity of real-image transformations. For instance, transformations from cats to big cats and to dogs achieved top-5 error rates of 7.8% and 21.1%, respectively, outperforming image-to-image translation methods such as UNIT and MUNIT.

Theoretical Implications

The techniques presented offer an impactful way to explore unsupervised local semantic transformations within GANs. By adjusting batch normalization parameters and blending features in intermediate layers, the methods disentangle and control high-level attributes of generated images. This work underscores the potential for further exploration into spatially controllable synthesis, opening avenues for fine-grained, user-driven image transformations.
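
To make the normalization adjustment concrete, the sCBN transform at a single layer can be written as below. This is a sketch reconstructed from the description above, with illustrative symbols rather than the paper's exact notation.

```latex
% Sketch of the sCBN transform at one layer (symbols illustrative):
% F is the feature map, (\mu_c, \sigma_c^2) its per-channel batch
% statistics, \gamma_c(k), \beta_c(k) the class-k conditional parameters,
% and W_{k,h,w} the user-supplied weight of class k at location (h, w).
\hat{F}_{c,h,w} = \frac{F_{c,h,w} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}},
\qquad
F'_{c,h,w} = \Big(\sum_k W_{k,h,w}\,\gamma_c(k)\Big)\hat{F}_{c,h,w}
           + \sum_k W_{k,h,w}\,\beta_c(k)
```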

Practical Implications and Future Directions

The paper's methodologies demonstrate promising applications in creative fields where localized control over image content is crucial. The ability to manipulate images spatially without compromising on realism offers exciting prospects for content creation, art, and design.

Future research could expand on these techniques, exploring different types of conditional information beyond class labels, such as textual or attribute-based conditions. Applying these methods to other modalities and enhancing them with multi-modal inputs represent viable extensions, broadening the scope of generative models in artificial intelligence.

Overall, the paper advances the capabilities of GAN-based image synthesis by providing practical tools for spatial manipulation, enhancing both the theoretical understanding and practical implementation of controlled image generation techniques.