Efficient Semantic Image Synthesis via Class-Adaptive Normalization (2012.04644v2)

Published 8 Dec 2020 in cs.CV and cs.GR

Abstract: Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis (Park et al., 2019). It modulates the normalized activations with spatially-varying transformations learned from semantic layouts, which prevents the semantic information from being washed away. Despite its impressive performance, a more thorough understanding of where its advantages come from is still needed to help reduce the significant computation and parameter overhead introduced by this novel structure. In this paper, from a return-on-investment point of view, we conduct an in-depth analysis of the effectiveness of spatially-adaptive normalization and observe that its modulation parameters benefit more from semantic-awareness than from spatial-adaptiveness, especially for high-resolution input masks. Inspired by this observation, we propose class-adaptive normalization (CLADE), a lightweight but equally effective variant that is adaptive only to semantic class. To further improve spatial-adaptiveness, we introduce an intra-class positional map encoding calculated from semantic layouts to modulate the normalization parameters of CLADE, and propose a truly spatially-adaptive variant of CLADE, namely CLADE-ICPE. Through extensive experiments on multiple challenging datasets, we demonstrate that the proposed CLADE can be generalized to different SPADE-based methods while achieving generation quality comparable to SPADE, and that it is much more efficient, with fewer extra parameters and lower computational cost. The code and pretrained models are available at https://github.com/tzt101/CLADE.git.


Summary

The paper's main contribution is CLADE, a normalization layer that modulates activations with scales and shifts selected per semantic class, which makes it far more efficient than SPADE.
It substantially reduces parameters and computation by replacing SPADE's modulation network with per-class parameters, and this is validated across multiple datasets.
Its variant CLADE-ICPE adds intra-class positional map encoding to recover spatial adaptiveness, balancing quality and resource usage in high-resolution synthesis.

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

The paper "Efficient Semantic Image Synthesis via Class-Adaptive Normalization" proposes a new approach to semantic image synthesis built on a class-adaptive normalization strategy called CLADE. The method grows out of a detailed analysis of the existing spatially-adaptive normalization technique, SPADE, which is widely recognized for preserving semantic information during image generation. The authors find that SPADE's modulation parameters benefit more from semantic-awareness than from spatial-adaptiveness, especially for high-resolution input semantic masks, and this observation motivates CLADE.
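The difference between the two modulation rules can be written compactly. The notation below is ours, an illustration consistent with the description above rather than a transcription of the paper's equations: h_{n,c,y,x} is the activation of sample n at channel c and position (y, x), μ_c and σ_c are per-channel normalization statistics, m_n is the semantic layout, and l_n(y,x) in {1, ..., K} is the class label at that position.

```latex
\begin{aligned}
\text{SPADE:}\quad \hat{h}_{n,c,y,x} &= \gamma_{c,y,x}(\mathbf{m}_n)\,\frac{h_{n,c,y,x}-\mu_c}{\sigma_c} + \beta_{c,y,x}(\mathbf{m}_n) \\
\text{CLADE:}\quad \hat{h}_{n,c,y,x} &= \gamma_c^{\,l_n(y,x)}\,\frac{h_{n,c,y,x}-\mu_c}{\sigma_c} + \beta_c^{\,l_n(y,x)}
\end{aligned}
```

In SPADE, γ and β are produced at every position by a small convolutional network applied to m_n; in CLADE they are entries of two learned K × C tables, so obtaining them reduces to a lookup by class label.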

Core Contributions

The paper offers several key contributions:

CLADE Normalization: CLADE introduces class-adaptive normalization, which modulates the normalized activations with scales and shifts selected by each pixel's semantic class, rather than with per-pixel values predicted from the layout. This avoids most of the computation and parameter overhead that SPADE's spatially-adaptive normalization incurs.
Computational Efficiency: Because CLADE needs no modulation network, it requires far fewer parameters and less computation during generation. This efficiency could benefit practical applications with tight latency or memory budgets.
CLADE-ICPE Implementation: Intra-class positional map encoding (ICPE) adds spatial variation within each semantic class, making CLADE truly spatially adaptive without a substantial computational cost.
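A minimal NumPy sketch of class-adaptive modulation follows, together with a rough count of the extra parameters each scheme adds to one normalization layer. Everything here is an illustrative assumption, not the authors' code: the function names are hypothetical, normalization uses plain per-channel batch statistics, and the SPADE count assumes a modulation network of one shared 3×3 convolution with 128 hidden channels followed by separate 3×3 convolutions for γ and β (instance-edge inputs and other details are ignored).

```python
import numpy as np


def clade_modulate(h, label_map, gamma_table, beta_table, eps=1e-5):
    """Class-adaptive modulation sketch (hypothetical interface, not the official CLADE code).

    h:           activations, shape (N, C, H, W)
    label_map:   integer class ids, shape (N, H, W), values in [0, K)
    gamma_table: learned per-class scales, shape (K, C)
    beta_table:  learned per-class shifts, shape (K, C)
    """
    # Parameter-free normalization using per-channel batch statistics.
    mu = h.mean(axis=(0, 2, 3), keepdims=True)
    var = h.var(axis=(0, 2, 3), keepdims=True)
    h_norm = (h - mu) / np.sqrt(var + eps)
    # Table lookup replaces SPADE's conv network: (N, H, W, C) -> (N, C, H, W).
    gamma = gamma_table[label_map].transpose(0, 3, 1, 2)
    beta = beta_table[label_map].transpose(0, 3, 1, 2)
    return gamma * h_norm + beta


def spade_modulation_params(label_nc, channels, hidden=128, k=3):
    """Assumed SPADE-style block: shared k x k conv (label_nc -> hidden), then
    two k x k convs (hidden -> channels) for gamma and beta, all with biases."""
    shared = label_nc * hidden * k * k + hidden
    heads = 2 * (hidden * channels * k * k + channels)
    return shared + heads


def clade_modulation_params(label_nc, channels):
    """One scale vector and one shift vector of length `channels` per class."""
    return 2 * label_nc * channels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal((2, 4, 8, 8))
    labels = rng.integers(0, 3, size=(2, 8, 8))
    out = clade_modulate(h, labels, rng.standard_normal((3, 4)), rng.standard_normal((3, 4)))
    print(out.shape)  # (2, 4, 8, 8)
    # Assumed 35-class layout and 1024-channel feature map, for illustration only.
    print(spade_modulation_params(35, 1024))  # 2401792
    print(clade_modulation_params(35, 1024))  # 71680
```

Under these assumptions the SPADE-style block adds 2,401,792 modulation parameters versus 71,680 for the class-adaptive tables. The real figures depend on the actual architecture; the point of the sketch is that CLADE's cost scales with K × C rather than with convolution kernels over hidden features.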

Results and Findings

In experiments on multiple datasets, including Cityscapes, COCO-Stuff, and ADE20K, CLADE achieves synthesis quality comparable to SPADE with markedly lower computational requirements. Quality is measured with mean Intersection-over-Union (mIoU), pixel accuracy, and Fréchet Inception Distance (FID), and the results indicate that the efficiency gains come without a notable loss in visual fidelity.
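The segmentation-based metrics can be computed from a confusion matrix. In the SPADE-style evaluation protocol, a pretrained segmentation network is run on each synthesized image and its prediction is compared with the input layout; this sketch assumes that prediction is already available as an integer label map and omits the segmentation model itself. Skipping classes that appear in neither map is a common convention, assumed here.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth (layout) class, columns the predicted class."""
    idx = target.reshape(-1) * num_classes + pred.reshape(-1)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the layout."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm):
    """Mean of per-class intersection-over-union."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0  # assumed convention: ignore classes absent from both maps
    return float(np.mean(tp[valid] / union[valid]))


if __name__ == "__main__":
    layout = np.array([[0, 0, 1], [1, 2, 2]])
    predicted = np.array([[0, 1, 1], [1, 2, 0]])
    cm = confusion_matrix(predicted, layout, num_classes=3)
    print(round(pixel_accuracy(cm), 3))  # 0.667
    print(round(mean_iou(cm), 3))  # 0.5
```

FID, by contrast, compares Inception-feature statistics of real and generated image sets and needs no segmentation network, so it is not covered by this sketch.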

The analysis shows that SPADE's modulation carries substantial parameter overhead, especially in high-resolution synthesis, while its spatial adaptiveness is not fully exploited. CLADE avoids this inefficiency by adapting to semantic classes instead, offering a favorable trade-off between quality and cost. CLADE-ICPE further incorporates positional encoding, improving spatial differentiation within each class and the realism of generated textures.
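One plausible way to compute an intra-class positional map is sketched below. This is a hypothetical construction for illustration and may differ from the paper's exact ICPE definition: each pixel receives its (dy, dx) offset from the centroid of its class region, scaled per class to roughly [-1, 1]. The function name is ours, and how the encoding is subsequently turned into γ and β is not reproduced here.

```python
import numpy as np


def intra_class_positional_map(label_map, eps=1e-6):
    """Hypothetical ICPE-style map (not the official definition).

    label_map: (H, W) integer class ids.
    Returns a (2, H, W) array holding each pixel's (dy, dx) offset from the
    centroid of its own class region, normalized per class.
    """
    H, W = label_map.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    pos = np.zeros((2, H, W))
    for k in np.unique(label_map):
        mask = label_map == k
        dy = ys[mask] - ys[mask].mean()
        dx = xs[mask] - xs[mask].mean()
        # Scale by the largest absolute offset in this class; eps avoids 0/0
        # for classes that span a single row or column.
        pos[0][mask] = dy / (np.abs(dy).max() + eps)
        pos[1][mask] = dx / (np.abs(dx).max() + eps)
    return pos


if __name__ == "__main__":
    labels = np.array([[0, 0, 1], [0, 1, 1]])
    print(intra_class_positional_map(labels).round(2))
```

Because the map is a deterministic function of the layout, it adds no learned parameters of its own, which is consistent with the summary's claim that ICPE keeps the overhead small.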

Implications and Future Directions

The findings have both theoretical and practical implications. The class-adaptive strategy clarifies which properties of SPADE-style normalization actually matter for semantic image synthesis, and it opens avenues for more efficient conditional normalization in other generative architectures. Practically, its small resource footprint suggests it may be suitable for deployment on resource-constrained devices.

Future work could extend CLADE to other conditional generation settings, adapt it to more diverse domains, or explore hybrid normalization schemes that combine features of SPADE and CLADE in particular architectures. Further work on optimizing ICPE and integrating it into other generative frameworks could improve spatial coherence and object alignment in synthesized content.

This research underscores the value of re-examining normalization techniques, not only for their computational cost but also for how they balance semantic coherence against resource efficiency in generative networks, and it sets a useful precedent for future work in image synthesis.


GitHub

GitHub - tzt101/CLADE: Efficient Semantic Image Synthesis via Class-Adaptive Normalization (TPAMI 2021) (59 stars)