Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis (2303.14157v3)

Published 24 Mar 2023 in cs.CV and cs.LG

Abstract: Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the "texture sticking" issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose $\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel $\textbf{S}$ynthesis ($\textbf{CREPS}$), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate "thick" column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.

Authors (3)
  1. Thuan Hoang Nguyen (5 papers)
  2. Thanh Van Le (2 papers)
  3. Anh Tran (68 papers)
Citations (3)

Summary

  • The paper introduces CREPS, a novel model that replaces traditional convolutions with modulated linear layers to achieve scale-equivariance.
  • It utilizes a bi-line representation that decomposes 2D features into row and column encodings, significantly reducing memory overhead.
  • Experimental results demonstrate CREPS' superior performance over models like StyleGAN2, enabling adaptable real-time, high-resolution image synthesis.

Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis

The paper presents an innovative approach to image synthesis at arbitrary scales through a novel generative model called Column-Row Entangled Pixel Synthesis (CREPS). The model offers an efficient, scale-equivariant image-generation solution by directly addressing common issues in convolutional GAN architectures, such as position-dependent artifacts and memory inefficiency, especially when scaling to ultra-high resolutions.

Key Contributions and Methodology

The CREPS framework eschews traditional spatial convolutions and hierarchical architectural design in favor of more efficient mechanisms. The paper highlights several significant contributions:

  1. Scale-Equivariance without Spatial Convolutions: Traditional GAN structures, extensively reliant on convolutions, introduce a spatial bias which results in inconsistencies during translation, rotation, or scaling. CREPS circumvents this by utilizing modulated linear layers over spatial convolutions, thereby enhancing scale consistency across outputs.
  2. Bi-Line Representation: To optimize memory usage and computational efficiency, CREPS introduces a bi-line representation that decomposes 2D feature maps into two separate "thick" row and column encodings. This structural innovation facilitates handling of high-resolution data by massively reducing memory overhead.
  3. Layer-Wise Feature Composition Scheme: CREPS employs a layered approach, where feature maps are synthesized using intermediate row and column embeddings, which are then composited to produce the final image output. This composite method significantly enhances the expressive power of the network without demanding excessive computational resources.

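The two architectural ideas above can be made concrete with a short sketch. This is a minimal illustration, not the paper's implementation: the function names, shapes, and the rank-T outer-product entangling are assumptions, with the modulation/demodulation step borrowed from StyleGAN2's modulated convolution (here with a pointwise kernel, i.e. a plain linear layer).

```python
import numpy as np

def modulated_linear(x, weight, style, eps=1e-8):
    """Style-modulated pointwise linear layer (no spatial convolution).

    x:      (N, C_in)   input features, one row per position
    weight: (C_out, C_in)
    style:  (C_in,)     per-input-channel scales from a mapping network
    """
    w = weight * style[None, :]                    # modulate columns
    demod = 1.0 / np.sqrt((w ** 2).sum(axis=1) + eps)
    w = w * demod[:, None]                         # demodulate (unit-norm rows)
    return x @ w.T

def entangle(col_enc, row_enc):
    """Compose a dense feature map from "thick" bi-line encodings.

    col_enc: (H, C, T)  column encoding with thickness T
    row_enc: (W, C, T)  row encoding with thickness T
    Returns (H, W, C): per-channel sum over the thickness axis of
    column-row outer products, i.e. a rank-T factorization per channel.
    """
    return np.einsum('hct,wct->hwc', col_enc, row_enc)

H, W, C, T = 256, 256, 64, 8
col = np.random.randn(H, C, T)
row = np.random.randn(W, C, T)
feat = entangle(col, row)          # dense (256, 256, 64) feature map

# Memory: the bi-line form stores (H + W) * C * T values
# instead of H * W * C for the dense map.
dense_vals = H * W * C             # 4,194,304
biline_vals = (H + W) * C * T      # 262,144 -> 16x smaller at this size
```

The saving grows with resolution: the dense map scales as O(HWC) while the bi-line encodings scale as O((H+W)CT), which is what makes beyond-2K synthesis tractable in this design.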
The paper provides compelling evidence through experiments conducted on various datasets such as FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery. These results showcase CREPS' superior ability to synthesize consistent output details across varying scales compared to existing models like StyleGAN2 and CIPS.

Experimental Insights

The evaluation results demonstrate that CREPS offers a substantial improvement in scale-consistency measures without the computational overhead typical of convolutional models. Notably, CREPS efficiently generates high-quality images at resolutions up to 6K, with the flexibility to adapt the output resolution dynamically in real-time scenarios.

Further experimentation on CREPS' adaptability via transfer learning, notably on MetFaces and AFHQ-Dog, shows that the proposed architecture can effectively handle domain shifts while maintaining high-quality outputs. This adaptability signifies CREPS' potential in a broad array of real-world applications, where training data may not always reflect the variety present in deployment settings.

Implications and Future Directions

The implications of CREPS extend across both theoretical and practical domains. Theoretically, it challenges the conventional reliance on convolutions in image synthesis, presenting a scalable alternative that delivers consistency and flexibility. On a practical level, its reduced memory footprint and scale-equivariance open opportunities for real-time applications in areas like personalized content creation, where real-time adaptability to display requirements (e.g., various screen sizes) is crucial.

Future research can aim at refining CREPS to eliminate any residual artifacts—possibly integrating insights from continuous implicit neural representations or refining non-linear activation processes to bolster output quality further. Moreover, exploring applications beyond 2D image synthesis, such as video generation or 3D rendering, could uncover more extensive uses of the unveiled scale-invariant principles.

In summary, CREPS represents a significant stride in arbitrary-scale image synthesis, offering a promising alternative that alleviates traditional convolutional constraints while maintaining or improving image quality. This research lays a groundwork for ongoing exploration into enhancing the scalability and adaptability of generative models.
