We present the Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. In contrast to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weights of the neural network. This is achieved through a condition-aware weight generation module that produces conditional weights for convolution/linear layers based on the input condition. We test CAN on class-conditional image generation on ImageNet and text-to-image generation on COCO. CAN consistently delivers significant improvements for diffusion transformer models, including DiT and UViT. In particular, CAN combined with EfficientViT (CaT) achieves 2.78 FID on ImageNet 512x512, surpassing DiT-XL/2 while requiring 52x fewer MACs per sampling step.
The Condition-Aware Neural Network (CAN) is a new approach to generative modeling that dynamically alters network weights based on input conditions to improve the controllability of image generation.
CAN introduces a conditional control mechanism based on weight manipulation, together with practical design insights on where and how to apply it, yielding substantial improvements in image generative models.
The study demonstrates that CAN surpasses previous conditional control methods in both efficiency and effectiveness, particularly when integrated with diffusion transformer architectures.
CAN opens avenues for future research in generative models and conditioned image synthesis, with potential extensions to video generation and integration with other efficiency-improving techniques.
Recent advancements in generative models have shown promising results in the synthesis of photorealistic images and videos. Nevertheless, the potential of these models has yet to be fully unlocked, particularly concerning the controllability aspect of the generation process. The Condition-Aware Neural Network (CAN) offers a novel approach by dynamically altering the neural network's weights based on input conditions, such as class labels or textual descriptions. This contrasts with the conventional method of manipulating features within the network. CAN's significance is demonstrated through substantial improvements in image generative models, particularly with diffusion transformer architectures like DiT and UViT.
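The mechanism can be illustrated with a minimal sketch: a small hypernetwork maps a condition embedding (e.g., a class-label embedding) to the weight matrix of a linear layer, so the layer's effective weights change with the condition. The dimensions, the single-matrix hypernetwork, and the function names below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D_COND, D_IN, D_OUT = 8, 16, 16  # illustrative sizes

# Hypernetwork parameter: maps a condition embedding to a flat weight vector.
W_hyper = rng.normal(0, 0.02, size=(D_COND, D_IN * D_OUT))

def condition_aware_linear(x, cond_emb):
    """Generate the linear layer's weight from the condition, then apply it."""
    w_flat = cond_emb @ W_hyper        # (D_IN * D_OUT,)
    W = w_flat.reshape(D_IN, D_OUT)    # condition-specific weight matrix
    return x @ W

x = rng.normal(size=(2, D_IN))         # batch of feature vectors
cond_a = rng.normal(size=(D_COND,))    # e.g. embedding of class label A
cond_b = rng.normal(size=(D_COND,))    # e.g. embedding of class label B

y_a = condition_aware_linear(x, cond_a)
y_b = condition_aware_linear(x, cond_b)
# Different conditions yield different effective weights, hence different outputs.
```

In practice the hypernetwork output would parameterize selected convolution/linear layers inside a diffusion backbone; here a single standalone layer suffices to show the weight-space control idea.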
The implementation of CAN signifies a shift toward manipulating the weight space for conditional control in image generative models. The study's central contributions are both methodological and empirical.
The empirical evaluation of CAN, especially when applied to diffusion transformer models, underscores its practical utility. The study identifies the network components that benefit most from condition-aware weight adjustment and shows the effectiveness of directly generating the conditional weights rather than modulating features. Moreover, the experimental results on class-conditional generation and text-to-image synthesis validate the robustness and generalizability of CAN across diverse tasks and datasets.
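The contrast between conventional feature-space conditioning and weight-space conditioning can be sketched as follows. The scale/shift form of the feature route (in the spirit of adaptive normalization) and all layer sizes are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
D_COND, D = 8, 16

W = rng.normal(0, 0.02, size=(D, D))                 # static layer weight
W_mod = rng.normal(0, 0.02, size=(D_COND, 2 * D))    # condition -> scale, shift
W_hyper = rng.normal(0, 0.02, size=(D_COND, D * D))  # condition -> full weight

def feature_conditioning(x, c):
    """Conventional route: keep W fixed, modulate the features."""
    scale, shift = np.split(c @ W_mod, 2)
    return (x * (1 + scale) + shift) @ W

def weight_conditioning(x, c):
    """Weight-space route: generate the layer weight from the condition."""
    return x @ (c @ W_hyper).reshape(D, D)

x = rng.normal(size=(2, D))
c = rng.normal(size=(D_COND,))
out_feat = feature_conditioning(x, c)
out_weight = weight_conditioning(x, c)
```

Both routes produce outputs of the same shape; the difference is where the condition enters: the activations in the first case, the parameters themselves in the second.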
The introduction of CAN opens up new avenues for research in generative models and conditioned image synthesis. From a theoretical standpoint, this work expands our understanding of conditional control mechanisms by showcasing the potential of weight space manipulation. Practically, the efficiency gains facilitated by CAN present opportunities for deploying advanced image generative models on resource-constrained devices, thereby broadening their applicability.
Looking forward, the extension of CAN to tasks beyond image generation, such as large-scale text-to-image synthesis and video generation, presents an exciting area for future exploration. Additionally, integrating CAN with other efficiency-enhancing techniques could further revolutionize the deployment and performance of generative models in real-world applications.
In summary, the Condition-Aware Neural Network marks a significant step forward in the controlled generation of images. By effectively manipulating the neural network's weights based on input conditions, CAN achieves superior performance and efficiency, setting a new benchmark for future developments in the field of generative AI.