
Discrete Distribution Networks (2401.00036v3)

Published 29 Dec 2023 in cs.CV and cs.LG

Abstract: We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates data distribution using hierarchical discrete distributions. We posit that since the features within a network inherently capture distributional information, enabling the network to generate multiple samples simultaneously, rather than a single output, may offer an effective way to represent distributions. Therefore, DDN fits the target distribution, including continuous ones, by generating multiple discrete sample points. To capture finer details of the target data, DDN selects the output that is closest to the Ground Truth (GT) from the coarse results generated in the first layer. This selected output is then fed back into the network as a condition for the second layer, thereby generating new outputs more similar to the GT. As the number of DDN layers increases, the representational space of the outputs expands exponentially, and the generated samples become increasingly similar to the GT. This hierarchical output pattern of discrete distributions endows DDN with unique properties: more general zero-shot conditional generation and 1D latent representation. We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ. The code is available at https://discrete-distribution-networks.github.io/
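To make the selection mechanism concrete, here is a minimal numpy sketch of the training-time loop the abstract describes, assuming a toy stand-in layer (the real model uses learned neural layers); all names here are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_candidates(condition, k=8, dim=16):
    """Hypothetical stand-in for one DDN layer: emit k candidate
    samples conditioned on the previous layer's selection."""
    return condition + rng.normal(scale=0.5, size=(k, dim))

def ddn_fit_step(gt, num_layers=4, k=8, dim=16):
    """Coarse-to-fine selection loop: at each layer, keep the candidate
    closest to the ground truth (GT) and feed it back as the condition."""
    selected = np.zeros(dim)        # initial condition
    path = []                       # 1D discrete latent: one index per layer
    for _ in range(num_layers):
        cands = layer_candidates(selected, k, dim)
        idx = int(np.argmin(np.linalg.norm(cands - gt, axis=1)))
        selected = cands[idx]       # closest-to-GT sample becomes the condition
        path.append(idx)
    return selected, path

gt = rng.normal(size=16)
sample, path = ddn_fit_step(gt)
print("selection path:", path)                        # k**L possible paths
print("distance to GT:", np.linalg.norm(sample - gt))
```

Each added layer multiplies the number of reachable outputs by k, which is the exponential growth in representational space the abstract refers to.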


Summary

  • The paper introduces Discrete Distribution Networks (DDNs), a generative architecture that uses hierarchical discrete distributions to enable zero-shot conditional image generation, for example guided by CLIP.
  • It introduces the Split-and-Prune and Chain Dropout techniques, which improve generative performance by mitigating dead nodes and density shift.
  • The hierarchical design yields a compact 1D discrete latent representation, supporting versatile image transformations and novel synthesis results.

Discrete Distribution Networks

Introduction

The paper "Discrete Distribution Networks" introduces a novel architecture known as Discrete Distribution Networks (DDNs) which leverage discrete structures for generative modeling. By incorporating techniques such as Zero-Shot Conditional Generation (ZSCG), DDNs advance beyond traditional generative models by effectively generating conditioned images without the conventional requirement of gradient-based optimization or iterative refinement. This is made possible by their innovative use of discrete sample spaces and hierarchical structures.

Zero-Shot Conditional Generation

The DDN architecture achieves Zero-Shot Conditional Generation by using CLIP as the conditioning mechanism: the guide scores each layer's candidates against a text prompt, steering the generative process without any additional gradient computation. As demonstrated in Figure 1, the model generates images from textual prompts, showcasing its ability to exploit CLIP's broad contextual understanding; a sketch of this guided selection loop follows the caption.

Figure 1: Zero-Shot Conditional Generation guided by CLIP. The text at the top is the guide text for that column.
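Below is a minimal sketch of how guided selection could work, reusing the same toy layer interface as the earlier sketch; `guide_fn` is a hypothetical placeholder where a real implementation would plug in CLIP text-image similarity.

```python
import numpy as np

rng = np.random.default_rng(1)

def zero_shot_guided_sample(layers, guide_fn, k=8):
    """Reuse the selection loop at sampling time, but let an external
    scorer (e.g. CLIP text-image similarity) pick each layer's candidate
    instead of the ground truth. No gradients are involved."""
    selected = None
    for layer in layers:
        cands = layer(selected, k)                 # k candidate images
        scores = np.array([guide_fn(c) for c in cands])
        selected = cands[int(np.argmax(scores))]   # greedy guided choice
    return selected

# Hypothetical toy stand-ins; in practice guide_fn would score each
# candidate image against the text prompt with CLIP.
toy_layer = lambda cond, k: (rng.normal(size=(k, 16)) if cond is None
                             else cond + rng.normal(scale=0.5, size=(k, 16)))
target = rng.normal(size=16)
guide_fn = lambda img: -np.linalg.norm(img - target)

print(zero_shot_guided_sample([toy_layer] * 4, guide_fn)[:4])
```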

Further conditional capabilities are shown in experiments where the DDN balances multiple conditions through separate samplers, each controlling distinct image features. Figure 2 illustrates this multi-condition generation, highlighting the network's versatility in accommodating weighted influences during synthesis; a sketch of the weighted blending appears after the caption.

Figure 2: Zero-Shot Conditional Generation under the influence of multiple conditions. The DDN balances the steering forces of CLIP and inpainting according to their associated weights.
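A hedged sketch of how multiple guides might be blended, assuming a simple linear weighting; the scorer functions here are toy placeholders, not the paper's actual scoring rules.

```python
import numpy as np

def combined_guide(candidate, scorers, weights):
    """Blend several zero-shot guides (e.g. CLIP similarity and an
    inpainting mask-match score) with user-chosen weights. The linear
    weighting here is an illustrative assumption."""
    scores = np.array([score(candidate) for score in scorers], dtype=float)
    return float(np.dot(weights, scores))

# Hypothetical scorers on a toy 1D "image":
clip_like    = lambda x: -abs(float(x.mean()) - 1.0)          # text-style pull
inpaint_like = lambda x: -float(np.linalg.norm(x[:4] - 0.5))  # match known pixels

x = np.full(16, 0.5)
print(combined_guide(x, [clip_like, inpaint_like], weights=[0.7, 0.3]))
```

The candidate maximizing this combined score is selected at each layer, so the relative weights directly set how strongly each condition steers the sample.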

Generative Performance

The paper validates the generative and reconstructive performance of DDNs through comprehensive experiments. As seen in Figure 3, a nearest-neighbor analysis against the training set indicates the DDN's capacity to produce novel images that still adhere to the target distribution, demonstrating effective generalization; a sketch of the retrieval step follows the caption.

Figure 3: Nearest neighbors of the model trained on FFHQ. The leftmost column presents images generated by the DDN. Starting from the second column, we display the images from FFHQ that are most similar to the generated images, as measured by LPIPS.
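The retrieval step itself is straightforward; here is a sketch using the widely used `lpips` PyTorch package (the paper's exact evaluation pipeline may differ), with toy random tensors standing in for FFHQ images. Tensors are assumed to be (N, 3, H, W) in [-1, 1], the convention `lpips` expects.

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')  # perceptual distance network

@torch.no_grad()
def nearest_neighbor(generated, train_set, batch=64):
    """Return the index and LPIPS distance of the training image
    closest to a single generated image."""
    best_d, best_i = float('inf'), -1
    for start in range(0, len(train_set), batch):
        chunk = train_set[start:start + batch]
        # Compare the generated image against every image in the chunk.
        d = loss_fn(generated.expand(len(chunk), -1, -1, -1), chunk).flatten()
        j = int(torch.argmin(d))
        if float(d[j]) < best_d:
            best_d, best_i = float(d[j]), start + j
    return best_i, best_d

# Toy usage with random data (stand-in for FFHQ):
gen = torch.rand(1, 3, 64, 64) * 2 - 1
train = torch.rand(100, 3, 64, 64) * 2 - 1
print(nearest_neighbor(gen, train))
```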

Advanced Techniques for Generative Tasks

The introduction of the Split-and-Prune and Chain Dropout methods delivers significant improvements in image generation. These approaches tackle two failure modes of the discrete output nodes: "dead nodes" that are rarely selected, and "density shift" between the generated and target distributions. Figure 4 highlights the efficacy of these techniques in an ablation, showing how strategic manipulation of the discrete sample space retargets nodes to better fit the ground-truth distribution; a sketch of the Split-and-Prune step follows the caption.

Figure 4: Illustration of random sample generation as part of the ablation study on the DDN model.
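As a rough illustration of the idea, the sketch below re-seeds a dead node from the most-selected node and splits its counts; the thresholds and the perturbation scale are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

def split_and_prune(node_params, counts, k):
    """Illustrative sketch of Split-and-Prune: a node that is almost never
    selected (a "dead node") is pruned and re-seeded as a perturbed copy of
    the most-selected node, splitting that node's probability mass."""
    freqs = counts / max(counts.sum(), 1.0)
    dead, hot = int(np.argmin(freqs)), int(np.argmax(freqs))
    if freqs[dead] < 0.5 / k and freqs[hot] > 2.0 / k:
        # Replace the dead node with a slightly perturbed copy of the
        # over-selected node, so both now cover its region of the data.
        node_params[dead] = node_params[hot] + rng.normal(
            scale=1e-3, size=node_params[hot].shape)
        counts[dead] = counts[hot] = counts[hot] / 2.0   # split the mass
    return node_params, counts

params = rng.normal(size=(8, 16))                 # 8 output nodes (toy)
counts = np.array([0., 40., 5., 5., 5., 5., 5., 5.])
params, counts = split_and_prune(params, counts, k=8)
print(counts)   # the dead node now shares the hot node's selection mass
```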

Conditional Generative Abilities

DDNs exhibit advanced image-to-image transformation abilities through Conditional DDNs. This capability is demonstrated in Figure 5, where the network receives conditions via other images, facilitating tasks such as colorization and edge-to-RGB conversion and showcasing its potential for complex generative applications; a sketch of one way to wire in the condition follows the caption.

Figure 5: Conditional DDN performing colorization and edge-to-RGB tasks. Columns 4 and 5 display results generated under the guidance of other images, where the produced image adheres to the style of the guide image as closely as possible while still satisfying the condition. The resolution of the generated images is 256×256.
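One plausible way to condition each layer is to concatenate the condition image with the previous selection along the channel dimension; the block below is an illustrative assumption about the architecture, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConditionalBlock(nn.Module):
    """Sketch of one conditional DDN layer: the condition image (e.g. a
    grayscale or edge map) is concatenated channel-wise with the previous
    selection, and the block emits k candidate RGB images."""
    def __init__(self, k=8, cond_ch=1, img_ch=3, hidden=32):
        super().__init__()
        self.k = k
        self.net = nn.Sequential(
            nn.Conv2d(cond_ch + img_ch, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, k * img_ch, 3, padding=1),
        )

    def forward(self, prev, cond):
        x = torch.cat([prev, cond], dim=1)       # fuse selection + condition
        out = self.net(x)                        # (B, k*3, H, W)
        b, _, h, w = out.shape
        return out.view(b, self.k, 3, h, w)      # k candidate images

block = ConditionalBlock()
prev = torch.zeros(1, 3, 64, 64)    # previous layer's selected output
edges = torch.rand(1, 1, 64, 64)    # condition: toy edge map
print(block(prev, edges).shape)     # torch.Size([1, 8, 3, 64, 64])
```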

Hierarchical and Latent Analysis

DDNs are structured hierarchically to expose latent semantic structure; Figure 6 visualizes this by laying the synthesis process out as a recursive grid, giving insight into node influences and ancestor relations. This setup not only reveals semantic organization but also supports reasoning directly in the discrete latent space; a sketch of the path-based latent code follows the caption.

Figure 6: Hierarchical generation visualization of a DDN with L = 4. Each sample with a colored border is an intermediate generation product; small samples without colored borders are the final generated images.
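Because each sample is identified by its per-layer selection indices, the DDN latent is a short 1D discrete code; with k candidates per layer and L layers, there are k^L representable codes. A small sketch of packing and unpacking such a path:

```python
def path_to_code(path, k):
    """Pack a per-layer selection path into a single integer."""
    code = 0
    for idx in path:            # most-significant layer first
        code = code * k + idx
    return code

def code_to_path(code, k, num_layers):
    """Unpack an integer code back into the per-layer path."""
    path = []
    for _ in range(num_layers):
        code, idx = divmod(code, k)
        path.append(idx)
    return path[::-1]

k, num_layers = 8, 4
path = [3, 1, 6, 0]
code = path_to_code(path, k)
assert code_to_path(code, k, num_layers) == path
print(code, k ** num_layers)    # 1648 of 4096 possible codes
```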

Conclusion

The research on Discrete Distribution Networks opens new avenues for generative modeling, providing efficient and diverse sample generation, including zero-shot conditional generation that requires no gradient computation. By incorporating discrete structures, DDNs circumvent limitations of traditional generative approaches, positioning themselves as robust alternatives for tasks demanding conditional generative flexibility and fidelity. Future work could refine these methods for higher-dimensional data, improve handling of complex datasets like ImageNet, and explore adversarial training techniques to enhance image sharpness and detail.
