CompVAE: Compositional Variational AutoEncoder
- CompVAE is a generative model that represents data as sets of elements, ensuring invariance to element order and flexible set sizes.
- It uses a GNN-based inference network to produce per-part Gaussian latents, together with a global latent code that captures interactions among parts.
- The architecture supports programmable latent operations like addition and removal, demonstrating robust synthetic reconstructions.
CompVAE (Compositional Variational AutoEncoder) is a generative model designed for data exhibiting a multi-ensemblist structure—datasets where each instance consists of a set or aggregation of elements ("parts") rather than a single vectorial observation. The model is derived from Bayesian variational principles and enables explicit compositionality: it allows for the representation, generation, and manipulation of wholes based on arbitrary combinations of their constituent parts, exhibiting invariance to both the order and the number of elements. CompVAE achieves this by factorizing the generative process and inference so as to support programmable operations—such as addition and removal of elements—in the learned latent space (Berger et al., 2020).
1. Model Structure and Generative Process
The core of CompVAE is a generative model in which each observed datapoint $x$ is described by a (possibly variable-size) set of elements $\{l_1, \dots, l_n\}$, where each $l_i$ is a categorical or symbolic label drawn from a finite set $\mathcal{L}$. The generative process is specified as follows:
- Each part $l_i$ is associated with a latent variable $z_i \sim p(z_i \mid l_i)$.
- All part-latents are aggregated via an order-invariant operation, the sum, to obtain an intermediate latent $\bar{z} = \sum_i z_i$.
- A global latent code $c$ is sampled conditionally on $\bar{z}$, capturing interactions or global dependencies not explained by the sum of per-part latents.
- The observation $x$ is sampled from a distribution parameterized by both $\bar{z}$ and $c$.
The model factorization is given by:

$$p(x, c, z_{1:n} \mid l_{1:n}) = p(x \mid c, \bar{z})\; p(c \mid \bar{z}) \prod_{i=1}^{n} p(z_i \mid l_i), \qquad \text{where } \bar{z} = \sum_{i=1}^{n} z_i.$$
This design ensures that the generation of is invariant to the order of the elements and robust to the number or composition of the set.
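The following PyTorch sketch illustrates this generative path. It is a minimal illustration rather than the reference implementation: the module names (`CompVAEDecoder`, `c_net`, `x_net`), layer sizes, and the choice of a Gaussian observation model for $x$ are assumptions made here for concreteness.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class CompVAEDecoder(nn.Module):
    """Illustrative sketch of the CompVAE generative path (not the reference implementation)."""

    def __init__(self, num_labels, z_dim=8, c_dim=8, x_dim=32, hidden=64):
        super().__init__()
        # Per-label prior p(z_i | l_i): one (mean, log-variance) pair per symbolic label.
        self.prior_mu = nn.Embedding(num_labels, z_dim)
        self.prior_logvar = nn.Embedding(num_labels, z_dim)
        # Conditional prior p(c | z_bar) over the global latent.
        self.c_net = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * c_dim))
        # Observation model p(x | c, z_bar); a Gaussian likelihood is assumed here.
        self.x_net = nn.Sequential(nn.Linear(z_dim + c_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, x_dim))

    def forward(self, labels):
        # labels: LongTensor of shape (n_parts,) listing the parts of one instance.
        z = Normal(self.prior_mu(labels),
                   (0.5 * self.prior_logvar(labels)).exp()).rsample()   # one z_i per part
        z_bar = z.sum(dim=0)                                            # order-invariant aggregation
        c_mu, c_logvar = self.c_net(z_bar).chunk(2, dim=-1)
        c = Normal(c_mu, (0.5 * c_logvar).exp()).rsample()              # global latent c | z_bar
        x_mean = self.x_net(torch.cat([z_bar, c], dim=-1))              # mean of p(x | c, z_bar)
        return x_mean
```

Sampling with labels `[0, 2, 2]`, for instance, composes a whole from one part of type 0 and two parts of type 2; permuting the labels leaves the distribution of $x$ unchanged because only the sum $\bar{z}$ enters the decoder.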
2. Variational Inference and Training
Learning in CompVAE proceeds by maximizing the Evidence Lower Bound (ELBO) via variational inference:

$$\mathcal{L}(x, l_{1:n}) = \mathbb{E}_{q(c,\, z_{1:n} \mid x,\, l_{1:n})}\big[\log p(x \mid c, \bar{z})\big] - \mathrm{KL}\big(q(c, z_{1:n} \mid x, l_{1:n}) \,\big\|\, p(c, z_{1:n} \mid l_{1:n})\big).$$
The inference model factorizes as

$$q(c, z_{1:n} \mid x, l_{1:n}) = q(c \mid x, \bar{z})\; q(z_{1:n} \mid x, l_{1:n}),$$

with the per-part inference over $z_{1:n}$ modeled by a fully-connected Graph Neural Network (GNN) architecture, ensuring permutation invariance and capturing correlations among part-latents. This structure allows the encoding of sets of arbitrary size and flexible correlation modeling.
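A minimal sketch of such an encoder is given below, assuming a few rounds of message passing over the complete graph of parts; the message and update networks, their sizes, and the GRU-style node update are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn

class PartEncoderGNN(nn.Module):
    """Sketch of a fully-connected message-passing encoder for q(z_i | x, l_1..l_n).
    The specific message/update networks and number of rounds are assumptions."""

    def __init__(self, num_labels, x_dim=32, z_dim=8, hidden=64, rounds=2):
        super().__init__()
        self.rounds = rounds
        self.embed = nn.Embedding(num_labels, hidden)
        self.x_proj = nn.Linear(x_dim, hidden)
        self.message = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.update = nn.GRUCell(hidden, hidden)
        self.readout = nn.Linear(hidden, 2 * z_dim)   # (mean_i, log-variance_i) per part

    def forward(self, x, labels):
        # x: (x_dim,) observation; labels: (n_parts,) symbolic labels of its parts.
        h = self.embed(labels) + self.x_proj(x)       # initial node states, one per part
        n = h.shape[0]
        for _ in range(self.rounds):
            # Complete graph: every part exchanges messages with every other part.
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                               h.unsqueeze(0).expand(n, n, -1)], dim=-1)
            msgs = self.message(pairs).sum(dim=1)     # sum-aggregation keeps permutation symmetry
            h = self.update(msgs, h)
        mu, logvar = self.readout(h).chunk(2, dim=-1)
        return mu, logvar                             # parameters of the per-part Gaussians
```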
The per-part multivariate Gaussian posteriors for the $z_i$ are parameterized such that the variance of their sum $\bar{z}$ can be precisely controlled, which is crucial for stable and robust reconstruction. The KL-divergence term in the ELBO is analytic because both the factorized priors and the posteriors are Gaussian.
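The sketch below assembles a single-sample ELBO estimate from the two hypothetical modules above (`CompVAEDecoder` and `PartEncoderGNN`), using the analytic Gaussian KL for the per-part terms. The Gaussian likelihood with a fixed scale, and the shortcut of reusing $p(c \mid \bar{z})$ in place of a separate posterior over $c$ (which makes its KL term vanish), are simplifying assumptions of this illustration.

```python
import torch
from torch.distributions import Normal, kl_divergence

def elbo_sketch(x, labels, encoder, decoder, obs_scale=0.1):
    """Single-sample ELBO estimate; the Gaussian likelihood and its fixed scale are assumptions."""
    # Per-part posteriors q(z_i | x, l_1..l_n) from the GNN encoder.
    q_mu, q_logvar = encoder(x, labels)
    q_z = Normal(q_mu, (0.5 * q_logvar).exp())
    z = q_z.rsample()
    z_bar = z.sum(dim=0)

    # Label-conditioned priors p(z_i | l_i); the KL is analytic since both sides are Gaussian.
    p_z = Normal(decoder.prior_mu(labels), (0.5 * decoder.prior_logvar(labels)).exp())
    kl_parts = kl_divergence(q_z, p_z).sum()

    # Global latent: a full model would use a posterior q(c | x, z_bar); here the conditional
    # prior p(c | z_bar) is reused purely to keep the sketch self-contained.
    c_mu, c_logvar = decoder.c_net(z_bar).chunk(2, dim=-1)
    c = Normal(c_mu, (0.5 * c_logvar).exp()).rsample()

    # Reconstruction term log p(x | c, z_bar).
    x_mean = decoder.x_net(torch.cat([z_bar, c], dim=-1))
    log_px = Normal(x_mean, obs_scale).log_prob(x).sum()

    return log_px - kl_parts   # the KL term for c is zero under the shortcut above
```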
3. Compositional Latent Operations and Invariance
The compositional property of CompVAE arises from the sum-aggregation of part-latents. For generation:
- Addition of elements: To include a new part $l_{n+1}$, compute its latent $z_{n+1}$ and add it into the sum $\bar{z}$.
- Removal of elements: To remove a part, simply omit the corresponding $z_i$ from the sum.
This affords post-training programmability: operations on the set of elements translate into interpretable manipulations in data space, as sketched below. The model's generative mechanism is invariant to element order by construction and flexible with respect to set size, supporting compositions (subsets or supersets) not seen during training.
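A minimal sketch of such latent-space edits, assuming the hypothetical `CompVAEDecoder` introduced in Section 1, is:

```python
import torch
from torch.distributions import Normal

@torch.no_grad()
def recompose(decoder, z_parts, keep=None, extra_labels=None):
    """Edit a whole by dropping or appending part-latents; this API is illustrative only."""
    z = z_parts if keep is None else z_parts[keep]            # removal: drop rows from the set
    if extra_labels is not None:                              # addition: sample new z_i from p(z_i | l_i)
        mu = decoder.prior_mu(extra_labels)
        std = (0.5 * decoder.prior_logvar(extra_labels)).exp()
        z = torch.cat([z, Normal(mu, std).sample()], dim=0)
    z_bar = z.sum(dim=0)                                      # re-aggregate; element order never matters
    c_mu, c_logvar = decoder.c_net(z_bar).chunk(2, dim=-1)
    c = Normal(c_mu, (0.5 * c_logvar).exp()).sample()
    return decoder.x_net(torch.cat([z_bar, c], dim=-1))
```

For example, `recompose(decoder, z_parts, keep=[0, 2])` drops the second part before decoding, while `recompose(decoder, z_parts, extra_labels=torch.tensor([3]))` appends a freshly sampled part of type 3.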
4. Experimental Validation
CompVAE was empirically validated on synthetic benchmarks designed to test its compositionality and invariance properties:
- 1D Synthetic Problem (Nonlinear Sine Aggregation): Each part is parameterized by a frequency, amplitude, and phase; parts are combined nonlinearly by summing their sine curves and applying a scaled nonlinearity (a toy generator in this spirit is sketched below). The per-part latents capture nearly all of the relevant information while the global latent is used only minimally, yielding smooth reconstructions under element addition/removal and generalization to set sizes beyond those observed during training.
- 2D Synthetic Problem (Colored Spots): Here, each part is a colored point placed in an image. CompVAE enables coherent generation under arbitrary addition or removal of spots, with latent representations robust to set size and composition.
In all cases, CompVAE achieves smooth, plausible reconstructions and supports operations such as incremental part addition, showing a clear partition of information between the per-part latents and the global interaction code. The model remains robust when the number of elements at generation time exceeds the range seen during training.
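For concreteness, a toy generator in the spirit of the 1D benchmark is sketched below; the parameter ranges and the `tanh` squashing are stand-ins chosen here, not the paper's exact settings.

```python
import numpy as np

def sample_sine_instance(n_parts, n_points=128, rng=None):
    """Toy 1D generator: each part is a sine curve with its own frequency, amplitude, and
    phase; the curves are summed and passed through a scaled nonlinearity. Parameter
    ranges and the tanh squashing are illustrative stand-ins."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, 1.0, n_points)
    freq = rng.uniform(1.0, 5.0, size=n_parts)       # per-part frequency
    amp = rng.uniform(0.5, 1.5, size=n_parts)        # per-part amplitude
    phase = rng.uniform(0.0, 2 * np.pi, size=n_parts)
    parts = amp[:, None] * np.sin(2 * np.pi * freq[:, None] * t[None, :] + phase[:, None])
    return np.tanh(parts.sum(axis=0) / n_parts)      # scaled nonlinearity applied to the sum
```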
5. Mathematical Formulations and Theoretical Properties
The key mathematical components underpinning CompVAE are:
- Generative factorization: explicit partitioning of information into local (per-part) and global (interaction) components.
- ELBO loss: the KL terms are analytically tractable thanks to the Gaussian parameterization of the priors and posteriors.
- Aggregation invariance: the sum operation confers permutation invariance and generalization to varying set sizes.
- GNN-based Inference: Message-passing enables modeling of inter-part relationships while maintaining order- and size-invariance.
These structures collectively support the expressive, compositional, and reprogrammable properties desired in applications involving complex set- or group-structured data.
6. Significance and Use Cases
CompVAE provides a compositional generative approach especially suited for domains where instances are built from sets of elements that can be flexibly combined, such as:
- Simulation and control: where scenarios can be parametrically constructed from collections of abstract items.
- Vision and graphics: for object-based composition or decomposition.
- Energy management and similar aggregate modeling: e.g., aggregating curves for households with variable compositions.
Two major features distinguish CompVAE: robust invariance to the order and number of components, and the capacity for controlled, interpretable generation by latent modifications. These properties are empirically validated in the synthetic experiments reported.
7. Summary Table: Key Properties of CompVAE
| Property | CompVAE Implementation |
|---|---|
| Data structure handled | Sets of elements (multi-ensemblist) |
| Compositionality | Yes (add/remove parts in latent) |
| Order invariance | Yes (sum-aggregation) |
| Size robustness | Generalizes to unseen set sizes |
| Division of information | Per-part and global latent codes |
| Inference architecture | Permutation-invariant GNN |
CompVAE thus constitutes a theoretically principled and practically validated VAE extension for compositional, programmable, and invariant generative modeling of set-structured data, enabling novel forms of controllable synthesis and simulation beyond those accessible to standard VAEs (Berger et al., 2020).