RTG-AE: Recursive Tree Grammar Autoencoders
- Recursive Tree Grammar Autoencoders are models that encode tree-structured data into continuous latent spaces and decode them with strict grammar constraints ensuring syntactic validity.
- They employ a bottom-up recursive neural encoder to generate unique embeddings and a grammar-constrained top-down decoder that guarantees valid production sequences.
- Empirical evaluations show RTG-AE's linear-time operations, low reconstruction error, and superior performance in applications like molecular design and code synthesis.
Recursive Tree Grammar Autoencoders (RTG-AE) constitute a class of models designed to map tree-structured data into continuous latent spaces and then reconstruct the original tree through grammatically valid generative processes. RTG-AEs integrate three key elements: explicit regular tree grammar constraints, recursive neural processing for both encoding and decoding, and variational autoencoding, realizing linear-time, expressive, and syntactically rigorous tree-to-tree autoencoding (Paassen et al., 2020). This approach is primarily motivated by applications where valid tree generation is essential, such as molecular design (SMILES), symbolic expression optimization, and code synthesis.
1. Formal Definitions and Preliminaries
An RTG-AE operates over ordered, labeled trees defined by a regular tree grammar (RTG) $\mathcal{G} = (\Phi, \Sigma, S, R)$:
- $\Phi$: finite set of nonterminals
- $\Sigma$: finite alphabet of terminal symbols
- $S \in \Phi$: start nonterminal
- $R$: set of production rules $A \to x(B_1, \dots, B_k)$, where $A, B_1, \dots, B_k \in \Phi$ and $x \in \Sigma$
A syntactically valid tree in the language $\mathcal{L}(\mathcal{G})$ can always be uniquely parsed as a sequence of such grammar rule applications, provided $\mathcal{G}$ is deterministic (no two rules have the same right-hand side). The RTG defines the entire set of trees representable by the model, and every encoded/decoded tree is guaranteed to belong to $\mathcal{L}(\mathcal{G})$ (Paassen et al., 2020).
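The grammar machinery above can be made concrete with a small sketch. The toy Boolean-expression grammar below is our own illustration, not taken from the paper; it shows how determinism (no two rules sharing a right-hand side) yields a unique bottom-up parse of a tree into a rule sequence:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str           # nonterminal A
    label: str         # terminal symbol x
    rhs: tuple         # child nonterminals (B_1, ..., B_k)

@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()

# Toy deterministic RTG for Boolean expressions (illustrative only):
#   S -> and(S, S) | or(S, S) | not(S) | x | y
RULES = [
    Rule("S", "and", ("S", "S")),
    Rule("S", "or",  ("S", "S")),
    Rule("S", "not", ("S",)),
    Rule("S", "x",   ()),
    Rule("S", "y",   ()),
]

def parse(tree, nonterminal="S"):
    """Bottom-up parse: return the unique rule sequence generating `tree`.

    Determinism guarantees at most one rule matches each node, so the
    parse is unique whenever the tree is in the grammar's language.
    """
    matches = [r for r in RULES
               if r.lhs == nonterminal
               and r.label == tree.label
               and len(r.rhs) == len(tree.children)]
    if len(matches) != 1:
        raise ValueError(f"tree not in the grammar's language: {tree.label}")
    rule = matches[0]
    seq = [rule]
    for child, nt in zip(tree.children, rule.rhs):
        seq.extend(parse(child, nt))
    return seq

# and(not(x), y) parses into a pre-order sequence of four rule applications
t = Tree("and", (Tree("not", (Tree("x"),)), Tree("y")))
rule_seq = parse(t)
```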
2. Encoder Architecture: Bottom-Up Recursive Neural Parsing
The RTG-AE encoder traverses the input tree in a bottom-up fashion, recursively mapping each node and its children to an embedding in $\mathbb{R}^n$ using rule-specific neural functions:
$$\vec{h} = f_r(\vec{h}_1, \dots, \vec{h}_k),$$
where $r$ is the rule generating the node, $\vec{h}_1, \dots, \vec{h}_k$ are the embeddings of its $k$ children, and the parameters of each $f_r$ are learned. For leaves ($k = 0$), $f_r$ reduces to a bias. The encoder constructs a single root embedding by recursively applying these functions in a manner precisely tied to grammar structure and node arity.
This approach differs fundamentally from sequence-based or bag-of-children representations by virtue of its strict adherence to the tree’s grammatical production sequence. Notably, encoder complexity is $\mathcal{O}(m)$ for trees with $m$ nodes, with a unique parse and embedding for each valid tree (Paassen et al., 2020).
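A minimal NumPy sketch of such a bottom-up encoder follows. The per-rule affine-plus-tanh parameterization and the embedding dimension are our own assumptions for illustration; the paper's exact functions may differ:

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(0)

# One weight matrix W_r (DIM x k*DIM) and bias b_r per grammar rule r,
# keyed here by (label, arity). For leaves (k = 0) f_r reduces to the bias.
RULES = {"and": 2, "or": 2, "not": 1, "x": 0, "y": 0}
params = {r: (rng.standard_normal((DIM, k * DIM)) * 0.1,
              rng.standard_normal(DIM) * 0.1)
          for r, k in RULES.items()}

def encode(tree):
    """Bottom-up recursive encoder: each node's embedding is a rule-specific
    function of its children's embeddings. One pass per node => O(m) time."""
    label, children = tree
    W, b = params[label]
    if not children:                        # leaf: f_r is just the bias
        return np.tanh(b)
    h = np.concatenate([encode(c) for c in children])
    return np.tanh(W @ h + b)

# Trees written as (label, [children]) tuples: and(not(x), y)
tree = ("and", [("not", [("x", [])]), ("y", [])])
root_embedding = encode(tree)               # single vector in R^DIM
```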
3. Decoder Design: Grammar-Constrained Top-Down Generation
The RTG-AE decoder is a recursive, grammar-controlled generative process. Given a latent vector $\vec{h}$ and current nonterminal $A$, the decoder:
- Computes a logit $\theta_r$ for each rule $r \in R$ with left-hand side $A$.
- Samples a rule $r$ via softmax over the valid rules for $A$.
- For the chosen production $A \to x(B_1, \dots, B_k)$, generates child embeddings $\vec{h}_1, \dots, \vec{h}_k$ using rule-specific functions.
- Recursively emits the subtree by invoking the decoder on each pair $(\vec{h}_j, B_j)$.
The subtraction of child embeddings from parents (“explaining away”) is employed to encourage independence between branches. Due to strict grammar control at every generation step, the decoder only produces trees in . Decoding is linear in tree size (Paassen et al., 2020).
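The decoding loop can be sketched as follows. The toy grammar, linear scoring vectors, and per-rule child transforms are illustrative assumptions; the depth cap is one simple way to force termination (not necessarily the paper's mechanism), and rules are picked greedily rather than sampled so the sketch is deterministic:

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(1)

# Toy grammar: (lhs nonterminal, terminal label, child nonterminals)
RULES = [("S", "and", ("S", "S")),
         ("S", "not", ("S",)),
         ("S", "x",   ()),
         ("S", "y",   ())]

# Hypothetical parameters: a scoring vector per rule, and per (rule, child)
# matrices producing child embeddings from the parent embedding.
score_w = {i: rng.standard_normal(DIM) * 0.1 for i in range(len(RULES))}
child_W = {(i, j): rng.standard_normal((DIM, DIM)) * 0.1
           for i, (_, _, rhs) in enumerate(RULES) for j in range(len(rhs))}

def decode(h, nonterminal="S", depth=0, max_depth=6):
    """Top-down grammar-constrained decoding: only rules whose left-hand
    side matches the current nonterminal are candidates, so every output
    tree is syntactically valid by construction."""
    cands = [i for i, r in enumerate(RULES) if r[0] == nonterminal]
    if depth >= max_depth:                   # force a leaf near the depth cap
        cands = [i for i in cands if not RULES[i][2]]
    logits = np.array([score_w[i] @ h for i in cands])
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    i = cands[int(np.argmax(probs))]         # greedy choice (could sample)
    _, label, rhs = RULES[i]
    children = []
    for j, nt in enumerate(rhs):
        h_child = np.tanh(child_W[(i, j)] @ h)   # rule-specific child embedding
        children.append(decode(h_child, nt, depth + 1, max_depth))
    return (label, children)

tree = decode(rng.standard_normal(DIM))      # always a grammar-valid tree
```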
4. Variational Autoencoding Objective
RTG-AE employs a variational autoencoder framework, defining:
- Encoder $q_\phi(\vec{z} \mid \hat{x}) = \mathcal{N}\!\left(\vec{z}; \vec{\mu}(\vec{h}), \mathrm{diag}(\vec{\sigma}^2(\vec{h}))\right)$, where $\vec{h}$ is the bottom-up tree embedding.
- Latent code $\vec{z}$ is mapped to the decoder’s initial embedding.
- Decoder likelihood is the product of softmax probabilities over the true sequence of grammar rules applied during generation.
The training objective maximizes the evidence lower bound (ELBO)
$$\mathcal{L}(\phi, \theta) = \mathbb{E}_{q_\phi(\vec{z} \mid \hat{x})}\!\left[\log p_\theta(\hat{x} \mid \vec{z})\right] - D_{\mathrm{KL}}\!\left(q_\phi(\vec{z} \mid \hat{x}) \,\big\|\, \mathcal{N}(\vec{0}, I)\right).$$
Training is performed end-to-end via stochastic gradient descent. The grammar constraints ensure invalid productions have zero likelihood (Paassen et al., 2020).
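For a fixed tree, the ELBO reduces to summing log-softmax terms over the true rule sequence plus a closed-form Gaussian KL term. A minimal NumPy sketch (the $\beta$ weight is our own generalization; $\beta = 1$ recovers the standard ELBO):

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def elbo(rule_logits, true_rules, mu, log_var, beta=1.0):
    """ELBO = sum_t log softmax(logits_t)[r_t] - beta * KL.

    rule_logits[t]: decoder logits over the valid rules at step t.
    true_rules[t]:  index of the rule actually applied at step t.
    """
    recon = 0.0
    for logits, r in zip(rule_logits, true_rules):
        # numerically stable log-softmax
        lse = logits.max() + np.log(np.sum(np.exp(logits - logits.max())))
        recon += logits[r] - lse
    return recon - beta * gaussian_kl(mu, log_var)

# Tiny worked example with a two-step rule sequence and mu=0, log_var=0
# (so the KL term vanishes and only the reconstruction term remains)
logits = [np.array([2.0, 0.5, -1.0]), np.array([0.0, 3.0])]
value = elbo(logits, true_rules=[0, 1], mu=np.zeros(4), log_var=np.zeros(4))
```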
5. Theoretical Guarantees and Expressiveness
RTG-AE admits a number of theoretical properties:
- Linear-time encoding and decoding: both operations scale linearly in tree size, a consequence of recursive dynamic programming and the bounded arity of grammar rules.
- Unique parsing: Determinism in the grammar ensures every tree in maps to a unique production sequence.
- Expressiveness: Any regular tree language can be encoded by a deterministic RTG, thus all such tree languages are representable by RTG-AE (Paassen et al., 2020).
6. Empirical Evaluation and Comparative Study
RTG-AE has been benchmarked against models ablated for recursion, grammar, or variational autoencoding, including D-VAE (graph VAE; Zhang et al., 2019), GVAE (Grammar VAE; Kusner et al., 2017), TES-AE (Tree Echo State AE; Paassen et al., 2020), and other sequence-based or grammar-based baselines. Key findings (Paassen et al., 2020):
- RTG-AE achieved the lowest tree-edit-distance RMSE on 3 of 4 datasets, e.g., 0.83 on Boolean expressions versus 1.98 for the next-best model.
- Training time is 30–50% shorter than the closest recursive baseline.
- Downstream optimization (e.g., optimizing SMILES for chemical properties via CMA-ES) yielded higher median scores and a greater fraction of syntactically valid molecules (up to 37.3% valid from prior samples versus 16.2% in ablated variants).
- The ablation study supports the necessity of all three ingredients (variational autoencoding, grammar, and recursion) for optimal performance.
An example application is molecular design: given a molecular tree, encode it, perform optimization in latent space, then decode to new, guaranteed-valid molecules with improved scores (Paassen et al., 2020).
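The encode-optimize-decode loop can be sketched generically. Here `decode` and `score` are pure placeholders standing in for the trained RTG-AE decoder and a chemical property oracle, and a simple (1+1) random search stands in for CMA-ES; everything in this block is our own illustrative assumption:

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(2)

def decode(z):
    """Placeholder: the trained decoder maps any latent z to a valid tree."""
    return ("mol", [])

def score(tree):
    """Placeholder property score of a decoded tree (e.g., logP for SMILES)."""
    return float(len(tree[1]))

def optimize_latent(z0, steps=50, sigma=0.3):
    """Simple (1+1) random search in latent space: perturb, keep if better.
    Every candidate is decoded through the grammar, so each evaluated
    structure is syntactically valid by construction."""
    z, best = z0, score(decode(z0))
    for _ in range(steps):
        cand = z + sigma * rng.standard_normal(DIM)
        s = score(decode(cand))
        if s >= best:
            z, best = cand, s
    return z, best

z_opt, best_score = optimize_latent(rng.standard_normal(DIM))
```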
7. Related Models and Further Developments
RTG-AEs generalize and refine previous grammar-constrained generative models. GVAE (Kusner et al., 2017) integrates grammar masks in a sequential (string) VAE, but lacks recursive, structure-aware processing of trees. TES-AE (Paassen et al., 2020) employs unordered echo-state (reservoir) networks and SVM readouts, achieving fast grammar-respecting autoencoding without a variational objective. Recent extensions such as Recursive Neural Programs (RNPs) (Fisher et al., 2022) realize RTG-AE principles with neural modules for hierarchically compositional visual data and differentiable part-whole grammar of images.
Current limitations include handling only discrete node labels, shallow reservoir structures, and limited invertibility in certain decoder transitions. Open directions include grammar-guided generative modeling of time series on trees, higher-capacity recursive architectures, and direct gradient-based tuning of readouts (Paassen et al., 2020).
RTG-AEs are a principled and theoretically grounded approach to grammar-constrained, recursive variational autoencoding for trees, demonstrating advantages in validity, efficiency, and optimization performance in complex structured-data domains (Paassen et al., 2020).