
Compositional Program Generator (CPG)

Updated 21 November 2025
  • CPG is a computational framework that leverages compositional structure, modular abstraction, and operator-level decomposition to synthesize and generalize programs.
  • It employs diverse techniques including neuro-symbolic parsing, Transformer-based masking, and LLM-guided recursive decomposition to enhance systematic and productive generalization.
  • Empirical validations show high sample efficiency on benchmarks such as SCAN and on PBE tasks, and address key design trade-offs through strategies such as parameter isolation and explicit subtask separation.

A Compositional Program Generator (CPG) is a computational architecture or algorithmic system designed to synthesize or generate programs, and to generalize across program structures, by explicitly leveraging compositional structure, modular abstraction, or operator-level decomposition. CPGs are deployed in a variety of settings, including program synthesis from examples, few-shot systematic generalization, and automated code or domain-specific program generation. Core approaches span neuro-symbolic parsing with modular parameterization, LLM-guided recursive decomposition, Transformer-based masking for decomposition, and stacks of domain-specific languages with compositional rewrite rules.

1. Formal Problem Setting and Objectives

CPGs arise within the broader contexts of program synthesis and sequence-to-sequence (seq2seq) mapping. Formally, for most CPG frameworks, a task consists of producing a target output $y$ given an input $x$, where $y$ is either a program (in a programming language or DSL) or the result of executing some program (e.g., string transformation, logical form, or action sequence). CPGs are fundamentally motivated by two desiderata:

  • Systematic Generalization: Ability to recombine previously learned behaviors across new structural patterns—e.g., handling “walk left after run twice” after seeing only “walk left after jump twice” and “run twice after jump left.”
  • Productive Generalization: Ability to handle compositions of unbounded length or structure, even under few-shot or zero-shot regimes.

Given a set of input-output examples $E = \{(x_i, y_i)\} \subset V \times V$, with $V$ a universe of concrete values, the CPG seeks a program $F \in L$ such that $\llbracket F \rrbracket(x_i) = y_i$ for all $(x_i, y_i) \in E$, where $\llbracket F \rrbracket$ denotes the semantics of $F$ in the target language $L$ (Khan et al., 12 Mar 2025, Klinger et al., 2023).
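
A minimal sketch of this verification objective, with a candidate program represented as a plain Python callable; the `satisfies` helper and the toy string-transformation task are illustrative, not taken from the cited papers:

```python
from typing import Any, Callable, Iterable, Tuple

def satisfies(program: Callable[[Any], Any],
              examples: Iterable[Tuple[Any, Any]]) -> bool:
    """Check the CPG objective: [[F]](x_i) == y_i for every example (x_i, y_i)."""
    return all(program(x) == y for x, y in examples)

# Toy PBE-style string transformation task (hypothetical example set).
examples = [("john smith", "J.S."), ("ada lovelace", "A.L.")]
candidate = lambda s: ".".join(w[0].upper() for w in s.split()) + "."
assert satisfies(candidate, examples)
```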

2. Core Paradigms in CPG Architectures

CPGs are instantiated under diverse computational paradigms, unified by decompositional and modular design:

  • Neuro-symbolic Modular Decomposition: The grammar-based CPG (Klinger et al., 2023) employs a context-free grammar (CFG) parser which decomposes the input $x$ into a parse tree $\tau$, with each production rule $r$ of the CFG associated with a distinct learnable “semantic module” $M_r$. Modules are either copy-programs (reordering/repeating child outputs) or substitution-programs (assigning slots to objects), parameterized by private weights $\theta_r$. During inference, CPG composes modules bottom-up on the parse tree, ensuring that the same grammar rule always executes the same module, thereby achieving systematic and productive generalization (a minimal sketch of this bottom-up composition follows this list).
  • Transformer-based Explicit Decomposition: Masked self-attention and separator tokens have been incorporated in Transformer-based models to enforce explicit subtask boundaries and modular decoding. Here, a target sequence is annotated with SEPARATOR (SEP) tokens, and attention masks are constructed such that generation of each subprogram attends only to its local context—mirroring compositional decomposition (Shi et al., 2022).
  • LLM-Guided Recursive Program Decomposition: In a Programming-by-Example (PBE) context, CPG methods invoke an LLM to synthesize candidate programs, but upon failure, recover by automatically decomposing the task into simpler subtasks (e.g., via prefix/postfix program slicing, conditional branching, or operator inversion), recursively synthesizing and verifying subprograms which are then composed to solve the original task (Khan et al., 12 Mar 2025).
  • Multi-layer DSL Lowering: For high-performance numerical code generation, compositionality is achieved via a stack of domain-specific languages (DSLs), each equipped with rewrite and partitioning rules to lower from high-level symbolic mathematical expressions progressively down to concrete, architecture-specific code, e.g., for linear algebra routines (Spampinato et al., 2019).
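
As a concrete illustration of the grammar-based variant (first bullet above), the sketch below composes one module per grammar rule bottom-up over a parse tree. The `ParseNode` class and the hand-written toy modules are illustrative stand-ins for the learned copy-/substitution-programs of (Klinger et al., 2023), not their actual implementation:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ParseNode:
    rule: str                                   # CFG production that generated this node
    children: List["ParseNode"] = field(default_factory=list)

# One module per grammar rule; the same rule always executes the same module.
MODULES: Dict[str, Callable[[List[List[str]]], List[str]]] = {
    "ACTION -> walk":       lambda kids: ["WALK"],
    "ACTION -> jump":       lambda kids: ["JUMP"],
    "CMD -> ACTION":        lambda kids: kids[0],
    "CMD -> ACTION twice":  lambda kids: kids[0] * 2,       # copy-program: repeat
    "CMD -> CMD and CMD":   lambda kids: kids[0] + kids[1], # copy-program: concatenate
}

def interpret(node: ParseNode) -> List[str]:
    """Compose modules bottom-up along the parse tree."""
    child_outputs = [interpret(c) for c in node.children]
    return MODULES[node.rule](child_outputs)

# "jump twice and walk" -> JUMP JUMP WALK
tree = ParseNode("CMD -> CMD and CMD", children=[
    ParseNode("CMD -> ACTION twice", children=[ParseNode("ACTION -> jump")]),
    ParseNode("CMD -> ACTION", children=[ParseNode("ACTION -> walk")]),
])
print(interpret(tree))  # ['JUMP', 'JUMP', 'WALK']
```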

3. Decomposition and Composition Strategies

Decomposition is the defining operation in any CPG:

  • Syntactic Decomposition: Parsing inputs using a CFG (or more general formalism) and associating grammar rules with distinct modules, which can be composed to cover arbitrarily complex structures. This guarantees that the same module solves all instances with the same abstract syntactic shape (Klinger et al., 2023).
  • Subtask Boundary Enforcement: In Transformer-style models, explicit SEP tokens and carefully designed attention masks localize subprogram generation, preventing spurious cross-part attention and thereby aligning neural inference with compositional principles. Empirical measurements confirm that zero-shot generalization to novel compositions is significantly improved when such decomposition is enforced (Shi et al., 2022); a mask-construction sketch follows this list.
  • Recursive AST Decomposition in PBE: LLM-based CPGs attempt to exploit the structure of candidate (often incorrect) programs by partitioning the AST via prefix (first operator), suffix (last operator), or attempting conditional branching on distinct example classes. Each decomposition induces a new subinstance of the synthesis problem, enabling the model to recursively break down intractable tasks (Khan et al., 12 Mar 2025).
  • Module Isolation and Freezing: Modular parameterization (e.g., a separate $\theta_r$ per grammar rule) combined with incremental “freezing”—where parameters associated with existing rules are held fixed when new rules are encountered—prevents catastrophic forgetting and enables stable few-shot extension to novel program structures (Klinger et al., 2023). Ablation studies indicate that removing this feature results in dramatic degradation of compositional generalization.
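
To make the subtask-boundary idea concrete, the sketch below builds a boolean self-attention mask over target positions in which each position may attend only within its own SEP-delimited segment. This is a simplified illustration of the masking scheme described by (Shi et al., 2022); the token inventory and the exact mask semantics in that work may differ (e.g., a causal constraint would be intersected with this mask in a real decoder):

```python
import numpy as np

SEP = "<sep>"

def segment_attention_mask(target_tokens: list) -> np.ndarray:
    """Return mask[i, j] = True iff position i may attend to position j.

    Positions attend only within their own SEP-delimited segment, so each
    subprogram is decoded from local context, mirroring compositional
    decomposition. Separator tokens belong to the segment they close.
    """
    seg_ids, seg = [], 0
    for tok in target_tokens:
        seg_ids.append(seg)
        if tok == SEP:          # the separator closes the current segment
            seg += 1
    seg_ids = np.array(seg_ids)
    return seg_ids[:, None] == seg_ids[None, :]

tokens = ["move", "x", "1", SEP, "rotate", "90", SEP, "move", "y", "2"]
mask = segment_attention_mask(tokens)
print(mask.astype(int))
# "rotate" (index 4) may attend to indices 3-6 only, not to the first segment.
```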

4. Empirical Validation and Benchmarks

CPGs have been evaluated on benchmarks explicitly designed to probe axes of compositionality:

  • Few-Shot Generalization in SCAN and COGS: The modular grammar-based CPG attains 100% exact-match generalization on both the “length” and “add-jump” splits of SCAN with just 14 training examples, and on the compositional generalization split of COGS with only 22 examples. Standard Transformer baselines fail under these conditions (typically requiring thousands of examples), highlighting the advantage of grammar-driven modularity (Klinger et al., 2023).
  • Zero-Shot and Few-Shot Accuracy in RobustFill: Compositional splits such as length-expansion, concept composition, and concept order switching reveal substantial improvements with Transformer models leveraging explicit decomposition (e.g., a jump from ~30% to 74.4% on the “compose different concepts” axis of SCAN), but also expose residual challenges, notably in concept order switching and length generalization. This suggests that while decomposition aids systematicity, certain axes of compositionality remain elusive for current neural approaches (Shi et al., 2022).
  • LLM-Guided CPG Performance in PBE: Across hard string PBE tasks, recursive decomposition strategies (ForwardAll, Forward1, Backward1, IfThenElse) implemented via an LLM-based CPG recover approximately 30% of tasks not solvable by self-reflection or single-call LLM approaches. Each decomposition strategy covers a distinct partition of the benchmark, reinforcing the efficacy of operator-level compositional prompting. The table below summarizes results for Python tasks (playgol-py); a toy sketch of one such decomposition follows the table.
Strategy        # Tasks Solved    % of 665
ForwardAll      115               17.3%
Forward1        86                12.9%
Backward1       82                12.3%
Any strategy    210               31.6%

(Khan et al., 12 Mar 2025)
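
For intuition, the following sketch frames a Backward1-style decomposition: guess the final operator of the target program, invert it on every output to obtain intermediate targets, recursively synthesize a program for the simplified subtask, then compose. The `invert_last` and `synthesize` callbacks and the toy task are placeholders; the actual algorithm and interfaces in (Khan et al., 12 Mar 2025) differ in detail:

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]
Program = Callable[[str], str]

def backward1(examples: List[Example],
              last_op: Program,
              invert_last: Callable[[str], Optional[str]],
              synthesize: Callable[[List[Example]], Optional[Program]]
              ) -> Optional[Program]:
    """Backward1-style decomposition (illustrative): F(x) = last_op(G(x))."""
    sub_examples = []
    for x, y in examples:
        y_mid = invert_last(y)        # rewrite each output into an intermediate target
        if y_mid is None:             # this operator guess does not apply
            return None
        sub_examples.append((x, y_mid))
    inner = synthesize(sub_examples)  # e.g., another LLM call plus verification
    if inner is None:
        return None
    return lambda x: last_op(inner(x))

# Toy usage: target programs append "!" as their last step.
examples = [("hi", "HI!"), ("ok", "OK!")]
program = backward1(
    examples,
    last_op=lambda s: s + "!",
    invert_last=lambda y: y[:-1] if y.endswith("!") else None,
    synthesize=lambda ex: (lambda s: s.upper())
                          if all(y == x.upper() for x, y in ex) else None,
)
assert program is not None and program("yes") == "YES!"
```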

  • High-Performance Linear Algebra Generation: In multi-DSL CPGs for numerical code, compositionality yields auto-generated routines that match or exceed hand-tuned libraries such as MKL, as shown for Cholesky, Lyapunov, and Sylvester problems (Spampinato et al., 2019).

5. Design Trade-Offs and Algorithmic Considerations

Each CPG architecture imposes trade-offs reflecting the mechanisms for decomposition, modularity, and parameter sharing:

  • Grammar-Guided Modularity: Ensures programmatic systematicity but depends critically on the precision and granularity of the chosen grammar. Highly fine-grained grammars can lead to proliferation of modules and increased learning complexity; overly coarse grammars can sacrifice generalization.
  • Masking in Neural Decoders: Explicit attention masking prevents information leakage across subtasks, at the cost of reduced global context and potential difficulties in handling cross-subtask dependencies (Shi et al., 2022).
  • LLM-based Recursive Synthesis: Allows for recovery from failed attempts but can incur increased computational cost due to repeated LLM invocations and verification steps. The strategy ordering (IfThenElse → ForwardAll → Forward1 → Backward1) reflects empirical trade-offs between coverage and reliability, with certain decompositions more likely to yield tractable subtasks (Khan et al., 12 Mar 2025).
  • Separation of Subroutines (Library-Augmented Decoding): Storing canonical subroutines and using them at decode time simulates modular composition, but may require efficient retrieval and subroutine selection mechanisms.
  • Incremental Freezing and Parameter Isolation: Prevents interference between rules/modules but can hinder global optimization; an incremental training curriculum is essential for maintaining both backward compatibility and extension to novel structures (Klinger et al., 2023). A minimal freezing sketch follows this list.
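
A minimal PyTorch-style sketch of the freezing idea: each grammar rule owns a private parameter set, and previously learned modules are frozen before a new rule's module is added. The `RuleModules` class and the linear stand-ins are illustrative; the actual modules in (Klinger et al., 2023) are copy-/substitution-programs with their own parameterization:

```python
import torch
import torch.nn as nn

class RuleModules(nn.Module):
    """One private parameter set per grammar rule, frozen once learned."""

    def __init__(self):
        super().__init__()
        self.modules_by_rule = nn.ModuleDict()

    def add_rule(self, rule: str, dim: int = 16) -> None:
        # Freeze every previously learned module before adding a new one,
        # so later training cannot interfere with earlier rules.
        for mod in self.modules_by_rule.values():
            for p in mod.parameters():
                p.requires_grad = False
        self.modules_by_rule[rule] = nn.Linear(dim, dim)

    def trainable_parameters(self):
        return (p for p in self.parameters() if p.requires_grad)

bank = RuleModules()
bank.add_rule("ACTION_twice")
# ... train on examples exercising this rule ...
bank.add_rule("CMD_and_CMD")   # the earlier rule's parameters are now frozen
opt = torch.optim.Adam(bank.trainable_parameters(), lr=1e-3)
```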

6. Implications, Extensions, and Open Challenges

CPGs demonstrate that compositional decomposition—whether via grammar parsing, attention masking, or AST-level operator slicing—substantially improves systematic and productive generalization in program generation and synthesis. Practical implications include:

  • Automatic Synthesis Beyond LLM Self-Reflection: Operator-level decomposition and modular recombination can extend LLM-based synthesis capabilities well beyond single-pass or reflection-based methods.
  • Sample Efficiency and Robustness: Explicit modularization yields massive reductions in sample complexity for compositional generalization, as seen in neuro-symbolic CPGs.
  • Pretraining and Data Augmentation: Exposure to a diverse set of compositional structures (cross-concept, cross-order, and lengthwise) during pretraining improves robustness and systematicity (Shi et al., 2022).
  • Cross-Domain Generality: Linking string transformation, logical form, navigation sequence, and numerical code domains under the CPG umbrella invites research into unified, grammar- and module-oriented program generators.

Several open problems remain: handling latent (learned) subtask boundaries rather than explicit separators, developing planners to autonomously propose decompositions, incorporating retrieval-augmented or graph-structured decoding, and extending CPGs to highly open-ended or weakly structured program spaces.

Multiple research communities have converged on CPG principles under varying nomenclature and with different emphases:

  • Neuro-symbolic grammar-based CPGs (Brenden Lake, NYU; SCAN/COGS) demonstrated state-of-the-art few-shot generalization via explicit modularization (Klinger et al., 2023).
  • LLM-guided recursive CPGs (playgol, FlashFill-style string transformations) have expanded the reach of program synthesis from input-output examples by exploiting decompositional prompting and subproblem recursion (Khan et al., 12 Mar 2025).
  • Neural program synthesis evaluation (Shi et al.) and Transformer decomposition benchmarks elucidated the critical axes along which compositionality is measured, and how explicit subtask boundaries and attention mechanisms support or hinder various forms of generalization (Shi et al., 2022).
  • Multi-DSL code generators for linear algebra exemplify end-to-end compositional lowering from mathematical specifications to architecture-specific kernels (Spampinato et al., 2019).

In summary, CPG frameworks—from grammar-driven modular interpreters to LLM-directed recursive decomposers—are central to advancing systematic, sample-efficient, and scalable program generation across a range of domains.
