Object-Centric Program Synthesis
- Object-centric program synthesis is a paradigm that converts high-level specifications into programs with objects as core abstractions, emphasizing modularity and interpretability.
- It leverages methodologies such as inductive logic programming and LLM-driven chain-of-thought decomposition to systematically generate and refine object models.
- The approach finds applications in symbolic reasoning, scene synthesis, and interactive design, with evaluations focusing on example coverage, user retention of suggested model elements, and semantic fidelity.
Object-centric program synthesis refers to the automatic generation of programs whose principal abstractions are objects and their relations, as opposed to flat, pixel-wise, or low-level representations. Systems in this paradigm operate by mapping high-level specifications—often natural language descriptions or structured examples—into interpretable programs within object-centric domain-specific languages (DSLs). The resultant programs manipulate, generate, or reason about objects and object relations, supporting applications in symbolic reasoning, scene synthesis, and model-driven software engineering.
1. Foundations of Object-Centric Abstraction
Object-centric synthesis systems rely on expressive DSLs that expose objects as first-class primitives, encapsulating their geometric, semantic, or logical properties as well as the relations governing interactions between them.
For symbolic reasoning tasks, one approach defines primitives such as Point(x:int, y:int, color:C), Line(...), and Rectangle(...), with associated predicates for object construction and manipulation (e.g., translate, copy, line_from_point). Relation signatures and typed predicate declarations enable both compositionality and strict type checking, facilitating modular program induction (Rocha et al., 10 May 2024).
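The snippet below sketches how such primitives and predicates might be realized as an embedded Python DSL. The names Point, translate, copy, and line_from_point follow the description above; the concrete signatures, the color representation, and the line constructor are illustrative assumptions, not the published DSL.

```python
# Minimal, illustrative sketch of an object-centric DSL core in Python.
# The names Point, translate, copy, and line_from_point follow the description
# above; concrete signatures, the color representation, and the line builder
# are assumptions for illustration, not the published DSL.
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Point:
    x: int
    y: int
    color: str                # stands in for the paper's color type C

def translate(p: Point, dx: int, dy: int) -> Point:
    """Return a new Point shifted by (dx, dy); objects stay immutable."""
    return replace(p, x=p.x + dx, y=p.y + dy)

def copy(p: Point) -> Point:
    """Duplicate an object, usable as background knowledge for induction."""
    return replace(p)

def line_from_point(p: Point, length: int, horizontal: bool = True) -> List[Point]:
    """Construct a line object as a list of points growing from p."""
    step = (1, 0) if horizontal else (0, 1)
    return [translate(p, i * step[0], i * step[1]) for i in range(length)]

# Example: grow a horizontal red line of length 3 from (0, 0).
print(line_from_point(Point(0, 0, "red"), 3))
```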
In geometric or spatial domains, embedded-Python DSLs allow specification of scenes as collections of object declarations (e.g., Object(desc, w, d, h, ...)) and relation clauses (adjacent, on, next_to_wall), with explicit variables for positions, orientations, and categories. This enables direct synthesis of programs describing both the existence and spatial arrangement of objects (Aguina-Kang et al., 5 Feb 2024).
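A minimal sketch of this style of scene DSL is shown below. Object and the relation names adjacent, on, and next_to_wall mirror the text; recording relations as constraint tuples to be solved later is an assumption made for illustration.

```python
# Illustrative sketch of an embedded-Python scene DSL in the spirit described
# above. Object and the relation names (adjacent, on, next_to_wall) mirror the
# text; recording relations as constraint tuples for a later solver stage is an
# assumption.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Object:
    desc: str                                # natural-language category, e.g. "nightstand"
    w: float                                 # width  (meters)
    d: float                                 # depth  (meters)
    h: float                                 # height (meters)
    pos: Tuple[float, float] = (0.0, 0.0)    # free variable, solved during layout
    theta: float = 0.0                       # orientation, solved during layout

constraints: List[Tuple[str, Object, Optional[Object]]] = []

def adjacent(a: Object, b: Object) -> None:
    constraints.append(("adjacent", a, b))

def on(a: Object, b: Object) -> None:
    constraints.append(("on", a, b))

def next_to_wall(a: Object) -> None:
    constraints.append(("next_to_wall", a, None))

# A program the synthesizer might emit for "a bed with a nightstand beside it":
bed = Object("bed", 1.6, 2.0, 0.5)
nightstand = Object("nightstand", 0.5, 0.4, 0.6)
next_to_wall(bed)
adjacent(nightstand, bed)
```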
Object-centric modeling is also leveraged in interactive software design tools, where object models (nodes and fields connected by edges) are synthesized from natural-language specifications, with LLMs decomposing the high-level description into object names, attributes, types, multiplicities, and methods (Gu et al., 2022).
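The data structures such a tool might build can be sketched as follows; the class names and fields here are hypothetical, and only the overall node/field/edge decomposition follows the description above.

```python
# Hypothetical sketch of the object-model representation such tools construct:
# nodes carry fields and methods, edges carry multiplicities. Class and field
# names are assumptions; only the node/field/edge structure follows the text.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    name: str
    type: str              # e.g. "string", "int", "date"

@dataclass
class Method:
    name: str
    returns: str

@dataclass
class ObjectNode:
    name: str
    fields: List[Field] = field(default_factory=list)
    methods: List[Method] = field(default_factory=list)

@dataclass
class Edge:
    source: str
    target: str
    multiplicity: str      # e.g. "0..*"

# Output one LLM stage might contribute for "a library app with books and members":
book = ObjectNode("Book", [Field("title", "string"), Field("isbn", "string")],
                  [Method("checkOut", "bool")])
member = ObjectNode("Member", [Field("name", "string")])
edges = [Edge("Member", "Book", "0..*")]
```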
2. Program Synthesis Methodologies
Distinct methodologies have emerged for object-centric program synthesis, reflecting both symbolic and neural approaches:
- Inductive Logic Programming (ILP): Logic-based systems cast synthesis as learning Horn clause definitions for relation predicates, using small numbers of input-output example pairs. The DSL background knowledge encodes object abstractions, while the synthesis procedure performs top-down specialization, scoring clause candidates by positive/negative coverage and enforcing unification across examples to avoid overfitting (Rocha et al., 10 May 2024); a minimal scoring sketch follows this list.
- LLM-Orchestrated DSL Generation: LLMs, through prompt-engineered multi-stage pipelines, translate natural-language scene or object descriptions into declarative object-centric DSL code. Validity is ensured by running (and correcting) generated code within a Python interpreter, while multi-stage refinement improves semantic fidelity (Aguina-Kang et al., 5 Feb 2024).
- Chain-of-Thought Decomposition: Specification reification is operationalized by decomposing natural-language input into subtasks (object extraction, attribute inference, method generation) handled via sequenced LLM calls, each producing structured outputs contributing to an incremental object model (Gu et al., 2022).
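As referenced above, the following sketch illustrates coverage-based clause selection in the ILP style: candidate clauses are modeled as Python predicates over input-output object pairs, scored by positive/negative coverage, and retained only if they hold across every training example. This is an illustrative reading of top-down specialization with cross-example unification, not the published procedure.

```python
# Illustrative coverage-based clause selection in the ILP style described in
# the list above. Clauses are modeled as Python predicates over (input, output)
# object pairs; requiring each clause to cover all positives and no negatives
# in every training example is a simplified stand-in for the paper's top-down
# specialization with cross-example unification.
from typing import Callable, List, Tuple

Example = Tuple[dict, dict]                 # (input object, output object)
Clause = Callable[[dict, dict], bool]

def coverage(clause: Clause, pos: List[Example], neg: List[Example]) -> Tuple[int, int]:
    p = sum(clause(i, o) for i, o in pos)
    n = sum(clause(i, o) for i, o in neg)
    return p, n

def select_clauses(candidates: List[Clause],
                   tasks: List[Tuple[List[Example], List[Example]]]) -> List[Clause]:
    """Keep clauses that cover all positives and no negatives in every task."""
    kept = []
    for clause in candidates:
        if all(coverage(clause, pos, neg) == (len(pos), 0) for pos, neg in tasks):
            kept.append(clause)
    return kept

# Toy candidate clause: "the output point is the input point shifted right by one".
shift_right = lambda i, o: o == {"x": i["x"] + 1, "y": i["y"]}
tasks = [([({"x": 0, "y": 0}, {"x": 1, "y": 0})],     # positive example
          [({"x": 0, "y": 0}, {"x": 0, "y": 1})])]    # negative example
print(select_clauses([shift_right], tasks))           # the clause survives
```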
3. Object-Centric Synthesis Pipelines
A general object-centric program synthesis pipeline involves the following phases:
- Specification Acquisition: The system receives a high-level specification, such as a set of input-output grid pairs (for ARC), a textual prompt (for indoor scenes), or an application description (for object models).
- Object and Relation Extraction: Candidate objects and possible relations are extracted either from structured examples (symbolic domains) or via LLM parsing and expansion (natural-language domains).
- Program Induction or Generation:
  - In ILP-based settings, logic programs (sets of Horn clauses) defining object-generating or transforming relations are induced, subject to constraints on example coverage, type unification, and overlap prevention.
  - In LLM-driven systems, object and relation declarations are constructed stagewise using prompt-engineered completions and code synthesis.
- Program Application and Deductive Search: Learned relations or synthesized DSL snippets are applied to generate new objects in the output space. In symbolic program synthesis, a sequential ordering (possibly via beam search) determines how relation rules are instantiated without object conflicts (Rocha et al., 10 May 2024). In scene generation, programs are executed to set up a constraint satisfaction problem whose solution yields feasible object layouts (Aguina-Kang et al., 5 Feb 2024). A simplified application loop for the symbolic case is sketched after this list.
- Refinement and Validation: Generated object models may be subject to interactive user refinement (object model synthesis), or to automated validity checks (scene generation), including type errors, constraint satisfaction, and semantic alignment.
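The simplified loop below illustrates the program-application phase for the symbolic case: learned relation rules are applied in sequence, skipping instantiations that would conflict with objects already generated. The greedy ordering stands in for the beam search mentioned above, and the cell-set data structures are assumptions.

```python
# Simplified sketch of the program-application phase for the symbolic case:
# rules are applied in a fixed order, skipping any instantiation that would
# conflict with already-generated objects. The greedy ordering stands in for
# the beam search mentioned above; the cell-set representation is an assumption.
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]
Rule = Callable[[Set[Cell]], Set[Cell]]     # maps input cells to proposed output cells

def apply_rules(rules: List[Rule], inputs: Set[Cell]) -> Set[Cell]:
    output: Set[Cell] = set()
    for rule in rules:
        proposal = rule(inputs)
        if proposal & output:               # conflict with an occupied cell
            continue                        # a real system would backtrack or keep beams
        output |= proposal
    return output

# Toy rules: copy the input, then mirror it across the column x = 3.
copy_rule = lambda cells: set(cells)
mirror_rule = lambda cells: {(3 - x, y) for x, y in cells}
print(apply_rules([copy_rule, mirror_rule], {(0, 0), (1, 0)}))
```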
4. Representative Systems and Applications
ILP for ARC
An ILP-based system for the Abstraction and Reasoning Corpus employs a minimal object-centric DSL and induces logic programs mapping from input grids to output grids by learning object relations from pairs of examples. The system generalizes from very few training pairs by requiring rule unification across them and leverages the object-centric representation to reduce the combinatorial search space relative to pixel-level encodings. Induced rules are interpretable and can encode diverse forms of reasoning, although the primitive set limits the expressive power and recursion/general looping are unsupported (Rocha et al., 10 May 2024).
ObSynth: Object Model Synthesis from Natural Language
ObSynth is an interactive environment for object model synthesis, decomposing high-level user prompts into object graphs (objects, fields, methods) via LLM-driven, prompt-engineered subtasks. The system supports iterative refinement and demonstrates, via a user study, that synthesized models exhibit more comprehensive object structures (i.e., inclusion of fields or objects that users typically omit) while time-to-completion remains comparable to manual design. Specification reification is formalized as a key innovation, shifting object synthesis from local snippet completion to global model inference (Gu et al., 2022).
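A hypothetical sketch of this staged decomposition is given below. The llm parameter stands for any text-completion function, and the prompt wording and staging are illustrative rather than ObSynth's actual prompts.

```python
# Hypothetical sketch of chain-of-thought decomposition into staged LLM calls.
# `llm` stands for any text-completion function (its list-of-strings interface
# is an assumption); prompt wording and staging are illustrative, not ObSynth's
# actual prompts.
from typing import Callable, Dict, List

def synthesize_object_model(spec: str, llm: Callable[[str], List[str]]) -> Dict[str, dict]:
    objects = llm(f"List the objects mentioned or implied in: {spec}")
    model: Dict[str, dict] = {}
    for obj in objects:
        fields = llm(f"List fields (name: type) for '{obj}' given: {spec}")
        methods = llm(f"List methods for '{obj}' given: {spec}")
        model[obj] = {"fields": fields, "methods": methods}
    return model

# A stub completion function makes the pipeline runnable for demonstration.
def stub(prompt: str) -> List[str]:
    if "objects" in prompt:
        return ["Book", "Member"]
    if "fields" in prompt:
        return ["title: string"]
    return ["checkOut() -> bool"]

print(synthesize_object_model("an app for lending library books", stub))
```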
Scene Generation with LLM-Synthesized Programs
A system for open-universe indoor scene generation accepts arbitrary text prompts, synthesizes Pythonic DSL programs defining object instances and spatial relations, and solves for valid layouts via a differentiable constraint satisfaction formulation. Object geometry is supplied by vision-LLM-based mesh retrieval and orientation pipelines, supporting scenes and object classes never seen in training. Experiments demonstrate state-of-the-art performance in both closed- and open-universe settings compared to prior generative systems (Aguina-Kang et al., 5 Feb 2024).
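The toy example below conveys the differentiable-constraint idea in one dimension: object positions are adjusted by gradient descent on soft penalties for an adjacency target and for overlap. The energy terms, step size, and analytic gradients are illustrative assumptions; the actual system solves full scenes with many relation types.

```python
# Toy illustration (not the paper's solver) of layout as a differentiable
# constraint problem: two objects on a 1-D shelf are positioned by gradient
# descent on soft penalties that (a) pull "adjacent" objects toward a target
# gap and (b) push interpenetrating objects apart. Energy terms, gradients,
# and the step size are assumptions; the real system handles full 3-D scenes.

def energy_and_grads(x_a, x_b, w_a, w_b, gap=0.05):
    half = (w_a + w_b) / 2                  # sum of half-widths
    d = x_b - x_a                           # signed center distance
    adj = (abs(d) - (half + gap)) ** 2      # adjacency: |d| should equal half + gap
    overlap = max(0.0, half - abs(d)) ** 2  # penalize interpenetration only
    s = 1.0 if d >= 0 else -1.0
    g = 2 * (abs(d) - (half + gap)) * s - 2 * max(0.0, half - abs(d)) * s
    return adj + overlap, -g, g             # energy, dE/dx_a, dE/dx_b

x_a, x_b = 0.0, 0.1                         # initial, overlapping placements
for _ in range(200):
    _, ga, gb = energy_and_grads(x_a, x_b, w_a=0.6, w_b=0.4)
    x_a -= 0.05 * ga
    x_b -= 0.05 * gb
print(round(x_b - x_a, 3))                  # ≈ 0.55 = sum of half-widths (0.5) + gap
```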
5. Evaluation Protocols and Results
Object-centric synthesis systems are evaluated on both quantitative and qualitative criteria specific to their target domains:
- Coverage and Generalization: ILP systems assess positive/negative coverage of induced clauses, retention of learned programs across test instances, and the ability to reconstruct output grids from minimal examples. Generalization is supported by rule unification and modular clause definitions (Rocha et al., 10 May 2024).
- Model Completeness and User Retention: In object model synthesis, metrics include the percentage of suggested objects, fields, and methods retained by users, object/field coverage relative to controls, model size, and time-to-completion. ObSynth demonstrates greater model detail and completeness, with 92% of objects and 79% of fields retained, but similar synthesis times relative to manual design (Gu et al., 2022).
- Perceptual Quality and Semantic Fidelity: For scene generation, pairwise preference studies capture perceptual realism, semantic alignment, and style relevance. Ablation analysis validates the contribution of multi-stage LLM prompting, VLM+LLM filtering for mesh selection, and multi-step object orientation procedures (Aguina-Kang et al., 5 Feb 2024).
6. Limitations and Prospective Directions
Current object-centric program synthesis systems present several limitations:
- Restricted DSL Expressivity: Many DSLs are manually constructed, with limited sets of primitives and lacking higher-order constructs (e.g., loops, recursion), confining the generative capacity of induced programs (Rocha et al., 10 May 2024).
- Negative Example Explosion: In logic program induction, negative examples generated by candidate enumeration can induce combinatorial blowup for large object spaces (Rocha et al., 10 May 2024).
- Beam Search Ordering: Deductive search for the correct composition of object relations may miss valid generation sequences due to fixed depth or naive heuristics (Rocha et al., 10 May 2024).
- LLM Over/Undergeneration: In LLM-driven synthesis, greedy decoding may result in incomplete or extraneous model elements, with limited semantic checking beyond type-level validation (Gu et al., 2022).
Potential extensions include data-driven or metainterpretive DSL expansion, integration of higher-order relational constructs, probabilistic scoring to balance complexity and coverage, symbolic verification of semantic consistency, and advanced pruning of the search space via type-based or symmetry-breaking strategies (Rocha et al., 10 May 2024, Gu et al., 2022, Aguina-Kang et al., 5 Feb 2024).
7. Connections to Related Paradigms
Object-centric program synthesis departs from traditional neural program synthesis, which is often limited to flat or local code completion (e.g., function generation by Codex, Copilot, AlphaCode). In contrast, object-centric systems address global specification reification—mapping entire requirements or complex scene descriptions into structured object programs. This paradigm aligns with both symbolic AI methodologies (ILP, constraint programming) and current trends in LLM-augmented program synthesis, leveraging both interpretability and scalability. Specification reification and the chain-of-thought decomposition of model inference are distinguishing contributions in this field (Gu et al., 2022, Aguina-Kang et al., 5 Feb 2024).