
Geometric & Symbolic Scene Graphs

Updated 26 February 2026
  • Geometric and symbolic scene graphs are hybrid representations that integrate explicit spatial metrics with semantic labels to support robust scene analysis.
  • They enable spatial reasoning, object arrangement inference, and multimodal synthesis across robotics, computer vision, and graphics applications.
  • Advanced methods use iterative graph neural networks and probabilistic inference to achieve interpretable, scalable, and semantically rich scene generation.

Geometric and symbolic scene graphs constitute hybrid structured representations that integrate explicit spatial (metric, topological) information with high-level semantic and relational knowledge. These models support spatial reasoning, object arrangement inference, embodied AI, scene generation, and robust multimodal understanding across domains such as robotics, computer vision, and graphics. Advances in this area reflect a maturation from purely appearance-based models to unified graph-theoretic approaches encoding both geometry and domain-level symbolism.

1. Fundamental Definitions and Data Structures

A geometric scene graph is a labeled, attributed graph $G=(V,E)$, where nodes represent physical entities (objects, regions, frames) and maintain geometric descriptors such as 3D position $p_i \in \mathbb{R}^3$, bounding volumes, and, optionally, orientation or full pose (e.g., $(R_v \in SO(3), t_v \in \mathbb{R}^3)$) (Agia et al., 2022, Zhu et al., 2020, Ruiz et al., 18 Nov 2025, Günther et al., 3 Feb 2026). Edge types encode spatial relations: metric (Euclidean adjacency or containment), topological (navigability/connectivity), or derived predicates such as “on-top-of”, “next-to”, or “part-of” (Aryan et al., 2024, Gay et al., 2018, Agia et al., 2022).
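The node/edge structure described above can be sketched as a minimal data structure. This is an illustrative sketch only; the class and field names (`SceneNode`, `SceneEdge`, `SceneGraph`) are assumptions, not taken from any cited system:

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    label: str                       # semantic class, e.g. "table"
    position: tuple                  # 3D position p_i in R^3
    extent: tuple = (0.0, 0.0, 0.0)  # bounding-box dimensions (optional)

@dataclass
class SceneEdge:
    src: int        # index of source node
    dst: int        # index of destination node
    relation: str   # e.g. "on-top-of", "next-to", "part-of"

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        """Append a node and return its index."""
        self.nodes.append(node)
        return len(self.nodes) - 1

    def neighbors(self, i):
        """Indices reachable from node i via outgoing edges."""
        return [e.dst for e in self.edges if e.src == i]

# Build a two-object scene: a cup resting on a table.
g = SceneGraph()
t = g.add_node(SceneNode("table", (1.0, 0.0, 0.4)))
c = g.add_node(SceneNode("cup", (1.0, 0.1, 0.8)))
g.edges.append(SceneEdge(c, t, "on-top-of"))
```

A full system would additionally carry pose ($R_v$, $t_v$), learned descriptors, and hierarchy links on each node, but the graph skeleton stays the same.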

Symbolic scene graphs generalize this structure by adding discrete semantic labels, logical predicates, domain ontologies, or knowledge-graph-style links. Nodes are associated with object-class labels or symbolic concepts (dogs, tables, ‘kitchen’), and edges annotate explicit symbolic relations (e.g., “part-of,” “affordance,” “supports”) (Aryan et al., 2024, Saucedo et al., 5 May 2025, Zhu et al., 2020). The symbolic and geometric branches can be integrated via cross-links (e.g., object nodes in both graphs, mappings from geometry to symbolic predicates via deterministic or learned functions) (Zhu et al., 2020, Aryan et al., 2024, Günther et al., 3 Feb 2026, Saucedo et al., 5 May 2025).

Expanded frameworks additionally store probability distributions over latent or unobserved entities (belief scene graphs), hierarchical region nodes (building–floor–room–object), temporal evolution (spatio-temporal scene graphs), or behavior/action vector fields (Saucedo et al., 5 May 2025, Werby et al., 1 Oct 2025, Huang et al., 2023, Kamarianakis et al., 2023).

2. Algorithmic Inference and Joint Reasoning

Contemporary hybrid methods perform joint inference or search over merged geometric-symbolic graphs. One paradigm employs a merged graph $M=(V,E)$ over scene (object) nodes, domain-knowledge nodes, and spatial/symbolic edges. The objective is to extract a maximally compatible active subgraph $M^A$ by maximizing an energy/compatibility function

$$\text{Score}(M^A) = \sum_{(i,j)\in E_s^A} w_{ij}\, f_{ij}(x_i, x_j) + \sum_{(u,v)\in E_k^A} g_{uv}(z_u, z_v),$$

where $f_{ij}$ scores geometric-spatial compatibility and $g_{uv}$ scores symbolic agreement or affordance (Aryan et al., 2024).

A dynamic iterative expansion algorithm is employed, with states updated by learned message-passing (GNN-style) propagators, importance scoring for expansion, and classification output for compound-scene recognition (Aryan et al., 2024). The process is interpretable, with intermediate activations denoting which spatial relations or symbolic facts influence the result.
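A hedged sketch of this objective: the score sums weighted spatial-compatibility terms and symbolic-agreement terms over the active node set, and a simple greedy loop stands in for the learned message-passing expansion policy described above. Edge payloads and the example values are illustrative assumptions:

```python
def subgraph_score(active, spatial_edges, knowledge_edges):
    """Score(M^A): spatial edges map (i, j) -> (w_ij, f_ij value);
    knowledge edges map (u, v) -> g_uv value. Only edges with both
    endpoints in the active set contribute."""
    s = sum(w * f for (i, j), (w, f) in spatial_edges.items()
            if i in active and j in active)
    s += sum(gv for (u, v), gv in knowledge_edges.items()
             if u in active and v in active)
    return s

def greedy_expand(seed, candidates, spatial_edges, knowledge_edges):
    """Iteratively add any candidate node that raises the score."""
    active = set(seed)
    improved = True
    while improved:
        improved = False
        base = subgraph_score(active, spatial_edges, knowledge_edges)
        for n in candidates - active:
            if subgraph_score(active | {n}, spatial_edges, knowledge_edges) > base:
                active.add(n)
                improved = True
                break
    return active

# Toy merged graph: node 1 is spatially and symbolically compatible with
# the seed node 0; node 2 has negative spatial compatibility.
spatial = {(0, 1): (1.0, 0.8), (0, 2): (1.0, -0.5)}
knowledge = {(0, 1): 0.3}
active = greedy_expand({0}, {1, 2}, spatial, knowledge)
```

The real method replaces the greedy criterion with learned importance scoring and GNN-propagated node states, but the subgraph-scoring structure is the same.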

In probabilistic and neuro-symbolic variants, node states or geometric locations are distributions over scenes or possible placements (e.g., P(position|graph)). These distributions are inferred by GCN-based message passing or factor-graph decomposition, optionally regularized with ontology constraints (room-object priors via LLMs) (Saucedo et al., 5 May 2025). Solutions for dynamic symbolic scene manipulation and geometric construction (as in GeoSketch) employ an explicit perception–reasoning–action loop over the evolving logic-form scene graph (Weng et al., 26 Sep 2025).

3. Geometric Representation and Feature Encoding

Modern geometric scene graphs encode geometric content as high-dimensional features, beyond scalar positions:

  • Node features: 3D coordinates, size (bounding-box dimensions), orientation (yaw, quaternion, or full $SO(3)$ rotation), class embeddings (learned or one-hot categorical), mesh codes (e.g., VAE-compressed OpenCLIP/AtlasNet embeddings), and motor multivectors (in geometric algebra approaches) (Ruiz et al., 18 Nov 2025, Kamarianakis et al., 2023, Kamarianakis et al., 19 Nov 2025).
  • Edge features: relative displacements, rotation/scale differences, geometric relations (e.g., seat “on-top-of” chair), and learned embeddings of relation predicates encoded as feature vectors, e.g., $e_{ij} = \mathrm{Enc}_r(r_{ij}) \in \mathbb{R}^k$ (Aryan et al., 2024, Ruiz et al., 18 Nov 2025).
  • Probabilistic: In belief scene graphs, nodes maintain spatial distributions (soft location heatmaps) representing uncertainty about unobserved object placement (Saucedo et al., 5 May 2025).
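As an illustration of the edge-feature bullet above, a minimal extractor combining relative displacement, Euclidean distance, and wrapped yaw difference might look like the following; the exact feature layout is an assumption, not any specific paper's encoding:

```python
import math

def edge_features(p_i, yaw_i, p_j, yaw_j):
    """Build a raw edge feature vector [dx, dy, dz, dist, dyaw] from two
    node states (3D position plus yaw). A learned encoder would typically
    map this raw vector into R^k."""
    dx, dy, dz = (p_j[0] - p_i[0], p_j[1] - p_i[1], p_j[2] - p_i[2])
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Wrap the yaw difference into (-pi, pi] so it is rotation-continuous.
    dyaw = math.atan2(math.sin(yaw_j - yaw_i), math.cos(yaw_j - yaw_i))
    return [dx, dy, dz, dist, dyaw]
```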

Hierarchical structures are often maintained, with nodes for buildings, floors, rooms, objects, and functional elements (e.g., control panels), supporting multi-scale reasoning (Werby et al., 1 Oct 2025, Agia et al., 2022).

For graphics and XR, geometric algebra provides a common operator algebra for points, transforms, and higher-grade geometric entities, encapsulating rotations, translations, dilation, and constraints in the multivector representation of scene-graph nodes and edges (Kamarianakis et al., 19 Nov 2025, Kamarianakis et al., 2023).
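Full geometric-algebra rotors are beyond a short sketch, but quaternions (the even subalgebra of 3D GA) illustrate the key operation: composing per-node rigid transforms along parent-child edges of a scene-graph hierarchy. The pose convention (child expressed in its parent's frame) is an illustrative assumption:

```python
def q_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def q_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * v * q^-1."""
    w, x, y, z = q_mul(q_mul(q, (0.0, *v)), (q[0], -q[1], -q[2], -q[3]))
    return (x, y, z)

def compose(parent_pose, child_pose):
    """Pose = (quaternion, translation); child is given in the parent
    frame, and the result expresses the child in the world frame."""
    pq, pt = parent_pose
    cq, ct = child_pose
    rx, ry, rz = q_rotate(pq, ct)
    return (q_mul(pq, cq), (pt[0] + rx, pt[1] + ry, pt[2] + rz))
```

In a GA formulation the same composition is a single geometric product of motors, which is what makes multivector node/edge attributes attractive for XR scene graphs.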

4. Symbolic and Commonsense Reasoning Mechanisms

The symbolic graph component can be built from hand-crafted knowledge graphs, extracted ontologies, or learned relation graphs:

  • Symbolic knowledge graphs $K=(V_k, E_k)$ hold object/scene concepts and relation edges (“part_of”, “affordance”, “co-occurrence”) endowed with weights or compatibility functions, supporting domain-specific composition (Aryan et al., 2024, Saucedo et al., 5 May 2025).
  • Room-object spatial ontologies are induced via LLM queries (e.g., “Is a stove found in a kitchen?”), with symbolic bipartite graphs between rooms and object types, enforcing coarse commonsense priors in geometric learning (Saucedo et al., 5 May 2025).
  • Explicit logic forms, Datalog rules, or temporal logic (LTL$_n$) formulae are extracted from text/captions and used to impose high-level integrity or temporal constraints (e.g., “always(left_of($v_1$, $v_2$)) ⇒ $x_1 < x_2$”) (Huang et al., 2023, Weng et al., 26 Sep 2025).
  • Symbol mapping functions can deterministically or probabilistically map metric data to high-level predicates, producing synthesized symbolic scene graphs from geometric observations (Zhu et al., 2020).

Message passing, symbolic inference, or contrastive-alignment is used to integrate these constraints into the GNN-based learning process (Aryan et al., 2024, Saucedo et al., 5 May 2025, Huang et al., 2023).
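A deterministic symbol-mapping function of the kind described above can be sketched as a geometric test over axis-aligned bounding boxes; the predicate name, box encoding, and slack thresholds here are illustrative assumptions:

```python
def on_top_of(obj, support, xy_slack=0.05, z_gap=0.05):
    """Map metric data to the symbolic predicate on_top_of(obj, support).
    Boxes are dicts {'c': (x, y, z), 'e': (ex, ey, ez)} giving center and
    half-extents. True when obj overlaps support in the ground plane and
    its bottom face rests near support's top face."""
    (ox, oy, oz), (oex, oey, oez) = obj['c'], obj['e']
    (sx, sy, sz), (sex, sey, sez) = support['c'], support['e']
    overlap_x = abs(ox - sx) <= sex + oex + xy_slack
    overlap_y = abs(oy - sy) <= sey + oey + xy_slack
    resting = abs((oz - oez) - (sz + sez)) <= z_gap
    return overlap_x and overlap_y and resting
```

Probabilistic variants would return a score instead of a boolean, letting the downstream symbolic graph carry predicate confidences rather than hard facts.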

5. Applications and Empirical Validation

Geometric and symbolic scene graphs underpin various tasks:

  • Scene understanding: Compound-scene classification via merged scene-knowledge graph search (e.g., kitchen recognition) with interpretable outputs (Aryan et al., 2024).
  • Generative scene synthesis: Probabilistic priors over furniture placement and orientation (SceneGen), diffusion models for 3D geometry synthesis from text (GeoSceneGraph), expressive image generation from full symbolic-geometric descriptors (“Image Generation from Scene Graphs”) (Keshavarzi et al., 2020, Ruiz et al., 18 Nov 2025, Johnson et al., 2018).
  • Robot planning and task/motion synthesis: Efficient task-and-motion planning over large 3D scene graphs, exploiting hierarchy and minimal subgraph extraction (SCRUB, SEEK) (Agia et al., 2022, Zhu et al., 2020). Hierarchical planners compose symbolic reasoning (subgoal regression) and geometric motion inference for robust long-horizon manipulation (Zhu et al., 2020).
  • Navigation and mapping: Real-time, incremental, open-set semantic mapping directly on the 3D scene graph backbone, integrating raw sensor streams (RGB-D, masks, CLIP descriptors) with symbolic/topological structure. Supports open-world queries, export to knowledge graphs, and fast spatial reasoning (Günther et al., 3 Feb 2026, Seymour et al., 2022).
  • Spatial commonsense: Estimating spatial distributions of unseen or missing objects, generating full room layouts from sparse observations, and integrating high-level priors for scene composition in both simulation and field robotics (Saucedo et al., 5 May 2025).
  • Geometric problem solving and diagrammatic inference: Geometry education and assistant agents dynamically manipulate a scene graph with precise construction and transformation steps, using explicit symbolic and geometric updates, with RL-driven robustness (Weng et al., 26 Sep 2025).

Empirical evaluations employ metrics for classification accuracy, geometric layout distance (e.g., Wasserstein, Frobenius), stepwise reasoning correctness, planning efficiency, navigation success rate, and user plausibility ratings (Aryan et al., 2024, Saucedo et al., 5 May 2025, Zhu et al., 2020, Keshavarzi et al., 2020).
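As a toy example of one such layout metric, the 1D Wasserstein-1 distance between two equal-size coordinate sets (e.g., object positions along one room axis) reduces to a sorted matching; real evaluations use richer multivariate variants:

```python
def wasserstein_1d(xs, ys):
    """Average transport cost between two equal-size 1D point sets;
    for empirical distributions this is the W1 distance, computed by
    matching sorted order."""
    assert len(xs) == len(ys), "expects equal-size point sets"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```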

6. Strengths, Limitations, and Future Directions

Integrated geometric and symbolic scene graphs offer structured, scalable, and interpretable frameworks that bridge sub-symbolic sensor data with high-level reasoning. Key strengths include transparent intermediate activations, joint learning of spatial and symbolic structure, and the ability to ground queries, generative synthesis, and planning within one unified data structure (Aryan et al., 2024, Saucedo et al., 5 May 2025, Günther et al., 3 Feb 2026).

However, several open challenges remain:

  • Quality and coverage of object detection and relation prediction can bottleneck downstream reasoning, as errors propagate through the unified graph (Aryan et al., 2024).
  • Many systems require curated, domain-specific knowledge graphs or annotated spatial ontologies; brittleness to missing or inconsistent knowledge has been documented (Aryan et al., 2024, Saucedo et al., 5 May 2025).
  • Scalability to hundreds of classes, dense open-world scenes, or multi-agent, long-horizon tasks introduces architectural and computational constraints, especially for graph storage and neural message-passing (Aryan et al., 2024, Saucedo et al., 5 May 2025, Zhu et al., 2020).
  • End-to-end learning of symbolic predicates, open-vocabulary relation extraction, and zero-shot generalization to novel object types or relation schemas remain active areas of research (Huang et al., 2023, Günther et al., 3 Feb 2026).

A plausible implication is that continued advances in scalable geometric algebra frameworks, differentiable logic engines, and neurally-initialized symbolic graphs will further consolidate this paradigm, moving towards generalizable, interpretable, and highly capable scene understanding and generation systems.

