
Geometric & Symbolic Scene Graphs

Updated 26 February 2026
  • Geometric and symbolic scene graphs are hybrid representations that integrate explicit spatial metrics with semantic labels to support robust scene analysis.
  • They enable spatial reasoning, object arrangement inference, and multimodal synthesis across robotics, computer vision, and graphics applications.
  • Advanced methods use iterative graph neural networks and probabilistic inference to achieve interpretable, scalable, and semantically rich scene generation.

Geometric and symbolic scene graphs constitute hybrid structured representations that integrate explicit spatial (metric, topological) information with high-level semantic and relational knowledge. These models support spatial reasoning, object arrangement inference, embodied AI, scene generation, and robust multimodal understanding across domains such as robotics, computer vision, and graphics. Advances in this area reflect a maturation from purely appearance-based models to unified graph-theoretic approaches encoding both geometry and domain-level symbolism.

1. Fundamental Definitions and Data Structures

A geometric scene graph is a labeled, attributed graph $G=(V,E)$, where nodes represent physical entities (objects, regions, frames) and maintain geometric descriptors such as 3D position $p_i \in \mathbb{R}^3$, bounding volumes, and, optionally, orientation or full pose (e.g., $(R_v \in SO(3), t_v \in \mathbb{R}^3)$) (Agia et al., 2022, Zhu et al., 2020, Ruiz et al., 18 Nov 2025, Günther et al., 3 Feb 2026). Edge types encode spatial relations: metric (Euclidean adjacency or containment), topological (navigability/connectivity), or derived predicates such as “on-top-of”, “next-to”, or “part-of” (Aryan et al., 2024, Gay et al., 2018, Agia et al., 2022).
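The node/edge structure described above can be sketched as a minimal data structure. This is an illustrative sketch only; the class and field names (`SceneNode`, `SceneEdge`, `SceneGraph`) are assumptions, not taken from any cited system:

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    label: str                       # semantic class, e.g. "table"
    position: tuple                  # 3D position p_i in R^3
    extent: tuple = (0.0, 0.0, 0.0)  # bounding-box dimensions (optional)

@dataclass
class SceneEdge:
    src: int        # index of source node
    dst: int        # index of destination node
    relation: str   # e.g. "on-top-of", "next-to", "part-of"

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        """Append a node and return its index."""
        self.nodes.append(node)
        return len(self.nodes) - 1

    def neighbors(self, i):
        """Indices reachable from node i via outgoing edges."""
        return [e.dst for e in self.edges if e.src == i]

# Build a two-object scene: a cup resting on a table.
g = SceneGraph()
t = g.add_node(SceneNode("table", (1.0, 0.0, 0.4)))
c = g.add_node(SceneNode("cup", (1.0, 0.1, 0.8)))
g.edges.append(SceneEdge(c, t, "on-top-of"))
```

A full system would additionally carry pose ($R_v$, $t_v$), learned descriptors, and hierarchy links on each node, but the graph skeleton stays the same.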

Symbolic scene graphs generalize this structure by adding discrete semantic labels, logical predicates, domain ontologies, or knowledge-graph-style links. Nodes are associated with object-class labels or symbolic concepts (dogs, tables, ‘kitchen’), and edges annotate explicit symbolic relations (e.g., “part-of,” “affordance,” “supports”) (Aryan et al., 2024, Saucedo et al., 5 May 2025, Zhu et al., 2020). The symbolic and geometric branches can be integrated via cross-links (e.g., object nodes in both graphs, mappings from geometry to symbolic predicates via deterministic or learned functions) (Zhu et al., 2020, Aryan et al., 2024, Günther et al., 3 Feb 2026, Saucedo et al., 5 May 2025).

Expanded frameworks additionally store probability distributions over latent or unobserved entities (belief scene graphs), hierarchical region nodes (building–floor–room–object), temporal evolution (spatio-temporal scene graphs), or behavior/action vector fields (Saucedo et al., 5 May 2025, Werby et al., 1 Oct 2025, Huang et al., 2023, Kamarianakis et al., 2023).

2. Algorithmic Inference and Joint Reasoning

Contemporary hybrid methods perform joint inference or search over merged geometric-symbolic graphs. One paradigm employs a merged graph $M=(V,E)$ over scene (object) nodes, domain-knowledge nodes, and spatial/symbolic edges. The objective is to extract a maximally compatible active subgraph $M^A$ by maximizing an energy/compatibility function

$$\text{Score}(M^A) = \sum_{(i,j)\in E_s^A} w_{ij}\, f_{ij}(x_i, x_j) + \sum_{(u,v)\in E_k^A} g_{uv}(z_u, z_v),$$

where $f_{ij}$ scores geometric-spatial compatibility and $g_{uv}$ scores symbolic agreement or affordance (Aryan et al., 2024).

A dynamic iterative expansion algorithm is employed, with states updated by learned message-passing (GNN-style) propagators, importance scoring for expansion, and classification output for compound-scene recognition (Aryan et al., 2024). The process is interpretable, with intermediate activations denoting which spatial relations or symbolic facts influence the result.
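A hedged sketch of this objective: the score sums weighted spatial-compatibility terms and symbolic-agreement terms over the active node set, and a simple greedy loop stands in for the learned message-passing expansion policy described above. Edge payloads and the example values are illustrative assumptions:

```python
def subgraph_score(active, spatial_edges, knowledge_edges):
    """Score(M^A): spatial edges map (i, j) -> (w_ij, f_ij value);
    knowledge edges map (u, v) -> g_uv value. Only edges with both
    endpoints in the active set contribute."""
    s = sum(w * f for (i, j), (w, f) in spatial_edges.items()
            if i in active and j in active)
    s += sum(gv for (u, v), gv in knowledge_edges.items()
             if u in active and v in active)
    return s

def greedy_expand(seed, candidates, spatial_edges, knowledge_edges):
    """Iteratively add any candidate node that raises the score."""
    active = set(seed)
    improved = True
    while improved:
        improved = False
        base = subgraph_score(active, spatial_edges, knowledge_edges)
        for n in candidates - active:
            if subgraph_score(active | {n}, spatial_edges, knowledge_edges) > base:
                active.add(n)
                improved = True
                break
    return active

# Toy merged graph: node 1 is spatially and symbolically compatible with
# the seed node 0; node 2 has negative spatial compatibility.
spatial = {(0, 1): (1.0, 0.8), (0, 2): (1.0, -0.5)}
knowledge = {(0, 1): 0.3}
active = greedy_expand({0}, {1, 2}, spatial, knowledge)
```

The real method replaces the greedy criterion with learned importance scoring and GNN-propagated node states, but the subgraph-scoring structure is the same.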

In probabilistic and neuro-symbolic variants, node states or geometric locations are distributions over scenes or possible placements (e.g., P(position|graph)). These distributions are inferred by GCN-based message passing or factor-graph decomposition, optionally regularized with ontology constraints (room-object priors via LLMs) (Saucedo et al., 5 May 2025). Solutions for dynamic symbolic scene manipulation and geometric construction (as in GeoSketch) employ an explicit perception–reasoning–action loop over the evolving logic-form scene graph (Weng et al., 26 Sep 2025).

3. Geometric Representation and Feature Encoding

Modern geometric scene graphs encode geometric content as high-dimensional features, beyond scalar positions:

  • Node features: 3D coordinates, size (bounding-box dimensions), orientation (yaw, quaternion, or full $SO(3)$ rotation), class embeddings (learned or one-hot categorical), mesh codes (e.g., VAE-compressed OpenCLIP/AtlasNet embeddings), and motor multivectors (in geometric algebra approaches) (Ruiz et al., 18 Nov 2025, Kamarianakis et al., 2023, Kamarianakis et al., 19 Nov 2025).
  • Edge features: relative displacements, rotation/scale differences, geometric relations (e.g., seat “on-top-of” chair), and learned embeddings of relation predicates encoded as feature vectors, e.g., $e_{ij} = \mathrm{Enc}_r(r_{ij}) \in \mathbb{R}^k$ (Aryan et al., 2024, Ruiz et al., 18 Nov 2025).
  • Probabilistic: In belief scene graphs, nodes maintain spatial distributions (soft location heatmaps) representing uncertainty about unobserved object placement (Saucedo et al., 5 May 2025).
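As an illustration of the edge-feature bullet above, a minimal extractor combining relative displacement, Euclidean distance, and wrapped yaw difference might look like the following; the exact feature layout is an assumption, not any specific paper's encoding:

```python
import math

def edge_features(p_i, yaw_i, p_j, yaw_j):
    """Build a raw edge feature vector [dx, dy, dz, dist, dyaw] from two
    node states (3D position plus yaw). A learned encoder would typically
    map this raw vector into R^k."""
    dx, dy, dz = (p_j[0] - p_i[0], p_j[1] - p_i[1], p_j[2] - p_i[2])
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Wrap the yaw difference into (-pi, pi] so it is rotation-continuous.
    dyaw = math.atan2(math.sin(yaw_j - yaw_i), math.cos(yaw_j - yaw_i))
    return [dx, dy, dz, dist, dyaw]
```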

Hierarchical structures are often maintained, with nodes for buildings, floors, rooms, objects, and functional elements (e.g., control panels), supporting multi-scale reasoning (Werby et al., 1 Oct 2025, Agia et al., 2022).

For graphics and XR, geometric algebra provides a common operator algebra for points, transforms, and higher-grade geometric entities, encapsulating rotations, translations, dilation, and constraints in the multivector representation of scene-graph nodes and edges (Kamarianakis et al., 19 Nov 2025, Kamarianakis et al., 2023).
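Full geometric-algebra rotors are beyond a short sketch, but quaternions (the even subalgebra of 3D GA) illustrate the key operation: composing per-node rigid transforms along parent-child edges of a scene-graph hierarchy. The pose convention (child expressed in its parent's frame) is an illustrative assumption:

```python
def q_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def q_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * v * q^-1."""
    w, x, y, z = q_mul(q_mul(q, (0.0, *v)), (q[0], -q[1], -q[2], -q[3]))
    return (x, y, z)

def compose(parent_pose, child_pose):
    """Pose = (quaternion, translation); child is given in the parent
    frame, and the result expresses the child in the world frame."""
    pq, pt = parent_pose
    cq, ct = child_pose
    rx, ry, rz = q_rotate(pq, ct)
    return (q_mul(pq, cq), (pt[0] + rx, pt[1] + ry, pt[2] + rz))
```

In a GA formulation the same composition is a single geometric product of motors, which is what makes multivector node/edge attributes attractive for XR scene graphs.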

4. Symbolic and Commonsense Reasoning Mechanisms

The symbolic graph component can be built from hand-crafted knowledge graphs, extracted ontologies, or learned relation graphs:

  • Symbolic knowledge graphs $K=(V_k, E_k)$ hold object/scene concepts and relation edges (“part_of”, “affordance”, “co-occurrence”) endowed with weights or compatibility functions, supporting domain-specific composition (Aryan et al., 2024, Saucedo et al., 5 May 2025).
  • Room-object spatial ontologies are induced via LLM queries (e.g., “Is a stove found in a kitchen?”), with symbolic bipartite graphs between rooms and object types, enforcing coarse commonsense priors in geometric learning (Saucedo et al., 5 May 2025).
  • Explicit logic forms, Datalog rules, or temporal logic (LTL$_n$) formulae are extracted from text/captions and used to impose high-level integrity or temporal constraints (e.g., “always(left_of($v_1$, $v_2$)) ⇒ $x_1 < x_2$”) (Huang et al., 2023, Weng et al., 26 Sep 2025).
  • Symbol mapping functions can deterministically or probabilistically map metric data to high-level predicates, producing synthesized symbolic scene graphs from geometric observations (Zhu et al., 2020).

Message passing, symbolic inference, or contrastive-alignment is used to integrate these constraints into the GNN-based learning process (Aryan et al., 2024, Saucedo et al., 5 May 2025, Huang et al., 2023).
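A deterministic symbol-mapping function of the kind described above can be sketched as a geometric test over axis-aligned bounding boxes; the predicate name, box encoding, and slack thresholds here are illustrative assumptions:

```python
def on_top_of(obj, support, xy_slack=0.05, z_gap=0.05):
    """Map metric data to the symbolic predicate on_top_of(obj, support).
    Boxes are dicts {'c': (x, y, z), 'e': (ex, ey, ez)} giving center and
    half-extents. True when obj overlaps support in the ground plane and
    its bottom face rests near support's top face."""
    (ox, oy, oz), (oex, oey, oez) = obj['c'], obj['e']
    (sx, sy, sz), (sex, sey, sez) = support['c'], support['e']
    overlap_x = abs(ox - sx) <= sex + oex + xy_slack
    overlap_y = abs(oy - sy) <= sey + oey + xy_slack
    resting = abs((oz - oez) - (sz + sez)) <= z_gap
    return overlap_x and overlap_y and resting
```

Probabilistic variants would return a score instead of a boolean, letting the downstream symbolic graph carry predicate confidences rather than hard facts.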

5. Applications and Empirical Validation

Geometric and symbolic scene graphs underpin various tasks:

  • Scene understanding: Compound-scene classification via merged scene-knowledge graph search (e.g., kitchen recognition) with interpretable outputs (Aryan et al., 2024).
  • Generative scene synthesis: Probabilistic priors over furniture placement and orientation (SceneGen), diffusion models for 3D geometry synthesis from text (GeoSceneGraph), expressive image generation from full symbolic-geometric descriptors (“Image Generation from Scene Graphs”) (Keshavarzi et al., 2020, Ruiz et al., 18 Nov 2025, Johnson et al., 2018).
  • Robot planning and task/motion synthesis: Efficient task-and-motion planning over large 3D scene graphs, exploiting hierarchy and minimal subgraph extraction (SCRUB, SEEK) (Agia et al., 2022, Zhu et al., 2020). Hierarchical planners compose symbolic reasoning (subgoal regression) and geometric motion inference for robust long-horizon manipulation (Zhu et al., 2020).
  • Navigation and mapping: Real-time, incremental, open-set semantic mapping directly on the 3D scene graph backbone, integrating raw sensor streams (RGB-D, masks, CLIP descriptors) with symbolic/topological structure. Supports open-world queries, export to knowledge graphs, and fast spatial reasoning (Günther et al., 3 Feb 2026, Seymour et al., 2022).
  • Spatial commonsense: Estimating spatial distributions of unseen or missing objects, generating full room layouts from sparse observations, and integrating high-level priors for scene composition in both simulation and field robotics (Saucedo et al., 5 May 2025).
  • Geometric problem solving and diagrammatic inference: Geometry education and assistant agents dynamically manipulate a scene graph with precise construction and transformation steps, using explicit symbolic and geometric updates, with RL-driven robustness (Weng et al., 26 Sep 2025).

Empirical evaluations employ metrics for classification accuracy, geometric layout distance (e.g., Wasserstein, Frobenius), stepwise reasoning correctness, planning efficiency, navigation success rate, and user plausibility ratings (Aryan et al., 2024, Saucedo et al., 5 May 2025, Zhu et al., 2020, Keshavarzi et al., 2020).
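As a toy example of one such layout metric, the 1D Wasserstein-1 distance between two equal-size coordinate sets (e.g., object positions along one room axis) reduces to a sorted matching; real evaluations use richer multivariate variants:

```python
def wasserstein_1d(xs, ys):
    """Average transport cost between two equal-size 1D point sets;
    for empirical distributions this is the W1 distance, computed by
    matching sorted order."""
    assert len(xs) == len(ys), "expects equal-size point sets"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```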

6. Strengths, Limitations, and Future Directions

Integrated geometric and symbolic scene graphs offer structured, scalable, and interpretable frameworks that bridge sub-symbolic sensor data with high-level reasoning. Key strengths include transparent intermediate activations, joint learning of spatial and symbolic structure, and the ability to ground queries, generative synthesis, and planning within one unified data structure (Aryan et al., 2024, Saucedo et al., 5 May 2025, Günther et al., 3 Feb 2026).

However, several open challenges remain:

  • Quality and coverage of object detection and relation prediction can bottleneck downstream reasoning, as errors propagate through the unified graph (Aryan et al., 2024).
  • Many systems require curated, domain-specific knowledge graphs or annotated spatial ontologies; brittleness to missing or inconsistent knowledge has been documented (Aryan et al., 2024, Saucedo et al., 5 May 2025).
  • Scalability to hundreds of classes, dense open-world scenes, or multi-agent, long-horizon tasks introduces architectural and computational constraints, especially for graph storage and neural message-passing (Aryan et al., 2024, Saucedo et al., 5 May 2025, Zhu et al., 2020).
  • End-to-end learning of symbolic predicates, open-vocabulary relation extraction, and zero-shot generalization to novel object types or relation schemas remain active areas of research (Huang et al., 2023, Günther et al., 3 Feb 2026).

A plausible implication is that continued advances in scalable geometric algebra frameworks, differentiable logic engines, and neurally-initialized symbolic graphs will further consolidate this paradigm, moving towards generalizable, interpretable, and highly capable scene understanding and generation systems.

