
Hierarchical Neuro-Symbolic Decision Transformer

Updated 1 July 2026
  • Hierarchical Neuro-Symbolic Decision Transformer is a framework that combines high-level symbolic planning with a goal-conditioned Decision Transformer for effective sequential decision-making.
  • It uses a two-level hierarchy to decompose complex tasks into interpretable subgoals and refines them into executable actions, balancing planning with data-driven control.
  • Empirical evaluations in grid-world and multi-robot domains demonstrate significant gains in task success, reduced trajectory length, and improved sample efficiency.

The Hierarchical Neuro-Symbolic Decision Transformer (HNSDT) denotes a family of control frameworks that integrate classical symbolic planning with transformer-based policy learning to achieve robust, interpretable, and efficient long-horizon sequential decision-making. These architectures are motivated by the limitations of both data-intensive reinforcement learning and rigid model-based planners in domains characterized by combinatorial structure, stochastic dynamics, and extended temporal dependencies. The defining principle is a two-level hierarchy in which a neuro-symbolic (logic-based) module decomposes the task into interpretable subgoals, each subsequently refined into executable action sequences by a goal-conditioned Decision Transformer (DT).

1. Problem Formulation

The HNSDT framework assumes an underlying Markov Decision Process (MDP) with state space $\mathcal{S}$, action space $\mathcal{A}$, transition kernel $P(s' \mid s, a)$, and reward function $R(s, a)$. Challenges arise in long-horizon settings, where compounding uncertainties expose the limitations of both end-to-end neural policies (which struggle to sequence sub-tasks) and classical planners (which lack adaptability in continuous, uncertain environments) (Baheri et al., 10 Mar 2025, Rasanji et al., 19 Aug 2025).

The neuro-symbolic approach divides control into high-level symbolic planning over discrete, logic-encoded subgoals and low-level policy learning from offline execution data. This split lets the system handle both combinatorial complexity and the distributional shift typical of multi-robot or stochastic domains.

2. Hierarchical Architecture and Data Flow

The canonical architecture employs a two-level hierarchy (Rasanji et al., 19 Aug 2025):

  • High-Level Neuro-Symbolic Planner: Accepts a symbolic domain specification (typically in PDDL), a problem instance, and optional contextual information. Using classical search (e.g., BFS or A*) or an LLM (e.g., LLaMA3), it returns an ordered list of symbolic subgoals (operators), optimized for plan length or cost.
  • Low-Level Goal-Conditioned Decision Transformer (GCDT): Receives as input the current numerical state $s_t$, a return-to-go $\hat R_t = \sum_{j=t}^{T} r_j$, and the current subgoal $g_t$. It outputs the next continuous or discrete action $a_t$ by modeling sequential dependencies over past states, actions, and subgoals with a decoder-only transformer.
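The return-to-go $\hat R_t$ is a suffix sum of rewards, so it can be computed for every timestep in a single backward pass. A minimal Python sketch, assuming an undiscounted episode given as a list of per-step rewards (the names `returns_to_go` and `update_rtg` are illustrative, not from the source):

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Return R_hat_t = sum_{j=t}^{T} r_j for every timestep t (undiscounted)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry reuses the suffix sum computed for step t + 1.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def update_rtg(rtg: float, reward: float) -> float:
    """At inference time, subtract the reward just received from the conditioning target."""
    return rtg - reward


print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```

During training the targets come from `returns_to_go` on logged trajectories; at deployment, Decision Transformers are commonly conditioned on a desired total return that is decremented step by step, as `update_rtg` does.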

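The planning and execution loop proceeds as follows: the planner emits an ordered sequence of symbolic subgoals; each subgoal is encoded as a numerical target and pursued by the GCDT until its completion flag is set, at which point the next subgoal is issued; if the observed symbolic state diverges from the plan, the planner is re-invoked from the current state. This organization enables structured, human-interpretable high-level logic alongside expressive, sample-efficient low-level control.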
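The planning and execution loop can be sketched in Python as a single control function. This is a sketch under stated assumptions, not the papers' implementation: the planner, subgoal encoder, policy, environment step, completion check, and divergence detector are all hypothetical callables passed in by the caller, and the budgets `max_steps` and `max_replans` are illustrative parameters.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]


def run_hierarchy(
    state: State,
    plan: Callable[[State], List[str]],                   # hypothetical symbolic planner
    encode: Callable[[str], State],                       # hypothetical subgoal encoder
    policy: Callable[[State, State, float], State],       # GCDT stand-in: (s_t, g_t, R_hat_t) -> a_t
    step: Callable[[State, State], Tuple[State, float]],  # environment: (s, a) -> (s', r)
    is_complete: Callable[[State, State], bool],          # subgoal-completion check
    diverged: Callable[[State, str], bool],               # symbolic divergence detector
    target_return: float,
    max_steps: int = 100,
    max_replans: int = 3,
) -> Tuple[State, List[str]]:
    """Execute subgoals in order; replan when the symbolic state diverges from the plan."""
    subgoals = list(plan(state))  # copy so the planner's output is never mutated
    executed: List[str] = []
    rtg = target_return
    replans = steps = 0
    while subgoals and steps < max_steps:
        current = subgoals[0]
        goal = encode(current)
        if is_complete(state, goal):
            executed.append(subgoals.pop(0))  # advance to the next subgoal
            continue
        if diverged(state, current):
            if replans >= max_replans:
                break  # give up once the replanning budget is spent
            replans += 1
            subgoals = list(plan(state))  # replan from the current state
            continue
        action = policy(state, goal, rtg)
        state, reward = step(state, action)
        rtg -= reward  # decrement return-to-go by the observed reward
        steps += 1
    return state, executed
```

For example, on a toy 1D line starting at `(0.0,)`, a planner returning `["reach_3", "reach_5"]`, an encoder mapping these to `(3.0,)` and `(5.0,)`, and a policy that moves one unit toward the goal, the function ends at `(5.0,)` with both subgoals executed. Keeping the loop outside the transformer makes the replanning policy explicit and easy to audit.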

3. Neuro-Symbolic Planning Layer

The high-level planner operates over a symbolic domain $D = \langle P, O \rangle$, where $P$ is a finite set of environment predicates and $O$ a finite set of symbolic operators (actions). Each operator $o \in O$ is parameterized by its preconditions $\mathrm{pre}(o)$, effects $\mathrm{eff}(o)$, and cost $c(o)$. A task is formulated by abstracting the initial numeric state to a symbolic state, then searching for a plan $\sigma = (o_1, \ldots, o_n)$ that minimizes $\sum_{i=1}^{n} c(o_i)$ while achieving the symbolic goal $G$ (Rasanji et al., 19 Aug 2025).
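To make the cost-minimizing search concrete, here is a sketch of a planner over STRIPS-style operators in Python. It assumes effects are split into add and delete sets (a common STRIPS/PDDL convention, not necessarily the papers' encoding) and uses uniform-cost search, which returns a minimum-cost plan when costs are non-negative; with unit costs it behaves like BFS. The operator names in the usage note are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Operator:
    name: str
    pre: FrozenSet[str]     # preconditions pre(o): predicates that must hold
    add: FrozenSet[str]     # positive effects (assumed add/delete split of eff(o))
    delete: FrozenSet[str]  # negative effects
    cost: float = 1.0       # c(o)


def find_plan(init: FrozenSet[str], goal: FrozenSet[str],
              ops: List[Operator]) -> Optional[List[str]]:
    """Uniform-cost search for a minimum-cost operator sequence reaching goal."""
    tie = itertools.count()  # tie-breaker so heapq never compares frozensets
    frontier: List[Tuple[float, int, FrozenSet[str], List[str]]] = [(0.0, next(tie), init, [])]
    best: Dict[FrozenSet[str], float] = {init: 0.0}
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if goal <= state:
            return path  # first goal state popped is optimal for non-negative costs
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry superseded by a cheaper path
        for op in ops:
            if op.pre <= state:
                nxt = (state - op.delete) | op.add
                new_cost = cost + op.cost
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost, next(tie), nxt, path + [op.name]))
    return None  # goal unreachable
```

For a toy Key-Door domain with unit-cost operators `pick_key`, `open_door`, `go_goal` and an expensive alternative `break_door` (cost 5), `find_plan` returns `["pick_key", "open_door", "go_goal"]` because that route costs 3 instead of 6.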

At inference time, each symbolic subgoal $o_i$ is mapped via an encoder $\phi$ to a numerical subgoal $g_i = \phi(o_i)$ suitable for use by the GCDT. Subgoal transitions are triggered by monitoring the subgoal-completion flag in the continuous state vector.
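In the simplest case the encoder and the completion flag reduce to a lookup table and a distance threshold. The Python sketch below assumes each symbolic subgoal maps to a fixed 3D target position and that the flag fires within a tolerance `tol`; the table entries, `encode_subgoal`, and `subgoal_complete` are hypothetical, since the source does not specify the encoder's form (it may well be learned).

```python
import math
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical encoder table: symbolic subgoal -> target 3D position in metres.
SUBGOAL_TARGETS: Dict[str, Vec3] = {
    "place(bread, plate)": (0.40, 0.10, 0.02),
    "place(cheese, bread)": (0.40, 0.10, 0.05),
}


def encode_subgoal(subgoal: str) -> Vec3:
    """Map a symbolic subgoal to its numeric target (a stand-in for the encoder)."""
    return SUBGOAL_TARGETS[subgoal]


def subgoal_complete(position: Sequence[float], goal: Vec3, tol: float = 0.01) -> bool:
    """Completion flag: true when the tracked object lies within tol of the target."""
    return math.dist(position, goal) <= tol
```

The flag computed this way can be appended to the state vector, which is how the GCDT inputs listed in Section 4 include a subgoal-completion flag.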

4. Goal-Conditioned Decision Transformer

The low-level policy is parameterized as a goal-conditioned Decision Transformer (GCDT). Its inputs, per timestep, are triplets $(s_t, g_t, \hat R_t)$:

  • $s_t$: current continuous state vector (e.g., 3D positions, subgoal-completion flag)
  • $g_t$: dense embedding of the current subgoal
  • $\hat R_t$: return-to-go estimator for reward shaping
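One way to turn these per-timestep inputs into a sequence for a decoder-only transformer is to interleave return-to-go, state, subgoal, and action tokens and restrict attention with a causal mask. This Python sketch assumes the ordering $(\hat R_t, s_t, g_t, a_t)$, which extends the original Decision Transformer's (return, state, action) layout with a goal token; the papers' exact token order is not stated here, and `build_tokens` and `causal_mask` are illustrative names.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, Tuple[float, ...]]


def build_tokens(rtgs: Sequence[float], states: Sequence[Sequence[float]],
                 goals: Sequence[Sequence[float]],
                 actions: Sequence[Sequence[float]]) -> List[Token]:
    """Interleave (R_hat_t, s_t, g_t, a_t) tokens; the ordering is an assumption.

    The final timestep usually has no action yet: the model predicts a_t
    from the output at that timestep's goal token.
    """
    tokens: List[Token] = []
    for t in range(len(states)):
        tokens.append(("rtg", (float(rtgs[t]),)))
        tokens.append(("state", tuple(states[t])))
        tokens.append(("goal", tuple(goals[t])))
        if t < len(actions):
            tokens.append(("action", tuple(actions[t])))
    return tokens


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]
```

In a full model each token kind would get its own linear embedding plus a positional encoding before the attention blocks; the mask guarantees that the prediction for $a_t$ never sees future tokens.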

The transformer architecture comprises linear token embeddings, positional encodings, $L$ blocks of multi-head causal self-attention and feed-forward networks, followed by a decoder head producing action logits or continuous actions. The training objective is either the next-action mean-squared error (continuous case) or the negative log-likelihood (discrete case), e.g.,

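$$\mathcal{L}_{\mathrm{MSE}}(\theta) = \mathbb{E}_{\tau \sim \mathcal{D}}\Big[\frac{1}{T}\sum_{t=1}^{T} \big\lVert a_t - \hat a_t \big\rVert_2^2\Big]$$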

or

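$$\mathcal{L}_{\mathrm{NLL}}(\theta) = -\,\mathbb{E}_{\tau \sim \mathcal{D}}\Big[\frac{1}{T}\sum_{t=1}^{T} \log \pi_\theta\big(a_t \mid s_{\le t}, g_{\le t}, \hat R_{\le t}, a_{<t}\big)\Big]$$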

(Rasanji et al., 19 Aug 2025, Baheri et al., 10 Mar 2025).
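Both objectives are simple to evaluate once the model has produced its outputs. A dependency-free Python sketch for a single trajectory, assuming predicted continuous actions (MSE case) or per-timestep action logits (NLL case) are already available and averaging over timesteps; the function names are illustrative:

```python
import math
from typing import Sequence


def mse_loss(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean over timesteps of the squared L2 error ||a_t - a_hat_t||^2."""
    total = 0.0
    for p, a in zip(pred, target):
        total += sum((pi - ai) ** 2 for pi, ai in zip(p, a))
    return total / len(pred)


def nll_loss(logits: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Mean negative log-likelihood of discrete actions under a softmax over logits."""
    total = 0.0
    for z, a in zip(logits, actions):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(zi - m) for zi in z))
        total += log_norm - z[a]  # -log softmax(z)[a]
    return total / len(actions)
```

With two equal logits the NLL of either action is $\ln 2 \approx 0.693$, the loss of a uniform guess over two actions.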

5. Bidirectional Interface and Error Analysis

A crucial feature is the bidirectional interface between planning and execution. The planner yields a sequence of subgoals, each encoded as a token $g_i$; the DT attempts to reach each subgoal, switching to the next upon completion or triggering re-planning if symbolic state divergence is detected. This loop permits principled error tracking: symbolic suboptimality and execution error combine according to composite and concentration bounds. Letting $V^*$ denote the optimal value function and $V^{\pi_{\mathrm{hier}}}$ the value of the hierarchical policy, for a plan of $n$ operators the hierarchy yields

$$V^*(s_0) - V^{\pi_{\mathrm{hier}}}(s_0) \le \varepsilon_{\mathrm{plan}} + n\,(\delta + \epsilon_{\mathrm{exec}})$$

where $\varepsilon_{\mathrm{plan}}$ is the planner's suboptimality gap, $\delta$ the per-operator cost approximation error, and $\epsilon_{\mathrm{exec}}$ the upper bound on per-operator execution error (Baheri et al., 10 Mar 2025). Probabilistic concentration bounds further constrain error propagation over long horizons.
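Divergence detection can be sketched by abstracting the numeric state back into predicates and checking them against what the plan expects to hold. The Python below assumes each predicate is a hypothetical boolean test on the state vector and flags divergence only when an expected predicate fails to hold; the papers may use a different criterion.

```python
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical abstraction: each predicate name maps to a boolean test on the numeric state.
Predicates = Dict[str, Callable[[Sequence[float]], bool]]


def abstract(state: Sequence[float], predicates: Predicates) -> FrozenSet[str]:
    """Map a numeric state to the set of predicates that hold in it."""
    return frozenset(name for name, holds in predicates.items() if holds(state))


def diverged(state: Sequence[float], expected: FrozenSet[str], predicates: Predicates) -> bool:
    """Flag divergence when a predicate the plan expects is false in the observed state."""
    return not expected <= abstract(state, predicates)
```

For instance, if the plan expects `has_key` after a pick-up step but the gripper state shows no key, `diverged` returns `True` and the loop would call the planner again from the abstracted current state.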

6. Empirical Evaluation

Empirical studies span stochastic grid-worlds (Baheri et al., 10 Mar 2025) and multi-robot tabletop manipulation domains (Rasanji et al., 19 Aug 2025). The hybrid neuro-symbolic approach consistently outperforms end-to-end and purely symbolic baselines on task success rate, trajectory length, and sample complexity.

Key results in the stochastic grid-world domains (Baheri et al., 10 Mar 2025):

| Experiment | Hybrid success (%) | Pure DT success (%) |
|---|---|---|
| Key-Door (fail_prob = 0.1) | 98 | 70 |
| Key-Door (fail_prob = 0.3) | 82 | 30 |
| Multi-Goal Grid | 60–40 | ≈0 |

In multi-robot settings (Rasanji et al., 19 Aug 2025):

  • Plan-generation accuracy for the symbolic planner reaches 100% in sandwich assembly and 90% in grocery-packing, with minimal replanning.
  • Subgoal conditioning increases task success from 28–43% (no subgoals) to 81–95% (with subgoals).
  • Zero-shot transfer achieves 72.35% success on unseen 8-item tasks.
  • Few-shot adaptation exceeds 90% success with 100 samples of cross-task fine-tuning.

7. Interpretability, Generalization, and Extensions

The use of symbolic planning ensures that the generated plans remain human-interpretable and allow for direct operator intervention. Subgoal anchoring supports zero-shot transfer and lowers the memory burden on the transformer, enhancing generalizability. The modular, hierarchical organization enables safety enforcement (e.g., via external constraint checkers) and facilitates integration with legacy robotics infrastructure (Rasanji et al., 19 Aug 2025).
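Because the learned policy only proposes actions, an external constraint checker can sit between the GCDT and the actuators. A minimal Python sketch, assuming box limits on each action dimension and a hypothetical `is_allowed` predicate supplied by a separate checker, clips the proposed action and falls back to a safe default when the check fails:

```python
from typing import Callable, Sequence, Tuple

Action = Tuple[float, ...]


def safe_action(proposed: Sequence[float], lower: Sequence[float], upper: Sequence[float],
                is_allowed: Callable[[Action], bool], fallback: Action) -> Action:
    """Clip a proposed action to box limits, then veto it if the external check rejects it."""
    clipped = tuple(min(max(a, lo), hi) for a, lo, hi in zip(proposed, lower, upper))
    return clipped if is_allowed(clipped) else fallback
```

Keeping this filter outside the transformer means the safety rules can be audited or updated without retraining the policy.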

Limitations include:

  • Requirement for a manually specified symbolic abstraction and operator set.
  • Sensitivity to divergence between symbolic and numeric states, necessitating costly replanning in dynamic or partially observable environments.
  • Reliance on independence assumptions in the error analysis.

Potential extensions include automatic predicate discovery via learned encoders, adaptation to POMDPs by planning over belief states, and tighter integration of execution-cost-aware planning (Baheri et al., 10 Mar 2025).

A plausible implication is that these architectures define a scalable paradigm for interpretable, sample-efficient control in complex, real-world multi-agent systems, particularly where both combinatorial logic and continuous adaptation are required.
