Minimal Information Neuro-Symbolic Tree (MINT)
- MINT is a neuro-symbolic framework that combines symbolic reasoning, uncertainty-aware neural planning, and LLM query curation to address knowledge gaps in open-world, object-driven planning.
- It builds a binary decision tree from minimal human queries to refine object properties, with formal performance guarantees and decreasing regret bounds.
- Empirical results demonstrate that MINT achieves near-expert performance with only 1–3 queries per episode, outperforming traditional RL and pure LLM approaches.
The Minimal Information Neuro-Symbolic Tree (MINT) framework is a methodology for knowledge-gap reasoning and active human elicitation in open-world, object-driven planning. MINT jointly integrates symbolic reasoning, neural uncertainty-aware planning, and LLM query curation to optimize information-seeking strategies under conditions of incomplete knowledge, with formal guarantees on performance and empirical efficiency across diverse planning domains (Fang et al., 4 Feb 2026).
1. Formal Problem Setting
MINT is formulated for object-driven planning scenarios where an AI agent engages with environments whose transition dynamics or reward structures depend on latent object properties and/or human intent that are initially unknown. The totality of these unknowns is parameterized by a "knowledge gap" variable $g$.
Given a state space $\mathcal{S}$ and action space $\mathcal{A}$, a fully known environment with descriptor $g$ is defined as an MDP $M_g = (\mathcal{S}, \mathcal{A}, T_g, R_g, \gamma)$, where the transition kernel $T_g$ and reward function $R_g$ are fully determined. With partial knowledge, $g$ is constrained to a set $G$, yielding an extended MDP family $\mathcal{M}_G = \{ M_g : g \in G \}$. The agent's objective is to minimize regret in expected return by selectively querying the human with binary (yes/no) propositions, reducing the knowledge gap $G$ to a smaller set $G' \subset G$. The regret of a policy $\pi$ under the true latent $g^\ast$ is defined as

$$\mathrm{Regret}(\pi; g^\ast) = V^{\ast}_{g^\ast}(s_0) - V^{\pi}_{g^\ast}(s_0),$$

where $V^{\ast}_{g^\ast}$ is the value under the optimal policy when $g^\ast$ is known. The MINT agent seeks to minimize this regret with the minimal number of queries (Fang et al., 4 Feb 2026).
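The regret objective can be sketched on a toy two-state MDP family. Everything below (the dynamics, the reward tables, and the latent labels `g1`/`g2`) is illustrative, not taken from the paper; it only shows how regret compares a fixed policy against the optimal value under the true latent:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP family indexed by a latent "knowledge gap" g.
gamma = 0.9
T = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in current state
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: swap states
R = {"g1": np.array([[1.0, 0.0], [0.0, 1.0]]),   # rewards R_g[s, a] depend on g
     "g2": np.array([[0.0, 1.0], [1.0, 0.0]])}

def value_iteration(Rg, iters=500):
    """Optimal value function for the MDP with reward table Rg."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = Rg.T + gamma * T @ V          # Q[a, s]
        V = Q.max(axis=0)
    return V

def regret(policy, g_true, s0=0):
    """Regret(pi; g*) = V*_{g*}(s0) - V^pi_{g*}(s0) for a deterministic policy."""
    V_star = value_iteration(R[g_true])
    V_pi = np.zeros(2)                    # evaluate the fixed policy under g*
    for _ in range(500):
        V_pi = np.array([R[g_true][s, policy[s]]
                         + gamma * T[policy[s], s] @ V_pi for s in range(2)])
    return V_star[s0] - V_pi[s0]
```

Here `regret([0, 0], "g1")` is near zero because "stay" is optimal under `g1`, while a mismatched policy such as `[1, 1]` incurs positive regret; shrinking the candidate set `G` before acting is exactly what keeps this quantity small.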
2. Architecture and Algorithmic Workflow
The MINT framework constructs a symbolic decision tree where:
- Nodes $v$ correspond to specific knowledge gaps $g$, represented by candidate sets $G_v$.
- Edges represent binary questions $q$ and the corresponding yes/no human answers $a \in \{\text{yes}, \text{no}\}$, leading to refined gaps.
At each node $v$, MINT's neural planning policy (typically an uncertainty-aware DQN, or UA-DQN) produces Q-value estimates with mean $\mu_v(s, a)$ and variance $\sigma^2_v(s, a)$. The variance quantifies outcome-uncertainty.
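One common way to realize such uncertainty-aware Q-estimates is an ensemble whose disagreement serves as $\sigma^2(s, a)$. The sketch below uses toy linear Q-heads; the ensemble construction, sizes, and random weights are illustrative stand-ins, not the paper's UA-DQN:

```python
import numpy as np

# Ensemble-based uncertainty sketch: the ensemble mean mu(s, .) drives action
# choice, and the ensemble variance sigma^2(s, .) quantifies outcome-uncertainty.
rng = np.random.default_rng(0)
K, dim, n_actions = 8, 3, 4
heads = [rng.normal(size=(dim, n_actions)) for _ in range(K)]  # K linear Q-heads

def q_stats(state):
    """Return (mu, sigma^2) over actions from the ensemble of Q-heads."""
    qs = np.stack([state @ W for W in heads])   # (K, n_actions)
    return qs.mean(axis=0), qs.var(axis=0)

state = np.ones(dim)
mu, var = q_stats(state)
best = int(mu.argmax())   # greedy action under the ensemble mean
```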
Node expansion proceeds by computing the margin between the best and second-best mean Q-values at node $v$,

$$\Delta(v) = \mu_v(s, a_{(1)}) - \mu_v(s, a_{(2)}).$$

If $\Delta(v) < \epsilon$ for a tunable threshold $\epsilon$ and the depth limit is not reached, the knowledge gap $G_v$ is split along a dimension (type, subtype, or numerical interval), recursively building two child nodes, one for each possible binary response.
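The expansion rule can be sketched as follows; the margin definition, the names `eps` and `max_depth`, and the interval-bisection split are illustrative assumptions rather than the paper's exact heuristics:

```python
import numpy as np

def should_split(mu, depth, eps=0.5, max_depth=5):
    """Expand a node when the best/second-best Q-margin is ambiguous
    (below eps) and the depth limit has not been reached."""
    top2 = np.sort(mu)[-2:]                  # (second-best, best) mean Q-values
    margin = top2[1] - top2[0]
    return bool(margin < eps) and depth < max_depth

def split_gap(interval):
    """Bisect a numerical knowledge-gap interval into two child gaps,
    one per binary (yes/no) answer."""
    lo, hi = interval
    mid = (lo + hi) / 2.0
    return (lo, mid), (mid, hi)
```

When the margin is large, one action dominates regardless of the remaining uncertainty, so no question is worth asking and the node becomes a leaf.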
Upon tree completion, an LLM is employed to:
- Merge subtrees sharing the same optimal action.
- Refactor subtrees into logical disjunctions.
- Synthesize the query $q^\ast$ maximizing expected information gain, $q^\ast = \arg\max_q \big[ H(A) - \mathbb{E}_{a}\, H(A \mid q, a) \big]$, where $H(A)$ is the entropy over optimal action choices.
The agent then queries the human, prunes inconsistent branches based on the answer, and repeats until a leaf is reached. The final output is the action maximizing the mean Q-value $\mu$ at the surviving leaf.
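The entropy-guided query selection might look like the sketch below, which scores each candidate yes/no question by how much it reduces the entropy over optimal-action choices across the surviving candidate gaps (the data structures here are hypothetical, chosen only to make the information-gain computation concrete):

```python
import math
from collections import Counter

def entropy(actions):
    """Shannon entropy (bits) of the empirical action distribution."""
    n = len(actions)
    return -sum(c / n * math.log2(c / n) for c in Counter(actions).values())

def best_query(gaps, opt_action, queries):
    """gaps: surviving candidate latents; opt_action: g -> best action;
    queries: list of dicts g -> True/False (the answer under each latent).
    Returns the query with maximal information gain over action choice."""
    def info_gain(q):
        yes = [g for g in gaps if q[g]]
        no = [g for g in gaps if not q[g]]
        h_cond = sum(len(part) / len(gaps) * entropy([opt_action[g] for g in part])
                     for part in (yes, no) if part)
        return entropy([opt_action[g] for g in gaps]) - h_cond
    return max(queries, key=info_gain)
```

A question that cleanly separates latents with different optimal actions earns a full bit of information gain, while one whose answer leaves the action choice just as ambiguous earns none.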
3. Theoretical Guarantees
MINT provides formal bounds on policy return as knowledge gaps are reduced. A central construct is the pseudo-metric between two MDPs,

$$d(M_1 \,\|\, M_2) = \sup_{s,a} \Big( \big| R_1(s,a) - R_2(s,a) \big| + \gamma \Big| \sum_{s'} \big( T_1(s' \mid s,a) - T_2(s' \mid s,a) \big)\, V^{\ast}_{M_2}(s') \Big| \Big),$$

which is symmetrized as

$$\bar d(M_1, M_2) = \max\big\{ d(M_1 \,\|\, M_2),\ d(M_2 \,\|\, M_1) \big\}.$$
This yields a local pseudo-Lipschitz property for the optimal Q-function:

$$\big| Q^{\ast}_{M_1}(s,a) - Q^{\ast}_{M_2}(s,a) \big| \le \frac{\bar d(M_1, M_2)}{1 - \gamma}.$$

After a binary split in which a question divides $G$ into two gaps $G_1, G_2$ with representative MDPs $M_{g_1}$, $M_{g_2}$, the residual regret at each child is bounded in terms of the gap's diameter,

$$\mathrm{diam}(G_i) = \sup_{g, g' \in G_i} \bar d(M_g, M_{g'}),$$

attained at the pair $(g, g')$ that maximizes the metric. Recursively, the residual regret decays proportionally to the final diameter of the knowledge gap in the pseudo-metric (Fang et al., 4 Feb 2026).
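A minimal sketch of a simplified, symmetric variant of such an MDP pseudo-metric, replacing the value-weighted transition term with a total-variation distance scaled by a value bound `v_max` (the constants and this exact form are assumptions, not the paper's construction):

```python
import numpy as np

def mdp_metric(R1, T1, R2, T2, gamma=0.9, v_max=10.0):
    """Simplified symmetric pseudo-metric between two tabular MDPs.
    R*: (S, A) reward tables; T*: (S, A, S') transition tensors.
    Per (s, a): reward difference plus a discounted, value-bounded
    total-variation distance between transition distributions."""
    dR = np.abs(R1 - R2)                          # (S, A)
    dT = 0.5 * np.abs(T1 - T2).sum(axis=-1)       # TV distance per (s, a)
    return float((dR + gamma * v_max * dT).max())
```

Two MDPs that agree everywhere are at distance zero; a uniform reward shift of 1 with identical dynamics is at distance exactly 1, and the regret bound shrinks accordingly as a knowledge gap's diameter under this metric shrinks.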
4. Interaction with LLMs
Upon completion of the symbolic tree via neural self-play and uncertainty evaluation, MINT transfers the structure to an LLM for:
- Summarization and subtree merging where optimal actions coincide.
- Query synthesis: generating human-interpretable yes/no questions that maximize information gain with respect to action choice.
- Internal update and recursion as new binary answers are received.
LLMs leverage their capacity for logical reasoning and symbolic manipulation to express optimal query strategies succinctly, maintaining the formal guarantees established in the upstream symbolic and neural components.
5. Empirical Assessment
MINT was empirically evaluated on three benchmarks of increasing complexity:
| Domain | Method | Success / Return | Avg. Queries/Episode |
|---|---|---|---|
| MiniGrid (discrete maze) | PPO (RL) | 70–90% success, 5–8 reward | — |
| MiniGrid (discrete maze) | GPT-4 (LLM) | 83–100% success, 6–9 reward | — |
| MiniGrid (discrete maze) | Query-A (high-variance questions) | 65–83% success, 4–7 reward | ~27 |
| MiniGrid (discrete maze) | MINT (≤ 3 queries) | 100% success, ~9.5 reward | 2–3 |
| Atari Pac-Man | PPO | ~325 return | — |
| Atari Pac-Man | Query-A | ~422 return | ~27 |
| Atari Pac-Man | MINT (≤ 3 queries) | ~412 return | 3.8 |
| Atari Pac-Man | MINT (unlimited queries) | ~435 return | 6.8 |
| Isaac Search & Rescue (3D) | LLM-only | 30–60% main target, <10% hidden target | — |
| Isaac Search & Rescue (3D) | MINT + UA-DQN + LLM | 95–99% both targets | — |
Across tasks, MINT achieves near-expert returns while requiring only 1–3 queries per episode, in stark contrast to naive baselines that use an order of magnitude more queries. Pure LLM or RL approaches exhibit lower return/success, especially on environments with significant knowledge gaps (Fang et al., 4 Feb 2026).
6. Limitations and Future Research
MINT displays several strengths: systematic integration of symbolic and neural knowledge-gap reasoning, active query optimization by self-play, formal regret guarantees, and significant query-efficiency in difficult domains.
Identified limitations include:
- Restriction to binary (yes/no) queries; extension to multi-answer or free-form elicitation is not yet explored.
- Hand-crafted splitting heuristics for partitioning (type, subtype, value); potential exists for learning differentiable or data-driven splits.
- Assumption of a high-quality neural planner (UA-DQN); substitutability with other uncertainty-aware planners is a prospective path.
- Absence of adaptation to continuous-valued queries or multi-agent scenarios.
These limitations suggest that future research could enhance the expressiveness and flexibility of MINT's querying mechanisms and extend its applicability to more complex, open-ended planning domains with richer forms of human-AI interaction (Fang et al., 4 Feb 2026).