ConceptBot: Modular Robotic Planning
- ConceptBot is a modular robotic planning framework that integrates LLM reasoning with commonsense knowledge graphs to decompose ambiguous user instructions and enrich object semantics.
- It employs a three-module architecture—object property extraction, user request processing, and planning—to generate risk-aware pick-and-place policies using cosine similarity and affordance scoring.
- Empirical evaluations on benchmarks like SafeAgentBench show ConceptBot achieves up to 100% success on explicit tasks and outperforms systems like Google SayCan in ambiguous and safety-critical scenarios.
A modular robotic planning framework, ConceptBot integrates LLMs with external commonsense knowledge graphs for robust task decomposition, semantic grounding, and risk-aware policy generation in robotic manipulation. Its architecture is designed to resolve ambiguities in natural-language instructions, analyze and enrich object properties with semantic concepts, and synthesize environment-appropriate pick-and-place behaviors, all without requiring domain-specific training. Empirical evaluations on benchmarks such as SafeAgentBench and multiple laboratory scenarios demonstrate ConceptBot’s ability to generalize, accurately resolve material and safety tasks, and outperform previous LLM-augmented robotic systems, notably Google SayCan (Leanza et al., 30 Aug 2025).
1. Architecture and System Overview
ConceptBot’s architecture consists of three tightly coupled modules: Object Property Extraction (OPE), User Request Processing (URP), and a Planner. The system incorporates retrieval-augmented generation (RAG) and caching to efficiently combine LLM reasoning with real-time knowledge retrieval.
- OPE Module: Given scene objects detected by a vision system (e.g., ViLD/YOLO), OPE retrieves related semantic relationships from ConceptNet for each object, embedding the retrieved triples with OpenAI’s text-embedding-ada-002 model. Object properties are assigned by computing cosine similarities between the embedded relationships and a set of target semantic/conceptual attributes (e.g., “fragile,” “dangerous,” “toxic”), using a threshold filter (θ = 0.75); see the sketch after this list.
- URP Module: The URP pipeline processes user-provided instructions to extract intent and disambiguate tasks. It first extracts keywords from the instruction (via spaCy or an LLM call), fetches associated relations from ConceptNet (and, if relevant, for detected scene objects), and fuses them with context data (robot capabilities, object properties) and few-shot exemplars to produce an enriched prompt. Using Chain of Thought (CoT) prompting, the LLM generates a reasoning trace (R_urp) and an actionable, structured robot policy (A_urp).
- Planner Module: Receives A_urp and the enriched object property dictionary (P_obj). The LLM, prompted with the structured context and properties, proposes n (e.g., n = 5) candidate actions with temperature set to zero for deterministic output. Each action is scored both by its LLM completion frequency (S_LLM) and an independently computed affordance score (S_affordance) comprising a detection confidence, a bounding-box suitability term (S_bbox), and a property-based violation penalty (S_prop). The final policy selection is a* = argmax_i [S_LLM(a_i) · S_affordance(a_i)], with the top-scoring action executed.
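A minimal sketch of the OPE property-assignment step is given below. The helper names and data handling are illustrative assumptions; only the embedding model call reflects the component named above.

```python
# Sketch of OPE property assignment: embed each ConceptNet relation for a
# detected object, compare against target property embeddings, and keep
# properties whose best cosine similarity clears the threshold theta = 0.75.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
THETA = 0.75

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_properties(relations: list[str], targets: list[str]) -> dict[str, float]:
    """Return {property: best similarity} for properties scoring >= THETA."""
    rel_vecs = [embed(r) for r in relations]
    props = {}
    for target in targets:
        t_vec = embed(target)
        best = max((cosine(rv, t_vec) for rv in rel_vecs), default=0.0)
        if best >= THETA:
            props[target] = best
    return props

# Example: relations retrieved from ConceptNet for a detected "garden bean".
print(assign_properties(
    relations=["garden bean RelatedTo toxic", "garden bean IsA vegetable"],
    targets=["toxic", "fragile", "dangerous"],
))
```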
The architecture’s modularity keeps the system composable and extensible within broader robotic frameworks, with knowledge-graph queries and LLM calls backed by a cache to minimize latency and external API costs.
2. Semantic Enrichment and Commonsense Reasoning
ConceptBot’s key innovation lies in on-the-fly semantic enrichment of both visual detections and user instructions:
- Object Semantic Augmentation: For every detected object, the OPE module queries ConceptNet for directly related attribute triples. Each relation, after embedding and threshold filtering, expands the robot’s world model with risk- and affordance-relevant metadata. For instance, the object “garden bean” is augmented with semantic links (e.g., RelatedTo “toxic”) flagged by cosine similarity, supporting nuanced, risk-aware handling (see the sketch after this list).
- Disambiguation via Knowledge Graphs: Instructions with ambiguous referents (e.g., “Put the glass away”) are resolved by cross-referencing ConceptNet properties with the current scene. If ConceptNet lacks sufficient information, a fallback routine queries Wikipedia and applies Open Information Extraction (OpenIE), supplementing structured knowledge.
- Concept Integration into Planning: Relations from OPE and URP, filtered for physical and conceptual compatibility, are provided as part of the system context to the LLM, ensuring candidate plans reflect both linguistic intent and domain-grounded semantic constraints.
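As a rough illustration of the ConceptNet lookup behind this enrichment, the sketch below uses ConceptNet’s public REST API; the risk-flag filter at the end is a simplification, not the authors’ exact routine.

```python
# Sketch of the ConceptNet lookup behind semantic augmentation, using the
# public REST API (http://api.conceptnet.io). Paging and error handling omitted.
import requests

def conceptnet_relations(term: str, lang: str = "en", limit: int = 50):
    """Return (start, relation, end) label triples attached to `term`."""
    url = f"http://api.conceptnet.io/c/{lang}/{term.replace(' ', '_')}"
    edges = requests.get(url, params={"limit": limit}, timeout=10).json().get("edges", [])
    return [(e["start"]["label"], e["rel"]["label"], e["end"]["label"]) for e in edges]

# Flag risk-relevant neighbours of a detected object (simplified filter):
for start, rel, end in conceptnet_relations("garden bean"):
    if rel == "RelatedTo" and "toxic" in end.lower():
        print(f"{start} --{rel}--> {end}")  # candidate risk flag for OPE
```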
This design enables rapid adaptation to new domains or to previously unseen classes of objects and instructions, without explicit retraining or manual rule engineering.
3. Planning, Scoring, and Policy Generation
ConceptBot’s planning proceeds via a two-phase scoring regime:
- LLM-based Completion Scoring (S_LLM): For the set of candidate actions derived via few-shot or chain-of-thought prompted LLM calls, a frequency-based score is computed for each candidate a_i, i.e., S_LLM(a_i) = count(a_i) / n, where count(a_i) is the number of sampled completions proposing a_i.
- Affordance-based Scoring (S_aff): Each candidate undergoes feasibility assessment:
- Region-proposal-network (RPN) confidence from the object detector (for both pick and place locations).
- Bounding box compatibility for physical interaction (S_bbox).
- Property-based penalties if, e.g., the action involves picking an object marked “dangerous” or placing “toxic” objects in unsafe locations (S_prop).
- Combined Scoring and Execution: The final selection multiplies these scores, a* = argmax_i [S_LLM(a_i) · S_aff(a_i)].
The highest-scoring action is executed by the robot. Algorithms are detailed in Appendix Algorithm 1 of the original work (Leanza et al., 30 Aug 2025).
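The selection rule can be sketched as follows; the Candidate fields and the example actions are hypothetical stand-ins for the quantities defined above, not the authors’ data structures.

```python
# Hypothetical data structures illustrating the two-phase scoring: the
# combined score multiplies completion frequency by the affordance terms.
from dataclasses import dataclass

@dataclass
class Candidate:
    action: str
    llm_count: int    # how often the LLM proposed this action (out of n samples)
    det_conf: float   # detector confidence for the pick/place targets
    s_bbox: float     # bounding-box suitability
    s_prop: float     # property-violation penalty in [0, 1] (1 = no violation)

def select_action(candidates: list[Candidate], n_samples: int) -> Candidate:
    def combined(c: Candidate) -> float:
        s_llm = c.llm_count / n_samples           # S_LLM: completion frequency
        s_aff = c.det_conf * c.s_bbox * c.s_prop  # S_aff: feasibility terms
        return s_llm * s_aff
    return max(candidates, key=combined)

best = select_action(
    [Candidate("pick(knife); place(drawer)", llm_count=3, det_conf=0.9, s_bbox=0.8, s_prop=0.5),
     Candidate("pick(apple); place(bowl)", llm_count=2, det_conf=0.95, s_bbox=0.9, s_prop=1.0)],
    n_samples=5,
)
print(best.action)  # the safer apple placement wins despite lower frequency
```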
This two-pronged evaluation (semantic and feasibility-based) ensures both high-level reasoning soundness and low-level policy pragmatism.
4. Empirical Evaluation and Comparative Performance
For explicit, implicit, and risk-aware pick-and-place benchmarks, ConceptBot was empirically compared with Google SayCan and evaluated on SafeAgentBench:
- Explicit Tasks: 100% success (SayCan: 84%)
- Implicit/Ambiguous Tasks: 87% accuracy (SayCan: 31%)
- Risk-aware Tasks: 76% (SayCan: 15%)
- SafeAgentBench Aggregate: 80% (next-best baseline: 46%)
- Material Classification: 70% (SayCan: 20%)
- Toxicity Detection: 86% (SayCan: 36%)
These results reflect robustness across a spectrum of task types, including ambiguous instructions and safety-critical reasoning. The approach generalizes without domain-specific training, relying on LLMs and dynamically integrated ConceptNet relations.
5. Scenario Types and Adaptability
ConceptBot is validated in application-specific scenarios such as:
- Material classification: Correctly resolving objects with mixed or ambiguous materials (e.g., “cheese paper” RelatedTo paper and wax) by integrating relations from external KGs.
- Toxicity detection: Leveraging semantic links (e.g., “jack bean” is RelatedTo toxic) to flag and properly handle hazardous items.
- Ambiguity resolution: Handling user commands that lack explicit referents by fusing instruction context, concept retrieval, and scene information.
- Generalization: Experiments conducted in both simulation and lab environments (UR5e and Franka Emika Panda) demonstrate reliable performance in real-world, unstructured scenes.
The system’s retrieval/caching design supports rapid integration of new knowledge sources, allowing incorporation of properties or relations not present at training or deployment initialization.
6. Technical Details and Underlying Algorithms
- Context Fusion: At every planning step, the system context provided to the LLM concatenates the robot capabilities (C_robot), the object property dictionary (P_obj), the relationships filtered from instruction keywords and scene objects (R_kw and R_obj), and few-shot exemplars (E_fs): Context = C_robot ⊕ P_obj ⊕ R_kw ⊕ R_obj ⊕ E_fs. A sketch of this assembly appears at the end of this section.
- Relation Filtering: A relation embedding e_r is matched against a property embedding e_p via cosine similarity, cos(e_r, e_p) = (e_r · e_p) / (‖e_r‖ ‖e_p‖), and the relation is retained when cos(e_r, e_p) ≥ θ (commonly θ = 0.75).
- Policy Scoring: See above for LLM and affordance score definitions.
- Cache-Augmented Generation: Both ConceptNet query responses and embedding computations are cached, minimizing redundant requests and API costs; a minimal caching sketch follows this list.
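One possible realization of this caching layer uses Python’s functools.lru_cache as the memoization mechanism; the paper does not specify the cache implementation, so the backend here is an assumption.

```python
# Memoize ConceptNet lookups (and, analogously, embedding calls) so that
# recurring objects/keywords do not trigger repeated network requests.
from functools import lru_cache
import requests

@lru_cache(maxsize=4096)
def cached_conceptnet(term: str, lang: str = "en") -> tuple:
    url = f"http://api.conceptnet.io/c/{lang}/{term.replace(' ', '_')}"
    edges = requests.get(url, timeout=10).json().get("edges", [])
    # Tuples keep the cached result immutable.
    return tuple((e["start"]["label"], e["rel"]["label"], e["end"]["label"]) for e in edges)

cached_conceptnet("glass")  # first call: network request
cached_conceptnet("glass")  # second call: served from cache
```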
Pseudocode and additional algorithmic details are provided in the source.
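To make the context-fusion step above concrete, here is a rough sketch of how such a prompt context might be assembled; the template and argument shapes are assumptions, not the authors’ exact format.

```python
# Rough illustration of context fusion: assembling C_robot, P_obj, R_kw,
# R_obj, and E_fs into a single LLM prompt context.
def build_context(c_robot: str, p_obj: dict[str, list[str]],
                  r_kw: list[str], r_obj: list[str], e_fs: list[str]) -> str:
    sections = [
        "Robot capabilities:\n" + c_robot,
        "Object properties:\n" + "\n".join(f"- {o}: {', '.join(p)}" for o, p in p_obj.items()),
        "Instruction-derived relations:\n" + "\n".join(f"- {r}" for r in r_kw),
        "Scene-object relations:\n" + "\n".join(f"- {r}" for r in r_obj),
        "Examples:\n" + "\n".join(e_fs),
    ]
    return "\n\n".join(sections)

print(build_context(
    c_robot="pick(object), place(object, location)",
    p_obj={"garden bean": ["toxic"], "glass": ["fragile"]},
    r_kw=["put RelatedTo place"],
    r_obj=["glass IsA container"],
    e_fs=["Instruction: tidy the cup -> pick(cup); place(cup, shelf)"],
))
```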
7. Research Impact and Future Directions
ConceptBot demonstrates that fusing powerful generative LLMs with structured, retrieval-based commonsense knowledge (e.g., ConceptNet) enables substantial progress in semantic task decomposition and robust planning for physical agents, without the need for custom training or painstaking manual rule-crafting. Its results on SafeAgentBench and laboratory scenarios reflect a new milestone in language-vision-robotics integration.
A plausible implication is that further generalization could be realized by extending the knowledge graph interface to support richer ontologies (beyond ConceptNet), and by researching scaling strategies for system context construction as environments or task sets expand. Technical improvements may focus on latency reduction in knowledge retrieval, handling of more complex chained-predicate reasoning, and adaptation to manipulation tasks beyond pick-and-place.
ConceptBot provides a transferable methodology for integrating structured semantic augmentation with LLM-driven policy generation, establishing new performance baselines for context-aware, risk-aware, and highly generalizable robotic agents (Leanza et al., 30 Aug 2025).