
ExAI Engine: Explainable Dialogue AI

Updated 27 February 2026
  • ExAI Engine is a neuro-symbolic, collaborative AI architecture designed for explainable, multi-agent task-oriented dialogue using formal belief-desire-intention reasoning.
  • It leverages modal logic-based representations and a lightweight Horn-clause meta-interpreter to dynamically manage agents' beliefs, goals, and intentions.
  • The system unifies speech acts with physical actions through mixed-initiative planning and theory of mind, ensuring transparent, real-time collaboration.

The ExAI Engine is a neuro-symbolic, collaborative AI architecture designed to enable explainable, multi-agent task-oriented dialogue. Centered within the Eva dialogue system, the ExAI Engine formalizes and operationalizes collaborative belief-desire-intention (BDI) reasoning, persistent goal management, theory of mind (ToM), and plan-based justification. It leverages modal logic-based representations with a lightweight, Prolog-style Horn-clause meta-interpreter to support real-time, multimodal interaction and explainability. The system encodes explicit representations of each agent’s mental states—beliefs, goals, intentions—and dynamically infers, plans, and explains both physical and speech acts in shared task scenarios (Cohen et al., 2023).

1. Logical Foundations and Knowledge Representation

The ExAI Engine grounds its reasoning in modal logic, introducing operators representing beliefs, knowledge, persistent goals, and intentions. The central operators include:

  • bel(X, φ): Agent X believes formula φ.
  • knowif(X, φ) ≡ bel(X, φ) ∨ bel(X, ¬φ): Agent X knows whether φ.
  • knowref(X, Var^Pred) ≡ ∃d. bel(X, Pred[Var ↦ d]): Agent X knows the referent of the description Pred.
  • pgoal(X, φ, R): Agent X adopts a persistent goal for φ, relativized to R.
  • intend(X, Act, R) ≡ pgoal(X, done(X, Act), R): Agent X intends to perform Act, persisting until R fails.

Reasoning is executed through meta-interpreters that can both prove modal formulas (istrue/1) and rewrite or assert modal formulas (>/1), with typical inference rules such as:

  • istrue(bel(X, φ ∧ ψ)) ← istrue(bel(X, φ)), istrue(bel(X, ψ))
  • > bel(X, φ ∧ ψ) ⇒ bel(X, φ), bel(X, ψ)

These capabilities support explicit tracking and updating of multiple agents’ mental states, a prerequisite for collaborative ToM.
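The conjunction rule above can be illustrated with a minimal sketch. This is not the ExAI Engine's actual Prolog code; it is an assumed Python rendering in which modal formulas are nested tuples and a tiny istrue-style prover decomposes them against a belief base:

```python
# Illustrative sketch: a tiny meta-interpreter over modal formulas
# represented as nested tuples. The conjunction rule mirrors
# istrue(bel(X, P ∧ Q)) ← istrue(bel(X, P)), istrue(bel(X, Q)).
# The facts below are hypothetical examples.

BELIEFS = {
    ("bel", "sys", ("eligible", "user")),
    ("bel", "sys", ("adult", "user")),
}

def istrue(formula):
    """Prove a modal formula by decomposition, then by base-fact lookup."""
    op = formula[0]
    if op == "bel":
        _, agent, body = formula
        # Conjunction distributes over belief.
        if isinstance(body, tuple) and body[0] == "and":
            return all(istrue(("bel", agent, c)) for c in body[1:])
        return formula in BELIEFS
    if op == "knowif":
        # knowif(X, P) ≡ bel(X, P) ∨ bel(X, ¬P)
        _, agent, body = formula
        return istrue(("bel", agent, body)) or istrue(("bel", agent, ("not", body)))
    return False

print(istrue(("bel", "sys", ("and", ("eligible", "user"), ("adult", "user")))))  # True
print(istrue(("knowif", "sys", ("eligible", "user"))))                           # True
```

A real Horn-clause meta-interpreter would additionally handle unification, variables, and the >/1 rewrite rules; the sketch only shows the proof-by-decomposition pattern.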

2. Planning and Intention Mechanisms

The planning module integrates both backward-chaining on effects (STRIPS-style) and hierarchical decomposition (HTN-style). Key mechanisms include:

  • Backward-Chaining Rules: Persistent goals are achieved by finding actions whose effects unify with the intended goal. Preconditions are recursively instantiated as sub-goals; blocked intentions are created whenever applicability conditions are not determinable.
  • Hierarchical Decomposition: Compound actions (conditionals, disjunctions, sequences) are decomposed into lower-level intentions, with each step suitable for further planning or expansion.
  • Persistent Goals and Relativizing: Each pgoal or intention carries a "relativizer" argument (e.g., R), ensuring that the persistence and controlled retraction of goals aligns with upstream dependencies or failures.
  • Revision and Suspension: Intentions are suspended if key conditions are unknown (knowif subgoal) and are abandoned upon goal achievement, failed applicability, or breakdown of relativizing context.

This formalism enables robust, collaborative plan repair, continual intention revision, and adaptive task execution in dynamic settings (Cohen et al., 2023).
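The backward-chaining step can be sketched compactly. The operator names below (vaccinate, check_eligibility) are hypothetical examples, not the system's actual rule base; the sketch only shows how a goal is matched against action effects and how unmet preconditions recurse as sub-goals:

```python
# Hedged sketch of STRIPS-style backward chaining: find an action whose
# effect unifies with the goal, then plan each precondition as a sub-goal.
# Action definitions are illustrative, not from the ExAI Engine.

ACTIONS = {
    "vaccinate(user)": {"pre": ["eligible(user)"], "eff": ["vaccinated(user)"]},
    "check_eligibility(user)": {"pre": [], "eff": ["eligible(user)"]},
}

def plan(goal, facts):
    """Return an ordered list of actions achieving `goal`, or None if blocked."""
    if goal in facts:
        return []
    for act, spec in ACTIONS.items():
        if goal in spec["eff"]:
            steps = []
            for pre in spec["pre"]:          # preconditions become sub-goals
                sub = plan(pre, facts)
                if sub is None:
                    break                    # blocked intention
                steps += sub
            else:
                return steps + [act]
    return None

print(plan("vaccinated(user)", facts=set()))
# ['check_eligibility(user)', 'vaccinate(user)']
```

The real planner additionally performs HTN decomposition, suspension on unknown knowif conditions, and relativizer bookkeeping, which this sketch omits.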

3. Speech Acts as First-Class Planning Operators

Speech acts are modeled uniformly with physical actions in the planning and reasoning substrate. Each speech act—inform, assert, question (wh-/yn-), request, verify, and assertref—has formal preconditions and postconditions. For example:

  • inform(S, H, P): precondition bel(S, P); effect bel(H, P)
  • request(S, H, Act): precondition bel(S, precond(Act)); effect intend(H, Act, …)

Backward-chaining during planning selects speech acts whose effects unify with knowledge or intention-related subgoals, yielding nested plans that systematically integrate both domain and communicative actions. This unification enables flexible, dynamic dialogue generation, repair, and turn selection in mixed-initiative, multi-agent contexts.
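The uniform precondition/effect shape can be sketched as follows. The encoding and the helper names here are assumptions for illustration, not the engine's API; the point is that a knowledge- or belief-related subgoal selects a speech act exactly the way a physical subgoal selects a physical operator:

```python
# Illustrative sketch: speech acts encoded with the same precondition/effect
# shape as physical planning operators. Operator and predicate strings are
# hypothetical examples.

def speech_operators(S, H, P):
    return {
        f"inform({S},{H},{P})": {
            "pre": [f"bel({S},{P})"],     # speaker must believe P
            "eff": [f"bel({H},{P})"],     # hearer comes to believe P
        },
        f"ynq({S},{H},{P})": {            # yes/no question
            "pre": [],
            "eff": [f"knowif({S},{P})"],  # asker comes to know whether P
        },
    }

def select_act(subgoal, S, H, P, facts):
    """Pick a speech act whose effect unifies with the subgoal."""
    for act, spec in speech_operators(S, H, P).items():
        if subgoal in spec["eff"] and all(p in facts for p in spec["pre"]):
            return act
    return None

facts = {"bel(sys,open_hours)"}
print(select_act("bel(user,open_hours)", "sys", "user", "open_hours", facts))
# inform(sys,user,open_hours)
```

Because both act types share one representation, the backward chainer from the previous section needs no special case for dialogue moves.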

4. Theory of Mind and Multi-Party Collaboration

The ExAI Engine encodes a multi-agent representation of mental states: for every agent A, the system maintains bel(A, ·), pgoal(A, ·), and intend(A, ·). Plan recognition proceeds by observing utterances and inferring the underlying goals and intentions that would have produced them, applying the planning rules in reverse. The system tracks referential identity across pgoals, supporting consistency in joint belief and intention attribution.

Upon encountering obstacles (e.g., a user's plan blocked by unknown or false applicability conditions), the planner attempts collaborative repairs by proposing sub-goals or alternative plans. This mechanism affords transparent, mixed-initiative dialogue architectures with real-time adaptation to evolving user needs and dialogue contingencies.
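One way to picture the obstacle-detection step is the loop below. The plan steps, condition names, and "not ..." belief encoding are assumptions for this example; the sketch only shows scanning a recognized user plan for applicability conditions the system believes false or cannot determine:

```python
# Sketch of obstacle detection over a recognized user plan. Each plan step
# lists its applicability conditions; a condition the system believes false
# triggers repair, and an unknown one triggers a clarifying sub-goal.
# All names below are hypothetical.

def detect_obstacles(user_plan, sys_beliefs):
    """Yield (step, condition, status) for each problematic condition."""
    for step, conditions in user_plan:
        for cond in conditions:
            if cond in sys_beliefs:
                continue                                      # believed true
            status = "false" if ("not " + cond) in sys_beliefs else "unknown"
            yield step, cond, status

user_plan = [("book_appointment", ["eligible(user)", "slot_available"])]
sys_beliefs = {"not slot_available"}

for step, cond, status in detect_obstacles(user_plan, sys_beliefs):
    # A repair becomes a new persistent goal: propose an alternative for a
    # false condition, or ask a question to resolve an unknown one.
    print(f"obstacle in {step}: {cond} is {status}")
```

In the full system each detected obstacle feeds back into the collaborative planner as a pgoal, which is what produces proactive repair proposals mid-dialogue.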

5. Explainability and Plan-Based Justification

Every act, including all speech and physical actions, is grounded in a plan node within the agent’s intention structure. When queried for justification (e.g., "Why did you say X?"), the ExAI Engine traces the executed action to its intention node, follows the relativization structure to the lowest-level pgoal or intention not yet shared by the user, and verbalizes the explicit reason. For example, in a dialogue exchange where the system asks the user's age, the lowest unshared pgoal might be pgoal(sys, knowif(sys, eligible(user))), yielding explanations such as “I asked because I need to determine whether you are eligible for the Covid vaccine.”

This architecture guarantees that every utterance and decision is retrospectively explainable, as each action is indexed to a specific node in a transparent, collaborative plan (Cohen et al., 2023).
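The traversal described above can be sketched as a walk up the relativizer chain. The node names and fields below are an assumed encoding of the vaccine example from the text, not the engine's actual plan representation:

```python
# Illustrative sketch of plan-based justification: from an executed act,
# walk parent (relativizer) links upward to the lowest goal not yet shared
# with the user, then verbalize that goal. Node contents are hypothetical.

PLAN = {
    "ask_age": {"parent": "knowif_eligible", "shared": True},
    "knowif_eligible": {
        "parent": "vaccinate_user", "shared": False,
        "gloss": "determine whether you are eligible for the Covid vaccine",
    },
    "vaccinate_user": {"parent": None, "shared": True},
}

def explain(act):
    node = PLAN[act]["parent"]
    while node is not None and PLAN[node]["shared"]:
        node = PLAN[node]["parent"]       # skip goals the user already shares
    if node is None:
        return "That was part of our shared plan."
    return f"I asked because I need to {PLAN[node]['gloss']}."

print(explain("ask_age"))
# I asked because I need to determine whether you are eligible for the Covid vaccine.
```

Stopping at the lowest unshared goal is what keeps explanations informative rather than restating what the user already knows.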

6. System Architecture and Implementation

The ExAI Engine’s system implementation comprises the following modules:

  1. ASR + Vision: Multimodal input including speech, gesture, and emotion.
  2. Semantic Parser: XLM-RoBERTa fine-tuned to produce logical forms.
  3. Plan Recognition / Model Updater: Updates bel/pgoal/intend for all agents in shared dialogue context.
  4. Collaborative Planner / Obstacle Detector: Generates and repairs plans in response to inferred user aims.
  5. Intention Scheduler: Selects the next action (speech or domain) based on current plan.
  6. NLG / Multimodal Generator + Avatar: Realizes system outputs across modalities (speech, text, gestures).
  7. Execution Monitor + Backend Connectors: Supports grounded action via databases and external APIs.

The system supports speech, text, gesture, avatar animation, and multi-modal sensory feedback, operating in real-time with no offline batch planning. The meta-interpreter and planner are optimized for responsiveness and correctness, enabling real-time, explainable collaborative dialogue. No detailed quantitative metrics are reported. Performance is attributed to fast Prolog meta-interpretation and heuristic planning.
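The seven modules above form a single per-turn control loop, which can be sketched schematically. Every function here is a stub placeholder standing in for a module, not the real module API:

```python
# Highly simplified control loop matching the seven modules listed above.
# All function bodies are stand-in stubs for illustration only.

def recognize(inp):        return inp                        # 1: ASR + vision
def parse(text):           return ("inform", "user", text)   # 2: semantic parser
def update_models(st, lf): return st + [("bel", "sys", lf)]  # 3: model updater
def plan_and_repair(st):   return st                         # 4: collaborative planner
def schedule(st):          return ("say", "acknowledged")    # 5: intention scheduler
def render(act):           print(act)                        # 6: NLG / avatar

def dialogue_turn(user_input, state):
    lf = parse(recognize(user_input))
    state = plan_and_repair(update_models(state, lf))
    act = schedule(state)
    render(act)           # 7: the execution monitor would track this act
    return state

state = dialogue_turn("I want a vaccine appointment", [])
```

The point of the sketch is the dataflow: each turn updates the shared mental-state model before any planning or generation occurs, which is what keeps every output traceable to a plan node.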

7. Synthesis and Significance

By integrating (1) a Horn-clause meta-interpreter for modal belief/knowledge reasoning, (2) formally specified persistent goals and intentions, (3) a hybrid HTN/STRIPS-style collaborative planner, (4) speech acts as primitive planning operators, and (5) multi-agent ToM, the ExAI Engine delivers a unique combination of explainability, collaborative assistance, mixed-initiative flexibility, and real-time interaction in multi-party, task-oriented dialogue. Each system decision is transparent and revisable, allowing dynamic inference of user plans, obstacle detection and repair, and grounded, retrospective explanation for every speech or non-speech act (Cohen et al., 2023).

References (1)
