NeuroWise: LLM Double Empathy System

Updated 18 March 2026

NeuroWise is a multi-agent LLM system that simulates an autistic conversational partner to foster double empathy, exposing the partner's internal stress and offering adaptive coaching.
Its transparent, glass-box architecture comprises four LLM-driven agents that simulate the partner's behavior, estimate stress levels, interpret internal states, and coach the user's conversational responses.
In a between-subjects evaluation (N = 30), NeuroWise reduced deficit-based attributions and shortened conversations, suggesting gains in communication efficiency and mutual understanding.

NeuroWise is a web-based, multi-agent LLM system designed to let neurotypical users practice “double-empathy” communication with a simulated autistic partner. Unlike deficit-oriented approaches that target autistic individuals, NeuroWise operationalizes the double empathy framework by supporting neurotypical users with stress visualization, model-generated interpretations of the partner's internal experience, and contextually adaptive coaching, all within a transparent ("glass-box") architecture (Tang et al., 21 Feb 2026).

1. Theoretical Framework and Rationale

NeuroWise is motivated by the double empathy problem, which posits that communicative disconnects between neurodivergent (especially autistic) and neurotypical individuals originate from reciprocal misunderstandings, not one-sided deficiencies. Conventional interventions tend to pathologize autistic cognition, reinforcing deficit-based attributions. NeuroWise foregrounds mutual sense-making by simulating neurodivergent communication and revealing the internal state, perspective, and stress levels of the autistic conversational partner. This approach aims to support neurotypical users in interpreting behavioral signals as contextually understandable rather than indicative of personal deficits (Tang et al., 21 Feb 2026).

2. System Architecture and “Glass-Box” Approach

The NeuroWise system orchestrates four discrete LLM-driven agents in a turn-based pipeline, denoted as A = {A₁, A₂, A₃, A₄}:

A₁ (“Partner”) simulates the autistic partner (“Alex”) and produces verbal responses conditioned on dialogue history and current stress.
A₂ (“Stress Estimator”) classifies each user message $u_t$ into a communication category $c_t \in C$, with $C = \{\text{validation}, \text{invalidation}, \text{pressure}, \text{options-giving}, \text{sensory-accommodation}\}$, then assigns a stress increment $\Delta_t = \delta(c_t)$ from a lookup $\delta : C \rightarrow \mathbb{R}$ (e.g., $\delta(\text{invalidation}) = +15$, $\delta(\text{validation}) = -10$).
A₃ (“Interpreter”) generates natural-language explanations $e_t$ for Alex’s internal experience after stress increases, supporting perspective-taking.
A₄ (“Coach”) issues concrete, context-sensitive guidance $r_t$ to scaffold the user’s next move, emphasizing neurodiversity-affirming strategies.

State updates follow

$S_t = \operatorname{clip}(S_{t-1} + \Delta_t, 0, 100)$

with $S_0 = 0$, where $\operatorname{clip}$ restricts the value to $[0, 100]$.
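The lookup-and-clip update can be sketched in a few lines of Python. Only the two increments quoted above (+15 for invalidation, −10 for validation) come from the source; the values for pressure, options-giving, and sensory-accommodation are hypothetical placeholders, and the names `DELTA`, `clip`, and `update_stress` are illustrative.

```python
from typing import Mapping

# Stress increments delta: C -> R.
# From the source: invalidation = +15, validation = -10.
# HYPOTHETICAL placeholders (not given in the source): the other three values.
DELTA: dict[str, float] = {
    "validation": -10.0,
    "invalidation": 15.0,
    "pressure": 10.0,               # hypothetical
    "options-giving": -5.0,         # hypothetical
    "sensory-accommodation": -5.0,  # hypothetical
}

def clip(x: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def update_stress(prev: float, category: str,
                  delta: Mapping[str, float] = DELTA) -> tuple[float, float]:
    """Return (S_t, Delta_t), where S_t = clip(S_{t-1} + Delta_t, 0, 100)."""
    d_t = delta[category]  # raises KeyError for a category outside C
    return clip(prev + d_t), d_t

s = 0.0  # S_0 = 0
for c in ["validation", "invalidation", "invalidation", "validation"]:
    s, d = update_stress(s, c)
    print(f"{c}: Delta={d:+.0f}, S={s:.0f}")
```

Here $\Delta_t$ is the raw lookup value, so a validating first message still yields $\Delta_t = -10$ even though clipping keeps $S_1 = 0$; the trajectory above ends at $S_4 = 20$.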

All agent outputs are persistently displayed in a “Support Panel,” comprising:

A real-time Stress Bar showing $S_t$
Interpreter explanations $e_t$
Coach suggestions $r_t$

This glass-box design exposes the inferential process of the system, offering traceability of internal state transitions and recommendations (Tang et al., 21 Feb 2026).

3. Interaction Techniques and Classification Pipeline

Interaction is mediated by an LLM-based pipeline: each user utterance is classified by GPT-4o-mini, and the stress increment is then obtained from the rule-based mapping $\delta$. The Interpreter and Coach are triggered whenever the stress increment $\Delta_t$ exceeds a threshold $\tau$ (here $\tau = 0$, i.e., whenever stress rises). The Interpreter then hypothesizes about Alex’s cognitive and emotional state, and the Coach recommends validating and sensory-accommodating strategies.
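The turn-level control flow described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `run_turn`, `TurnResult`, and the `toy_*` agent functions are hypothetical names, and the keyword classifier merely stands in for the GPT-4o-mini call. Only $\tau = 0$ and the two increments come from the source; the order in which Alex's reply and the Support Panel outputs are generated is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TAU = 0.0  # trigger threshold from the source (tau = 0)

@dataclass
class TurnResult:
    category: str                      # c_t
    delta: float                       # Delta_t
    stress: float                      # S_t
    reply: str                         # Alex's utterance from A1
    explanation: Optional[str] = None  # e_t from A3, only if triggered
    suggestion: Optional[str] = None   # r_t from A4, only if triggered

def run_turn(user_msg: str, prev_stress: float, history: list[str],
             classify: Callable[[str], str],
             delta: dict[str, float],
             partner: Callable[[list[str], float], str],
             interpret: Callable[[str, float], str],
             coach: Callable[[str, float], str],
             tau: float = TAU) -> TurnResult:
    """One turn of a hypothetical NeuroWise-style pipeline.

    All callables stand in for LLM-backed agents; none call a real model.
    """
    history.append(user_msg)
    c_t = classify(user_msg)                       # A2: classify u_t into C
    d_t = delta[c_t]                               # A2: rule-based lookup
    s_t = max(0.0, min(100.0, prev_stress + d_t))  # S_t = clip(..., 0, 100)
    reply = partner(history, s_t)                  # A1: uses history and S_t
    history.append(reply)
    result = TurnResult(c_t, d_t, s_t, reply)
    if d_t > tau:                                  # A3/A4 fire only when stress rises
        result.explanation = interpret(user_msg, s_t)
        result.suggestion = coach(user_msg, s_t)
    return result

# --- Toy stand-ins (HYPOTHETICAL) so the sketch runs without an LLM ---
DELTA = {"validation": -10.0, "invalidation": 15.0}  # values quoted in the source

def toy_classify(msg: str) -> str:
    return "invalidation" if "overreact" in msg.lower() else "validation"

def toy_partner(history: list[str], stress: float) -> str:
    return f"[Alex reply at stress {stress:.0f}]"

def toy_interpret(msg: str, stress: float) -> str:
    return f"[why stress rose to {stress:.0f}]"

def toy_coach(msg: str, stress: float) -> str:
    return "[try validating Alex's reaction]"

history: list[str] = []
agents = (toy_classify, DELTA, toy_partner, toy_interpret, toy_coach)
r1 = run_turn("You're overreacting, it's just dinner.", 0.0, history, *agents)
r2 = run_turn("That makes sense. Want to eat later?", r1.stress, history, *agents)
print(r1.stress, r1.explanation is not None)  # 15.0 True
print(r2.stress, r2.explanation is not None)  # 5.0 False
```

Passing the agents as callables keeps the gating rule (`d_t > tau`) separate from any particular model, so a real LLM client could be swapped in without touching the pipeline logic.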

Empirical validation showed:

Inter-rater reliability on 15 scripted dialogues (63 turns): ICC = 0.86, 95% CI [0.77, 0.91]
Strong LLM–human classification correlation: $r = 0.86$, $p < .001$
High discriminant validity between stress scenarios: Cohen’s $d = 9.33$

These results suggest that the stress estimator’s classifications align closely with human judgments and that the interaction scaffolds are triggered consistently during salient communication breakdowns (Tang et al., 21 Feb 2026).
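The section reports Cohen’s $d$ without its formula. Below is a minimal sketch of the common pooled-standard-deviation form, assuming that is the variant used (the section does not say); `cohens_d` is a hypothetical helper and the inputs are toy numbers, not study data. For scale, $d = 9.33$ means the two scenario means lie more than nine pooled standard deviations apart.

```python
import math
from statistics import mean, variance

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation.

    d = (mean(a) - mean(b)) / s_pooled, where
    s_pooled = sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)).
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Toy example (NOT study data): hypothetical stress scores from two scenarios.
high_stress = [60.0, 62.0, 64.0]
low_stress = [10.0, 12.0, 14.0]
print(cohens_d(high_stress, low_stress))  # 25.0
```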

4. Experimental Design and Evaluation Metrics

A between-subjects experiment ($N = 30$) compared NeuroWise (n = 15) with a baseline (n = 15): a standard GPT-4o-mini simulation of Alex without the Support Panel. The scenario was a single home-based dialogue of 8–12 turns in which the user greets Alex with Thai food, disrupting Alex’s routine and sensory expectations.

Pre-survey measures:

8-item double-empathy scale
Autism knowledge quiz

Post-survey measures:

Deficit-based attribution score $D \in [1, 7]$, computed as the mean of two reverse-scored items (Cronbach’s $\alpha = 0.84$)
Communication flexibility change $\Delta F$
Conversation length $T$ (number of turns)
Perceived learning and feature helpfulness ratings
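The scoring of $D$ can be sketched as follows, assuming standard 1–7 Likert items reversed as $x \mapsto 8 - x$ and assuming $D$ is computed the same way before and after the conversation (as the $\Delta D$ reported in the results implies). The function names and responses are hypothetical.

```python
def reverse_score(x: int, lo: int = 1, hi: int = 7) -> int:
    """Reverse a Likert response so lo <-> hi (on a 1-7 scale: x -> 8 - x)."""
    if not lo <= x <= hi:
        raise ValueError(f"response {x} outside [{lo}, {hi}]")
    return lo + hi - x

def deficit_score(item1: int, item2: int) -> float:
    """D = mean of two reverse-scored 1-7 items, so D stays within [1, 7]."""
    return (reverse_score(item1) + reverse_score(item2)) / 2

# Hypothetical responses (NOT study data), scored before and after a session.
d_pre = deficit_score(2, 3)   # (6 + 5) / 2 = 5.5
d_post = deficit_score(3, 4)  # (5 + 4) / 2 = 4.5
print(d_post - d_pre)         # -1.0: lower D means less deficit framing
```

A negative change corresponds to the "improved" direction used for $\Delta D$ in the results table.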

Statistical analyses:

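Mann–Whitney U tests for between-group comparisons
Wilcoxon signed-rank tests for within-group (pre/post) comparisons
Effect size: Cliff’s $\delta = \frac{\#\{(i,j) : x_i > y_j\} - \#\{(i,j) : x_i < y_j\}}{n_1 n_2}$, i.e., favorable minus unfavorable cross-group pairs, where $x_i$ ($i = 1, \dots, n_1$) and $y_j$ ($j = 1, \dots, n_2$) are observations from the two groups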
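The pairwise definition translates directly into code. This is a minimal sketch with a hypothetical function name and toy data; with `x` as the NeuroWise group and `y` as the baseline, a negative δ means NeuroWise values tend to be lower.

```python
from itertools import product

def cliffs_delta(x: list[float], y: list[float]) -> float:
    """Cliff's delta: (#pairs with x_i > y_j - #pairs with x_i < y_j) / (n1 * n2).

    Tied pairs contribute to neither count, so the result lies in [-1, 1].
    """
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))

# Toy data (NOT study data): 1 pair with x_i > y_j, 6 with x_i < y_j, 2 ties.
print(cliffs_delta([1, 2, 3], [2, 3, 4]))  # -0.5555555555555556
```

The double loop costs $O(n_1 n_2)$ comparisons, which is negligible for groups of 15.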

Key outcome summary:

Metric	NeuroWise	Baseline	Test statistic and effect size (Cliff’s δ)
Deficit attribution change, mean (ΔD)	−0.63 (improved)	+0.30 (regressed)	U = 57.0, p = .020, δ = −0.49
Conversation length in turns, median (T)	8.0	11.0	U = 59.0, p = .030, δ = −0.48
Final stress ($S_T$)	n/a	n/a	p = .47 (no significant between-group difference)
Flexibility change ($\Delta F$)	Significant (within-group)	n.s. (within-group)	Wilcoxon signed-rank p = .026 (NeuroWise)

These findings suggest that NeuroWise reduces deficit-based framing and shortens conversations without a detectable increase in final stress (Tang et al., 21 Feb 2026).
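As a consistency check on the table, Cliff’s δ can be recovered from the Mann–Whitney U through a standard identity, assuming the reported U is the statistic for the NeuroWise group (counting pairs in which the NeuroWise value exceeds the baseline value, with ties counted as 1/2). Writing $G$, $L$, and $E$ for the numbers of greater, lesser, and tied pairs, so that $G + L + E = n_1 n_2$ and $U = G + E/2$:

$\delta = \frac{G - L}{n_1 n_2} = \frac{G - (n_1 n_2 - G - E)}{n_1 n_2} = \frac{2G + E}{n_1 n_2} - 1 = \frac{2U}{n_1 n_2} - 1$

With $n_1 = n_2 = 15$ ($n_1 n_2 = 225$):

$\delta_{\Delta D} = \frac{2(57.0)}{225} - 1 = -\frac{111}{225} \approx -0.49, \qquad \delta_{T} = \frac{2(59.0)}{225} - 1 = -\frac{107}{225} \approx -0.48$

Both values match the reported effect sizes, and $U < n_1 n_2 / 2 = 112.5$ in each case is consistent with the negative sign.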

5. Interpretation and Implications

Embedding interpretive feedback (A₃), stress visualization, and adaptive coaching (A₄) in a glass-box framework led NeuroWise users to reframe communication difficulties as mutual and situational. The between-group difference in deficit-attribution change (U = 57.0, p = .020, δ = −0.49) is interpreted as evidence that, with NeuroWise’s explicit support, neurotypical users not only avoid the post-conversation regression toward deficit framing seen with the baseline chatbot but actually reduce such framing. NeuroWise participants also completed conversations in fewer turns (median 8.0 vs. 11.0, about 27% fewer at the median) without compromising resolution, suggesting improved efficiency. Communication flexibility improved significantly within the NeuroWise group (Wilcoxon p = .026) but not within the baseline group.

A plausible implication is that transparent AI-driven scaffolding can help users internalize a double empathy approach, positioning the source of communicative breakdowns as relational rather than dispositional.

6. Limitations and Prospective Directions

Several constraints limit the generality of current NeuroWise findings:

Experimental scope is limited to a single, fully scripted scenario with a simulated rather than a real autistic partner.
Measured effects are immediate only; durability and long-term behavioral transfer are untested.

Proposed future work includes:

Extending evaluation to real neurotypical–autistic dyads over multiple sessions
Involving autistic collaborators in the design and validation of Interpreter outputs and partner profiles
Personalization of the simulated partner across varied sensory and communicative profiles within the spectrum
Development of bidirectional glass-box systems scaffolding both parties in interpreting each other’s cues

These directions are deemed necessary for generalizability and for the ecological validity of double empathy interventions (Tang et al., 21 Feb 2026).


References

Tang et al. (21 Feb 2026). NeuroWise: A Multi-Agent LLM "Glass-Box" System for Practicing Double-Empathy Communication with Autistic Partners.
