NeuroWise: LLM Double Empathy System
- NeuroWise is a multi-agent LLM system that simulates autistic communication to foster double empathy by exposing internal stress and offering adaptive coaching.
- Its architecture comprises four LLM-driven agents that simulate partner behavior, estimate stress levels, interpret internal states, and coach conversational responses using a transparent glass-box framework.
- Empirical evaluations show NeuroWise reduces deficit-based attributions and conversation turns, thereby enhancing communication efficiency and mutual understanding.
NeuroWise is a web-based, multi-agent LLM system designed to enable neurotypical users to practice “double-empathy” communication with a simulated autistic partner. Unlike deficit-oriented approaches focusing on autistic individuals, NeuroWise operationalizes the double empathy framework by supporting neurotypical users with stress visualization, model-based interpretations of internal experiences, and contextually adaptive coaching, all using a transparent ("glass-box") architecture (Tang et al., 21 Feb 2026).
1. Theoretical Framework and Rationale
NeuroWise is motivated by the double empathy problem, which posits that communicative disconnects between neurodivergent (especially autistic) and neurotypical individuals originate from reciprocal misunderstandings, not one-sided deficiencies. Conventional interventions tend to pathologize autistic cognition, reinforcing deficit-based attributions. NeuroWise foregrounds mutual sense-making by simulating neurodivergent communication and revealing the internal state, perspective, and stress levels of the autistic conversational partner. This approach aims to support neurotypical users in interpreting behavioral signals as contextually understandable rather than indicative of personal deficits (Tang et al., 21 Feb 2026).
2. System Architecture and “Glass-Box” Approach
The NeuroWise system orchestrates four discrete LLM-driven agents in a turn-based pipeline, denoted as A = {A₁, A₂, A₃, A₄}:
- A₁ (“Partner”) simulates the autistic partner (“Alex”) and produces verbal responses conditioned on dialogue history and current stress.
- A₂ (“Stress Estimator”) classifies each user message into a communication category, then assigns a stress increment to that category via a rule-based lookup.
- A₃ (“Interpreter”) generates natural-language explanations for Alex’s internal experience after stress increases, supporting perspective-taking.
- A₄ (“Coach”) issues concrete, context-sensitive guidance to scaffold the user’s next move, emphasizing neurodiversity-affirming strategies.
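State updates follow an additive, clipped rule: the stress increment assigned by A₂ is added to the current stress level, and a clip operation restricts the result to a fixed bounded range.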
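The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the category names, increment values, and stress bounds below are placeholders chosen for the example, since the source states only that a category lookup and a clipped additive update exist.

```python
# Hypothetical category-to-increment lookup. The category names and values
# are illustrative assumptions, not the ones used in NeuroWise.
STRESS_INCREMENTS = {
    "supportive": -1.0,
    "neutral": 0.0,
    "dismissive": 2.0,
}

# Hypothetical bounds for the stress scale (assumed for this sketch).
STRESS_MIN, STRESS_MAX = 0.0, 10.0


def clip(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def update_stress(current: float, category: str) -> float:
    """Add the category's increment to the current stress, then clip."""
    increment = STRESS_INCREMENTS[category]
    return clip(current + increment, STRESS_MIN, STRESS_MAX)


if __name__ == "__main__":
    stress = 9.0
    stress = update_stress(stress, "dismissive")  # 9.0 + 2.0 is clipped to the upper bound
    print(stress)
```

Keeping the lookup separate from the update means the mapping can be inspected or changed without touching the state logic, which fits the traceability goal of a glass-box design.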
All agent outputs are persistently displayed in a “Support Panel,” comprising:
- A real-time Stress Bar showing Alex’s current stress level
- Interpreter explanations
- Coach suggestions
This glass-box design exposes the inferential process of the system, offering traceability of internal state transitions and recommendations (Tang et al., 21 Feb 2026).
3. Interaction Techniques and Classification Pipeline
Interaction is mediated by an LLM-based pipeline in which the user’s utterances are classified using GPT-4o-mini. Stress estimation is grounded in the categorization result and a rule-based mapping. The Interpreter and Coach are triggered whenever the stress increment exceeds a set threshold, prompting the LLM to hypothesize about Alex’s cognitive/emotional state and recommend validation and sensory-accommodative strategies.
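The threshold-gated flow described above can be sketched as a single turn function. This is an illustrative outline under stated assumptions: the agents are plain callables standing in for LLM calls, the ordering (estimator before partner) is assumed, and `run_turn`, `TurnResult`, the stub agents, the stress bounds, and the threshold value are all hypothetical names and numbers introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# An agent takes the dialogue so far and the current stress, and returns text.
Agent = Callable[[List[str], float], str]


@dataclass
class TurnResult:
    reply: str
    stress: float
    interpretation: Optional[str] = None  # set only when the threshold is exceeded
    coaching: Optional[str] = None        # set only when the threshold is exceeded


def run_turn(
    user_message: str,
    history: List[str],
    stress: float,
    estimate_increment: Callable[[str], float],
    partner: Agent,
    interpreter: Agent,
    coach: Agent,
    threshold: float,
    bounds: Tuple[float, float] = (0.0, 10.0),  # assumed stress range
) -> TurnResult:
    """Run one turn: estimate stress, generate a reply, and gate the support agents."""
    dialogue = history + [user_message]
    increment = estimate_increment(user_message)
    low, high = bounds
    new_stress = max(low, min(high, stress + increment))
    result = TurnResult(reply=partner(dialogue, new_stress), stress=new_stress)
    if increment > threshold:
        result.interpretation = interpreter(dialogue, new_stress)
        result.coaching = coach(dialogue, new_stress)
    return result


# Stub agents so the sketch runs without any LLM backend.
def stub_estimator(message: str) -> float:
    return 2.0 if "surprise" in message.lower() else 0.0


def stub_agent(label: str) -> Agent:
    def agent(dialogue: List[str], stress: float) -> str:
        return f"[{label}] stress={stress:.1f}, turns={len(dialogue)}"
    return agent


def demo(message: str) -> TurnResult:
    return run_turn(
        message, [], 1.0, stub_estimator,
        stub_agent("partner"), stub_agent("interpreter"), stub_agent("coach"),
        threshold=1.0,  # placeholder value for illustration
    )


if __name__ == "__main__":
    print(demo("Hi Alex, how was your day?"))
    print(demo("Surprise! I brought Thai food."))
```

Gating on the increment rather than on the absolute stress level means the support agents respond to the message that caused a jump, not to stress carried over from earlier turns.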
Empirical validation showed:
- Inter-rater reliability on 15 scripted dialogues (63 turns): ICC = 0.86, 95% CI [0.77, 0.91]
- Strong LLM–human classification correlation
- High discriminant validity between stress scenarios, reported as Cohen’s d
This suggests that the stress estimator’s classification aligns robustly with human judgments and that the designed interaction scaffolds are reliably triggered under salient communication breakdowns (Tang et al., 21 Feb 2026).
4. Experimental Design and Evaluation Metrics
A between-subjects experiment (N = 30) compared NeuroWise (n = 15) to a baseline (n = 15) consisting of a standard GPT-4o-mini simulation without the Support Panel features. The scenario comprised home-based dialogues of 8–12 turns in which users greeted Alex with Thai food, thereby disrupting routine and sensory expectations.
Pre-survey measures:
- 8-item double-empathy scale
- Autism knowledge quiz
Post-survey measures:
- Deficit-based attribution score (D), computed as the mean of two reverse-scored items
- Communication flexibility change
- Conversation length
- Perceived learning and feature helpfulness ratings
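The deficit-attribution score can be illustrated with a short scoring sketch. The 1-to-5 response scale is an assumption made for the example (the source does not state the scale), as is computing the change as post minus pre; the function names are hypothetical.

```python
from statistics import mean


def reverse_score(raw: int, low: int = 1, high: int = 5) -> int:
    """Reverse a Likert response: on an assumed 1-5 scale, 1 becomes 5 and 5 becomes 1."""
    if not low <= raw <= high:
        raise ValueError(f"response {raw} is outside [{low}, {high}]")
    return low + high - raw


def deficit_score(item_a: int, item_b: int, low: int = 1, high: int = 5) -> float:
    """Mean of the two reverse-scored items."""
    return mean([reverse_score(item_a, low, high), reverse_score(item_b, low, high)])


def deficit_change(pre: float, post: float) -> float:
    """Change in deficit attribution; negative values indicate less deficit framing."""
    return post - pre


if __name__ == "__main__":
    pre = deficit_score(2, 3)   # illustrative responses, not study data
    post = deficit_score(4, 5)
    print(pre, post, deficit_change(pre, post))
```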
Statistical analyses:
- Mann–Whitney U for between-group comparisons
- Wilcoxon signed-rank for within-group changes
- Effect size: Cliff’s δ
Key outcome summary:
| Metric | NeuroWise | Baseline | Test Statistic & Effect |
|---|---|---|---|
| Δ Deficit attribution, mean (ΔD) | -0.63 (improved) | +0.30 (regressed) | U=57.0, p=.020, δ=–0.49 |
| Conversation length, median (T) | 8.0 | 11.0 | U=59.0, p=.030, δ=–0.48 |
| Final stress | n.s. difference | n.s. difference | p = .47 |
| Flexibility change | Significant | n.s. | Wilcoxon p=.026 |
These findings indicate that NeuroWise induces both a decrease in deficit-based framing and increased conversational efficiency, without additional conversational stress (Tang et al., 21 Feb 2026).
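The reported effect sizes can be cross-checked against the reported U statistics, because Cliff’s δ and the Mann–Whitney U of the first sample are related by δ = 2U/(n₁n₂) − 1. The sketch below implements both statistics from their pairwise definitions and applies the identity to the tabulated values, assuming each reported U belongs to the NeuroWise group with n₁ = n₂ = 15. The toy samples in the demo are illustrative, not study data.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x: count of pairs with x > y, ties counted as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def cliffs_delta(x: Sequence[float], y: Sequence[float]) -> float:
    """Cliff's delta: (pairs with x > y minus pairs with x < y) / (n_x * n_y)."""
    greater = sum(1 for xi in x for yj in y if xi > yj)
    less = sum(1 for xi in x for yj in y if xi < yj)
    return (greater - less) / (len(x) * len(y))


def delta_from_u(u: float, n_x: int, n_y: int) -> float:
    """Recover Cliff's delta from the U statistic of sample x."""
    return 2.0 * u / (n_x * n_y) - 1.0


if __name__ == "__main__":
    # Toy data (hypothetical) showing that the two routes agree.
    x, y = [1, 2, 3], [2, 3, 4]
    print(cliffs_delta(x, y), delta_from_u(mann_whitney_u(x, y), len(x), len(y)))

    # Reported statistics from the outcome table, with 15 participants per group.
    print(round(delta_from_u(57.0, 15, 15), 2))  # deficit attribution
    print(round(delta_from_u(59.0, 15, 15), 2))  # conversation length
```

Under that assumption, the identity yields −0.49 and −0.48, matching the effect sizes in the table.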
5. Interpretation and Implications
Embedding interpretive feedback (A₃), stress visualization, and adaptive coaching in a glass-box framework led NeuroWise users to reconceive communication difficulties as mutual and situational. The condition-by-time interaction for deficit attributions (U=57.0, p=.020, δ=–0.49) is interpreted as evidence that neurotypical users, with NeuroWise’s explicit support, avoid the post-conversational regression toward deficit framing observed in baseline chatbot interaction. Additionally, participants completed conversations with 37% fewer turns without compromising resolution, indicating improved efficiency and engagement. Communication flexibility also improved significantly within the NeuroWise group (Wilcoxon p=.026), but not in the baseline group.
A plausible implication is that transparent AI-driven scaffolding can help users internalize a double empathy approach, positioning the source of communicative breakdowns as relational rather than dispositional.
6. Limitations and Prospective Directions
Several constraints limit the generality of current NeuroWise findings:
- Experimental scope is limited to a single, fully scripted scenario with a simulated, rather than real, autistic partner.
- Measured effects are immediate only; durability or long-term behavioral transfer is untested.
Future work proposed includes:
- Extending evaluation to real neurotypical–autistic dyads over multiple sessions
- Involving autistic collaborators in the design and validation of Interpreter outputs and partner profiles
- Personalization of the simulated partner across varied sensory and communicative profiles within the spectrum
- Development of bidirectional glass-box systems scaffolding both parties in interpreting each other’s cues
These directions are deemed necessary for generalizability and for the ecological validity of double empathy interventions (Tang et al., 21 Feb 2026).