Explanation Satisfaction Scale

Updated 8 June 2026

The Explanation Satisfaction Scale is a user-centered measure of how well users feel they understand AI systems, their explanations, and the associated interfaces.
It spans instruments from single-item numeric ratings to multi-item Likert batteries that capture facets such as feasibility, trust, and completeness.
Empirical findings indicate that tailored scale design and rigorous psychometric validation improve the reliability and interpretability of user satisfaction assessments.

The Explanation Satisfaction Scale refers to a family of user-centered evaluation instruments and criteria developed to quantify the degree to which users feel they understand AI systems, their explanations, and associated user interfaces. These scales serve as subjective outcome measures in explainable AI (XAI) research, particularly to assess users’ mental models, the perceived clarity of explanations, and their suitability for intended tasks. Although the scales are widely adopted, measurement approaches are highly heterogeneous, ranging from ad-hoc single-item scores to rigorously validated multi-item Likert batteries, which reflects differing disciplinary conventions and task demands. Explanation satisfaction is recognized as a core criterion within broader models of explanation quality but is distinguished from more objective, task-based constructs such as appropriate trust.

1. Formal Definitions and Theoretical Position

The term “Explanation Satisfaction” is defined in the quality evaluation literature as “the degree of how much the users feel they understand the system, the explanations, and the user interface” (Löfström et al., 2022). This definition subsumes related constructs such as explanation goodness, comprehensibility, and causability, along with instruments such as the System Causability Scale. In practice, explanation satisfaction operationalizes the subjective impact of an explanation on a user’s mental model, reflecting how well an explainable user interface, algorithmic output, or explanation is perceived as “suitable for the intended purpose” (Löfström et al., 2022).

Within comprehensive models of explanation quality, explanation satisfaction sits squarely in the “user aspect” as a subjective mental-model outcome. It is contrasted with “appropriate trust,” which quantifies a user’s ability to discriminate correct from incorrect system outputs and act accordingly (Löfström et al., 2022). Whereas appropriate trust can be evaluated as an objective task-performance metric, explanation satisfaction remains an intrinsically subjective, introspective measure and, by widely held consensus, lacks firm, generalizable thresholds for acceptability.

2. Measurement Instruments: Scales and Formats

Measurement approaches for explanation satisfaction range from single-item, face-valid ad-hoc questions to multi-item, psychometrically validated scales.

Single-Item Scales:

Recent studies, such as Kaufman et al. (2024), assess explanation satisfaction in situ using a single numeric item:

Item: “How satisfied are you with the AV’s explanation?”
Response format: 0–10 slider (0 = “not at all satisfied,” 10 = “completely satisfied”)
Scoring: Raw scenario-level rating, optionally averaged across scenarios for aggregate analysis

Such single-item measures are direct, but a single item does not permit internal-consistency or factor-analytic assessment. They are typically justified for scenario-specific, high-throughput user studies.
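The scoring rule for the single-item measure (raw scenario-level ratings, optionally averaged across scenarios) can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the record layout, the identifiers `p1`/`s1`, the rating values, and the function name `mean_by_participant` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (participant, scenario, rating on the 0-10 slider).
ratings = [
    ("p1", "s1", 7), ("p1", "s2", 4), ("p1", "s3", 1),
    ("p2", "s1", 9), ("p2", "s2", 5), ("p2", "s3", 1),
]


def mean_by_participant(records):
    """Average each participant's scenario-level ratings into one score."""
    by_participant = defaultdict(list)
    for participant, _scenario, rating in records:
        if not 0 <= rating <= 10:
            raise ValueError(f"rating {rating} is outside the 0-10 slider range")
        by_participant[participant].append(rating)
    return {p: mean(values) for p, values in by_participant.items()}


participant_means = mean_by_participant(ratings)
```

Keeping the raw scenario-level rows alongside the averages preserves the option of scenario-level modeling later on.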

Multi-Item Likert Scales:

In-depth studies (e.g., Domnich et al., 2025) employ multi-dimensional batteries, such as the CounterEval “Explanation Satisfaction Scale,” which incorporates both overall satisfaction and specific explanatory virtues:

Overall satisfaction: “Overall, I am satisfied with this explanation” (1–6 scale)
Explanatory criteria: Measured individually on 6-point agreement or 5-point complexity scales, covering feasibility (“The suggested changes seem realistic and actionable in this context”), consistency, complexity, understandability, completeness, fairness, and trust.

Typical items, adopted or recommended in the literature, include:

Construct	Example Item	Scale
Overall Satisfaction	“Overall, I am satisfied with this explanation.”	1–6 Likert
Feasibility	“The suggested changes seem realistic and actionable in this context.”	1–6 Likert
Trust	“I believe that if I followed these suggested changes, they would succeed.”	1–6 Likert
Understandability	“I feel like I understood the phrasing of the explanation well.”	1–6 Likert
Complexity	“The explanation was too simple/too complex/just right.”	–2 to +2

Multi-item scales enable calculation of internal-consistency reliability (e.g., Cronbach’s α), factor-analytic exploration, and disentangling of the explanatory dimensions that drive user satisfaction (Domnich et al., 2025).
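To make the internal-consistency calculation concrete, the sketch below computes Cronbach’s α with the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$, and $\sigma_T^2$ the variance of respondents' total scores. The response matrix is invented for illustration and is not data from any cited study. The sketch also assumes all items share the same direction and response scale.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    item_columns = list(zip(*responses))
    item_variances = [variance(column) for column in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-6 agreement ratings: five respondents, three items.
example_responses = [
    [5, 5, 6],
    [4, 4, 5],
    [2, 3, 2],
    [6, 5, 6],
    [3, 3, 4],
]
alpha = cronbach_alpha(example_responses)
```

Sample variances (n − 1 denominator) are used for both the items and the totals; any item worded in the opposite direction would need to be reverse-scored before being passed in.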

3. Psychometric Properties and Factor Structure

When validated, explanation satisfaction scales generally achieve high sampling adequacy and clear factor structures:

The Domnich et al. (2025) scale achieved a Kaiser–Meyer–Olkin (KMO) value of 0.893 and a three-component factor structure (scree-plot “elbow” after three components).
The first factor explained 40.5% of variance, with strong loadings for feasibility (0.7805), trust (0.7896), consistency (0.7697), completeness (0.6273), and fairness (0.6884). Understandability and complexity loaded on separate factors.
The regression model for overall satisfaction was:

$S = 0.1766 + 0.3581 \cdot \text{Feasibility} + 0.0665 \cdot \text{Consistency} + 0.1702 \cdot \text{Completeness} + 0.3618 \cdot \text{Trust} - 0.0690 \cdot \text{Understandability} + 0.0170 \cdot \text{Fairness} + 0.0802 \cdot \text{Complexity} + \epsilon$

The model explained about 76% of the variance in overall satisfaction ($R^2 = 0.757 \pm 0.008$) (Domnich et al., 2025).

Feasibility and trust consistently emerged as the strongest drivers, while completeness provided a secondary boost. Understandability had a small negative coefficient in the presence of other predictors, and fairness’s contribution was marginal.

This structure suggests that users’ overall satisfaction is determined by an intertwined set of explanatory virtues, with actionability and trustworthiness as primary pillars. Where such factor structures prove stable across samples and domains, they support the psychometric robustness of these instruments.
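As a worked illustration of the regression model above, the following Python sketch plugs a hypothetical set of ratings into the reported coefficients. It assumes the predictors enter on their raw response scales (1–6 agreement, −2 to +2 for complexity), which the text here does not specify, so the predicted values are illustrative only. The function name `predict_satisfaction` and the example ratings are invented.

```python
# Coefficients as reported in the regression equation above (error term omitted).
INTERCEPT = 0.1766
COEFFICIENTS = {
    "feasibility": 0.3581,
    "consistency": 0.0665,
    "completeness": 0.1702,
    "trust": 0.3618,
    "understandability": -0.0690,
    "fairness": 0.0170,
    "complexity": 0.0802,
}


def predict_satisfaction(ratings):
    """Return the model's point prediction for one explanation's ratings."""
    missing = COEFFICIENTS.keys() - ratings.keys()
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * ratings[name] for name in COEFFICIENTS)


# Hypothetical explanation rated 5 on every agreement item, "just right" (0) on complexity.
baseline = {name: 5 for name in COEFFICIENTS}
baseline["complexity"] = 0
predicted = predict_satisfaction(baseline)

# Raising feasibility by one scale point shifts the prediction by its coefficient.
more_feasible = dict(baseline, feasibility=6)
feasibility_gain = predict_satisfaction(more_feasible) - predicted
```

Under these assumptions, a one-point gain in feasibility or trust moves the prediction by roughly 0.36, about twice the effect of a one-point gain in completeness.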

4. Application Contexts and Empirical Findings

Explanation satisfaction instruments have been deployed in a range of domains:

Autonomous Vehicles:

In simulated driving studies, explanation satisfaction has proven highly sensitive to explanation errors. Kaufman et al. (2024) report that mean satisfaction scores decreased sharply with increasing explanation errors:

Accurate explanations: $M = 6.38$
Low-error (“what” correct, “why” incorrect): $M = 3.13$
High-error (“what” + “why” incorrect): $M = 2.20$

Linear mixed-effects modeling confirmed highly significant decrements per error-severity increment (all $p < .001$). Contextual factors such as scenario harm and driving difficulty further amplified these effects.

Counterfactual Explanations:

In evaluations of counterfactual XAI methods, Domnich et al. (2025) demonstrated that, in addition to feasibility and trust, completeness and consistency made meaningful contributions to satisfaction. Complexity appeared psychometrically separable and did not consistently penalize satisfaction, indicating that length or detail, when matched to user expertise, need not be detrimental.

Demographic analyses showed ML and medical-expert participants applied more stringent standards, suggesting the necessity of tailoring explanation designs to user profiles.

5. Comparative Role and Limitations

Explanation satisfaction, while widely reported (cited in 10 of 14 major XAI evaluation surveys), is noted for its subjective, introspective nature (Löfström et al., 2022). Head-to-head comparative evaluations across explanation methods generally lack agreed-upon cut-points or thresholds for acceptability. The literature therefore recommends complementing satisfaction metrics with objective, task-based measures such as appropriate trust, especially in comparative studies, to mitigate ceiling/floor effects and possible demand characteristics in subjective ratings.

Satisfaction remains indispensable for user-centered design iterations and post-hoc usability assessment, but a plausible implication is that researchers should:

Pilot multi-item scales and report standard psychometrics (e.g., Cronbach’s α, item-total correlations)
Pair subjective satisfaction measures with objective behavioral tasks (e.g., error detection, rejection/acceptance of system outputs)
Account for domain, scenario, and expertise effects through stratified or customized battery development
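One way to pair a subjective rating with an objective behavioral task is sketched below in Python. It scores appropriate trust as the share of trials in which the user accepted a correct output or rejected an incorrect one. This operationalization, the trial layout, and the name `appropriate_trust_rate` are assumptions made for illustration and are not taken from the cited studies.

```python
from statistics import mean

# Hypothetical trial log: system correctness, the user's decision, and a 1-6 satisfaction rating.
trials = [
    {"system_correct": True,  "user_accepted": True,  "satisfaction": 5},
    {"system_correct": True,  "user_accepted": False, "satisfaction": 3},
    {"system_correct": False, "user_accepted": False, "satisfaction": 2},
    {"system_correct": False, "user_accepted": True,  "satisfaction": 5},
]


def appropriate_trust_rate(trial_log):
    """Share of trials where the user's accept/reject decision matched system correctness."""
    if not trial_log:
        raise ValueError("trial log is empty")
    matches = sum(t["user_accepted"] == t["system_correct"] for t in trial_log)
    return matches / len(trial_log)


trust_rate = appropriate_trust_rate(trials)
mean_satisfaction = mean(t["satisfaction"] for t in trials)
```

Reporting the two numbers side by side makes it visible when users are satisfied with explanations that nonetheless lead them to accept incorrect outputs, as in the last hypothetical trial.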

6. Recommendations for Scale Development and Best Practices

Authors surveying the XAI literature recommend that future scale development around explanation satisfaction adhere to the following guidelines (Löfström et al., 2022; Domnich et al., 2025):

Adopt clear construct definitions: satisfaction should refer to perceived user understanding of system, explanation, and interface.
Use or adapt multi-item Likert scales (5–7 points), with items addressing interpretability, helpfulness, confidence, and overall satisfaction.
Report psychometrics: especially internal consistency metrics and factor analytic support; pilot scale items to ensure sampling adequacy and structural validity.
In multi-method comparisons, combine subjective satisfaction with objective measures of trust or performance.
Tailor item content and response anchors to the expertise and needs of the target user population.

These practices ensure that explanation satisfaction scales retain content validity, reliability, and interpretative value across diverse XAI contexts and user groups.
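When piloting items, the item-total correlations mentioned earlier can be computed as in the Python sketch below. It uses the corrected form, in which each item is correlated with the sum of the remaining items so that the item does not inflate its own total. The response matrix is invented, and the helper names `pearson` and `corrected_item_total` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the sum of all other items, one value per item."""
    k = len(responses[0])
    correlations = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical 1-6 agreement ratings: five respondents, three items.
pilot_responses = [
    [5, 5, 6],
    [4, 4, 5],
    [2, 3, 2],
    [6, 5, 6],
    [3, 3, 4],
]
item_total = corrected_item_total(pilot_responses)
```

Items with low or negative corrected correlations are the usual candidates for rewording, reverse-scoring checks, or removal before the main study.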

References

Löfström et al. (2022). A Meta Survey of Quality Evaluation Criteria in Explanation Methods.
Kaufman et al. (2024). What Did My Car Say? Impact of Autonomous Vehicle Explanation Errors and Driving Context On Comfort, Reliance, Satisfaction, and Driving Confidence.
Domnich et al. (2025). Predicting Satisfaction of Counterfactual Explanations from Human Ratings of Explanatory Qualities.
