XAI Question Bank Overview
- The XAI Question Bank is a systematic collection of prototypical questions that capture user information needs and guide explainable AI design.
- It organizes explanation requirements into categories that map directly to technical methods like SHAP, LIME, and counterfactual analysis.
- It supports benchmark development, regulatory compliance, and iterative refinement based on real-world user interactions and targeted UX research.
Explainable AI (XAI) Question Bank
An Explainable AI (XAI) question bank is a systematic set of prototypical user questions that operationalizes the information needs arising in AI system explanation, evaluation, and design. These question banks enable structured requirements elicitation for XAI system design, support benchmark development for evaluating explanation methods, and serve as a mapping layer between end-user demands and available explanation techniques. The concept grew out of both academic study of user-AI interaction and technical evaluation practice, and it underpins a growing body of XAI research, tool design, and standardization work.
1. Origins and Core Taxonomy
The XAI Question Bank concept was originally articulated by Liao et al. through interviews with AI UX/design practitioners and a structured literature review (Liao et al., 2020, Sipos et al., 2023). The canonical bank organizes explanation needs into top-level categories, each corresponding to a recurring intelligibility type:
| Category | Leading Question Example |
|---|---|
| Input/Data | What data did you train the model on? |
| Output | What exactly is the model predicting for me? |
| Performance | How accurate is the model overall? |
| How (Global Logic) | How does the model make decisions in general? |
| Why (Local) | Why did the model give me this particular prediction? |
| Why Not (Contrast) | Why not the outcome I expected? |
| What If | What if [feature X] were different? |
| How to Be That | How would I need to change the input for a target outcome? |
| How to Still Be This | What changes can I make without changing this prediction? |
| Others | How has the model changed since last month? What does “feature importance” mean? |
Each category includes further specific prototypical questions, many of which have been extended to domains such as UI affordances, data provenance, system drift, or governance (Sipos et al., 2023).
2. Methodologies for Construction and Validation
Question banks are built by iteratively synthesizing literature taxonomies, domain-specific fieldwork, and empirical gap analyses. Liao et al. (2020, 2021) constructed the original bank via UX practitioner interviews, prioritization workshops, and analysis of real-world design artifacts. Sipos et al. (2023) extended the bank by coding think-aloud sessions of domain experts interacting with applied AI, introducing, for example, new categories for user-parameter questions, UI element explanations, and accountability queries.
Validation consists of mapping user utterances to existing question codes, iteratively refining ambiguous categories, and expanding the bank where observed user needs fall outside its coverage. Domain adaptations, e.g., visual retrieval in museology (Sipos et al., 2023) and financial modeling (Kuiper et al., 2021), indicate that the core categories generalize across domains but need context-specific extensions.
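The coding step described above can be sketched as a simple tally. This is a minimal illustration, not the procedure used in the cited studies: the function name `coverage_report`, the shortened category names, and the sample session are all assumptions made for this example.

```python
from collections import Counter

# Top-level categories of the canonical bank, with names shortened for use as codes.
CANONICAL_CODES = {
    "Input/Data", "Output", "Performance", "How", "Why", "Why Not",
    "What If", "How to Be That", "How to Still Be This", "Others",
}


def coverage_report(coded_utterances):
    """Tally coded utterances and collect those that no existing code covers.

    coded_utterances: iterable of (utterance, code) pairs, where code is a
    category name or None when the coder found no fitting category.
    """
    counts = Counter()
    uncovered = []
    for utterance, code in coded_utterances:
        if code in CANONICAL_CODES:
            counts[code] += 1
        else:
            uncovered.append(utterance)
    total = sum(counts.values()) + len(uncovered)
    coverage = sum(counts.values()) / total if total else 0.0
    return counts, uncovered, coverage


# Hypothetical coded think-aloud session, invented for illustration only.
session = [
    ("Why did it rank this painting first?", "Why"),
    ("What happens if I remove the date filter?", "What If"),
    ("What does this button do?", None),
    ("Why is this one here?", "Why"),
]
counts, uncovered, coverage = coverage_report(session)
```

Utterances that end up in `uncovered` are the candidates for new or refined categories in the next iteration.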
3. Application: Mapping Questions to XAI Methods
Each XAI question type corresponds to a (possibly multi-valued) mapping onto technical explanation methods or artifacts, forming the basis for modular system design, evaluative benchmarking, and user-oriented interfaces (Liao et al., 2021, Nguyen et al., 2022).
| Question Type | Example Explanation Method |
|---|---|
| Why | Local feature importance (SHAP, LIME, Integrated Gradients); bar charts |
| What If | Partial dependence plots, counterfactual generators (DiCE) |
| How (Global) | Global feature ranking (SHAP), rule extraction, surrogate trees |
| Performance | Accuracy metrics, error bands, subgroup performance dashboard |
| Input/Data | Datasheets for Datasets, Model Cards |
| How to Be That / How to Still Be This | Counterfactuals, robustness intervals, “anchors” |
| Why Not | Contrastive counterfactual search, comparison explanations |
The mapping often incorporates additional layers for process context (governance, audit trails), system-level transparency (pipeline design, feedback mechanisms), and human factors (terminology, testimonials), as detailed in adaptations for finance (Kuiper et al., 2021) and for conversational agents (Nguyen et al., 2022).
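One way to hold the multi-valued mapping from the table above is a plain lookup structure. The sketch below is illustrative only: the dictionary mirrors the table rows, while `methods_for` and its `available` parameter are hypothetical names introduced here to show how a system might filter candidates down to the methods it actually implements.

```python
# Question type -> candidate explanation methods, mirroring the table above.
QUESTION_TO_METHODS = {
    "Why": ["SHAP", "LIME", "Integrated Gradients"],
    "What If": ["Partial dependence plot", "DiCE"],
    "How (Global)": ["Global SHAP ranking", "Rule extraction", "Surrogate tree"],
    "Performance": ["Accuracy metrics", "Error bands", "Subgroup dashboard"],
    "Input/Data": ["Datasheets for Datasets", "Model Cards"],
    "How to Be That / How to Still Be This": [
        "Counterfactuals", "Robustness intervals", "Anchors",
    ],
    "Why Not": ["Contrastive counterfactual search", "Comparison explanations"],
}


def methods_for(question_type, available=None):
    """Return candidate methods for a question type.

    available: optional set of method names the deployed system supports;
    when given, candidates outside this set are dropped.
    """
    candidates = QUESTION_TO_METHODS.get(question_type, [])
    if available is None:
        return list(candidates)
    return [method for method in candidates if method in available]


# Example: a system that only ships LIME and DiCE.
why_methods = methods_for("Why", available={"LIME", "DiCE"})
```

An empty result signals a question type the system cannot yet answer, which is useful input for coverage tracking.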
4. Extension, Gap Analysis, and End-User Centering
Systematic application of the question bank in end-user contexts surfaces gaps not anticipated in initial developer-centric taxonomies. Sipos et al. (2023) and Liao et al. (2020) highlight recurring new needs:
- UI-specific: “What does this button do?”, “What are the system’s affordances?”
- Output scope/amount: “What is the coverage of this recommendation?”, “How many items are shown?”
- Dynamic behavior: “How does the system change over time?”, “Does it learn from my interactions?”
- Context/Accountability: “Who built this system?”, “Where does the data come from?”
Practical integration requires collaborative workshops to tune definitions to context, empirical coding with iterative category refinement, and design checklists for coverage mapping and implementation tracking (Sipos et al., 2023).
5. Benchmarking and Evaluation via Question Banks
Recent work proposes leveraging question banks as scaffolds for the quantitative evaluation of XAI methods. Compare-xAI (Belaid et al., 2022) offers a meta-benchmark grounded in functional requirements, with each test in the suite tied to an end-user question or bug category (e.g., “Does this method distinguish dummy features?” “Is it robust to adversarial attacks?”). Test selection enforces non-redundancy and coverage over five meta-categories: fidelity, fragility, stability, simplicity, and stress, producing interpretable multi-level scores for each explanation algorithm.
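To make the idea of multi-level scores concrete, the sketch below aggregates per-test results into per-category scores and one overall score. The averaging rule, the function name `multilevel_scores`, and the sample results are assumptions for illustration; the aggregation actually used by Compare-xAI is not specified here and may differ.

```python
from collections import defaultdict
from statistics import mean

# The five meta-categories named in the text.
CATEGORIES = ("fidelity", "fragility", "stability", "simplicity", "stress")


def multilevel_scores(test_results):
    """Aggregate (category, score) pairs into per-category and overall scores.

    Each score is assumed to lie in [0, 1]. Categories are averaged first so
    that a category with many tests does not dominate the overall score.
    """
    by_category = defaultdict(list)
    for category, score in test_results:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        by_category[category].append(score)
    per_category = {c: mean(scores) for c, scores in by_category.items()}
    overall = mean(list(per_category.values())) if per_category else None
    return per_category, overall


# Hypothetical test outcomes for one explanation method.
results = [("fidelity", 1.0), ("fidelity", 0.5), ("stability", 0.25)]
per_category, overall = multilevel_scores(results)
```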
CLEVR-XAI (Arras et al., 2020) demonstrates the utility of a question-driven, ground-truth–controllable evaluation paradigm in the vision domain. Here, each VQA task/question directly defines the relevant subset of objects, enabling precise mass and rank accuracy metrics for heatmap explanations and revealing systematic weaknesses in commonly used methods (e.g., Grad-CAM’s failure on VQA selectivity).
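The two metrics can be sketched in a few lines once a heatmap and a ground-truth mask are available. This is a minimal pure-Python sketch under stated assumptions: heatmaps and masks are nested lists of equal shape, negative relevance is clipped to zero (an assumption made here), and ties in the ranking are broken arbitrarily. Mass accuracy is taken as the share of relevance inside the mask, and rank accuracy as the share of the top-K pixels inside the mask, where K is the mask size.

```python
def relevance_mass_accuracy(heatmap, mask):
    """Fraction of total (non-negative) relevance that falls inside the mask."""
    total = sum(max(r, 0.0) for row in heatmap for r in row)
    if total == 0:
        return 0.0
    inside = sum(
        max(r, 0.0)
        for h_row, m_row in zip(heatmap, mask)
        for r, m in zip(h_row, m_row)
        if m
    )
    return inside / total


def relevance_rank_accuracy(heatmap, mask):
    """Fraction of the K highest-relevance pixels inside the mask (K = mask size)."""
    flat = [
        (r, bool(m))
        for h_row, m_row in zip(heatmap, mask)
        for r, m in zip(h_row, m_row)
    ]
    k = sum(1 for _, in_mask in flat if in_mask)
    if k == 0:
        return 0.0
    top_k = sorted(flat, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(1 for _, in_mask in top_k if in_mask) / k


# Toy 2x2 example, invented for illustration: the left column is ground truth.
heatmap = [[0.5, 0.25], [0.125, 0.125]]
mask = [[1, 0], [1, 0]]
mass = relevance_mass_accuracy(heatmap, mask)
rank = relevance_rank_accuracy(heatmap, mask)
```

In the toy example the mask holds 0.625 of the total relevance, but only one of the two highest-ranked pixels, so the two metrics can disagree about the same heatmap.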
6. Deployment in Conversational and Regulatory Settings
Conversational agents built on XAI question banks map a broad grammar of paraphrased user queries to canonical explanation types and invoke suitable methods automatically (Nguyen et al., 2022). This architecture supports natural dialogue, robust NLU, and dynamic selection among methods such as SHAP, LIME, Anchors, or DiCE for both tabular and vision domains, with placeholder-based template NLG for consistent responses.
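The routing and templating idea can be illustrated with a deliberately simple sketch. It uses keyword patterns in place of a trained NLU component, so it is not the architecture of the cited system; the rule list, the templates, and the names `classify` and `render` are all hypothetical.

```python
import re

# Ordered rules: the first matching pattern wins, so "Why Not" precedes "Why".
RULES = [
    ("Why Not", re.compile(r"\bwhy\s+not\b", re.IGNORECASE)),
    ("Why", re.compile(r"\bwhy\b", re.IGNORECASE)),
    ("What If", re.compile(r"\bwhat\s+(happens\s+)?if\b", re.IGNORECASE)),
    ("Performance", re.compile(r"\b(accurate|accuracy|reliable)\b", re.IGNORECASE)),
]

# Placeholder-based response templates, one per supported question type.
TEMPLATES = {
    "Why": "The features that contributed most to this prediction were {features}.",
    "What If": "If {feature} were {value}, the prediction would be {prediction}.",
}


def classify(query):
    """Map a free-text query to a canonical question type, or 'Others'."""
    for question_type, pattern in RULES:
        if pattern.search(query):
            return question_type
    return "Others"


def render(question_type, **slots):
    """Fill the template for a question type with values from an explainer."""
    template = TEMPLATES.get(question_type)
    if template is None:
        return "No explanation template is available for this question type."
    return template.format(**slots)


question_type = classify("Why was my loan rejected?")
reply = render(question_type, features="income and debt ratio")
```

In a fuller system the slot values would come from whichever explanation method the question type is mapped to, and the keyword rules would be replaced by a paraphrase-tolerant classifier.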
In highly regulated sectors, such as finance, XAI question banks serve to delineate requirements between technical “model explainability” (feature attributions, error scores) and broader “system explainability” (data lineage, governance documentation), supporting alignment between regulated entities and supervisory authorities, and forming the basis for formalized question sets in compliance auditing (Kuiper et al., 2021).
7. Limitations, Challenges, and Future Research
Key limitations of the XAI question bank paradigm include:
- Ambiguous category boundaries, e.g., “How” vs. “What is the overall logic” (Sipos et al., 2023).
- Necessity for iteration as new usage scenarios and stakeholder roles emerge.
- Risk of checklist-driven rather than needs-driven design if not rooted in user research.
- Technical challenges in mapping questions to explanation methods with rigorous guarantees, especially in domains with weak alignment between modeled and user-desired concepts.
Open directions include integrating causal inference–based explanations (as in H-XAI (Lakkaraju et al., 2025)), supporting adaptive multi-turn dialogue grounded in question-driven reasoning, and extending benchmarks to more diverse modalities, legal/regulatory contexts, and adversarial robustness scenarios.
8. Representative Examples from XAI Question Banks
The following table lists illustrative question–method pairs synthesized across multiple XAI question bank deployments (Liao et al., 2020, Sipos et al., 2023, Liao et al., 2021, Kuiper et al., 2021, Nguyen et al., 2022):
| Question | Method(s) (Partial List) |
|---|---|
| Why was my loan rejected? | SHAP local, logistic weights |
| Which features contributed most to approval? | SHAP bar chart, summary plot |
| What minimal change flips the decision? | Counterfactual optimizer (DiCE, CEM) |
| How does changing feature X affect risk? | Partial dependence plot, ICE |
| How accurate is the model overall and by group? | Dashboard, confusion matrix |
| Why not class Y? | Contrastive explanation, DiCE |
| What data is the model trained on? | Data Card, lineage summary |
| What does this UI widget do? | Tooltip, inline doc |
| Who built this system? | System documentation |
By structuring system explanation around canonical questions and method mappings, XAI question banks enable reproducible, stakeholder-aligned, and adaptable explainability in both research and deployment contexts.