Potentially Applicable Explanations (PAE)
- Potentially Applicable Explanations (PAE) are structured, actionable AI explanations combining cognitive, semantic, and formal principles for use in human-centric workflows.
- They integrate appraisal scoring, ontology mapping, or logic-based rules to transform model decisions into domain-relevant, clear justifications.
- PAEs enable safe deployment in complex systems by providing interpretable, verifiable, and context-sensitive explanations that improve decision-making.
A Potentially Applicable Explanation (PAE) is a class of model explanations characterized by a strong alignment with cognitive, semantic, or formal principles that support their direct usability in downstream human-centered or scientific workflows. PAEs are distinguished by being both interpretable and actionable: they use domain-meaningful concepts, structures, or dimensions and offer a degree of formal or practical fidelity that makes them “deployable” in real-world AI systems, safety-critical supervision, or scientific investigation. PAEs have emerged as a unifying theme across recent research in explainable AI, incorporating cognitive appraisal models, ontology-based semantic reasoning, and logic-based extraction techniques (Somarathna et al., 2 Aug 2025, Perdih et al., 2021, Barbiero et al., 2021).
1. Formal Definitions and Theoretical Foundations
The PAE notion is instantiated differently across paradigms, but always with a formal scaffold:
- In the appraisal-based framework, a PAE is a structured justification for a system decision $d$ on input $x$, constructed by identifying and combining appraisal “primitives” (dimensions such as relevance, implications, coping potential, normative significance) (Somarathna et al., 2 Aug 2025). For each dimension $a_i$, a scoring function $f_i(x, d)$ produces an appraisal score $s_i$, from which the dominant dimensions are selected to synthesize a composite explanation.
- In the context of semantic post-hoc summarization, a PAE is a compact, human-readable conjunction of ontology terms that covers, generalizes, and discriminates class-relevant feature attributions. This construction relies on the theory of least general generalization in ontological spaces and information-theoretic criteria (e.g., GenQ) for quality (Perdih et al., 2021).
- In concept-based neural models with logic extraction, a PAE takes the form of first-order logic (FOL) predicates over interpretable concepts, distilled from model weights via an entropy-guided criterion. Each formula is required to be formally equivalent (or nearly so) to the classifier’s behavior on the Boolean concept activation space (Barbiero et al., 2021); this condition is sketched formally below.
These approaches share a commitment to explicit mappings between machine representations and human cognition or expertise, ensuring that explanations are not mere technical rationalizations, but are in principle usable by humans in practice.
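For the logic-based instantiation, the equivalence requirement can be written compactly. The notation below is illustrative (generic symbols rather than those of Barbiero et al.): $c$ ranges over Boolean concept-activation vectors, $f$ is the classifier restricted to the concept space, and $\varphi_y$ is the extracted formula for class $y$.

```latex
% Exact case: the extracted formula reproduces the classifier on every concept vector.
\forall\, c \in \{0,1\}^k:\qquad \varphi_y(c) \;\Longleftrightarrow\; f(c) = y
% "Nearly equivalent" relaxes this to high empirical fidelity on observed activations:
\Pr_{c \sim \mathcal{D}}\!\big[\varphi_y(c) \leftrightarrow (f(c) = y)\big] \;\ge\; 1 - \epsilon
```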
2. Key Dimensions and Construction Algorithms
Each research approach to PAE construction is grounded in explicit dimensions or foundations:
| Approach | Underlying Structure | Output Format |
|---|---|---|
| Appraisal-based (Somarathna et al., 2 Aug 2025) | Appraisal dimensions (CPM) | Natural language fragments |
| Semantic reasoning (Perdih et al., 2021) | Ontological hierarchy, GenQ | Conjunction of ontology terms |
| Logic extraction (Barbiero et al., 2021) | Boolean concept algebra, entropy | First-order logic formulas |
- Appraisal Dimensions: Derived from the Component Process Model (CPM), core dimensions include relevance (semantic goal alignment), implications (loss-based consequence), coping potential (controllability), and normative significance (compliance with rules or values). Each is computed from the input $x$ and decision $d$, with concrete scoring functions—e.g., cosine similarity between user goal embeddings and item features for relevance; normalized cost differentials for implications (Somarathna et al., 2 Aug 2025); a minimal scoring sketch follows this list.
- Semantic Generalization: Post-hoc instance-level attributions (e.g., via SHAP) are mapped through a feature-to-ontology concept mapping, and then generalized to cover-class and discriminate-class conjunctions using “Selective Staircase” or “Ancestry” algorithms. Explanation quality is quantified by GenQ (normalized reduction in average information content), and discrimination is enforced by overlap constraints (Perdih et al., 2021).
- Logic Derivation: In concept-based networks, a per-class entropy-based regularizer forces the network to concentrate output logic on a minimal subset of concepts. The trained classification head is binarized, and empirical truth tables yield disjunctive normal form (DNF) FOL rules that are short, auditable, and directly executable at test time (Barbiero et al., 2021).
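As a concrete illustration of the appraisal scoring step above, the sketch below implements a cosine-similarity relevance score and a normalized cost-differential implication score. The function names, embedding inputs, and rescaling choices are illustrative assumptions rather than the exact formulation in Somarathna et al.

```python
import numpy as np

def relevance_score(goal_embedding: np.ndarray, item_embedding: np.ndarray) -> float:
    """Relevance as cosine similarity between user-goal and item embeddings,
    rescaled from [-1, 1] to [0, 1]."""
    cos = np.dot(goal_embedding, item_embedding) / (
        np.linalg.norm(goal_embedding) * np.linalg.norm(item_embedding) + 1e-12
    )
    return float((cos + 1.0) / 2.0)

def implication_score(cost_chosen: float, cost_best_alternative: float,
                      cost_scale: float) -> float:
    """Implications as a normalized cost differential: how much better (or worse)
    the chosen option is relative to the best alternative, mapped to [0, 1]."""
    diff = cost_best_alternative - cost_chosen      # positive if the chosen option is cheaper
    return float(np.clip(0.5 + diff / (2.0 * cost_scale), 0.0, 1.0))

# Hypothetical usage with toy embeddings and costs:
goal = np.array([0.9, 0.1, 0.4])     # e.g., a "healthy, quick" goal embedding
item = np.array([0.8, 0.2, 0.5])     # embedding of the recommended item
print(relevance_score(goal, item))                 # ~0.99
print(implication_score(cost_chosen=12.0,          # minutes to prepare
                        cost_best_alternative=20.0,
                        cost_scale=30.0))          # ~0.63
```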
3. Construction Workflows and Pseudocode
The synthesis of PAEs proceeds via a modular, often algorithmic, workflow; illustrative code sketches for each workflow follow the list below:
- Appraisal PAE Synthesis:
- For each appraisal dimension $a_i$, compute the raw score $s_i = f_i(x, d)$.
- Normalize $s_i$ using weights $w_i$, yielding weighted scores $\hat{s}_i$.
- Rank dimensions by $\hat{s}_i$ and select the top $k$.
- Instantiate templated fragments for those dimensions, incorporating $\hat{s}_i$ and context.
- Concatenate fragments for the final explanation.
- Return the composed PAE (Somarathna et al., 2 Aug 2025).
- Semantic Reasoning (ReEx) Pipeline:
- Aggregate instance-level post-hoc feature attributions (e.g., SHAP).
- Map the top features to ontology terms via the feature-to-ontology concept mapping.
- Iteratively generalize and prune using Selective Staircase or pairwise Ancestry algorithms, guided by coverage, discriminability, and informativeness.
- Output minimal, context-specific explanation sets (Perdih et al., 2021).
- Entropy-Based Logic Extraction:
- Train classifier with additional entropy regularizer to induce peaked concept weight distributions.
- After training, threshold high-importance concepts.
- For each output/class, construct empirical DNF from binary masks over activation space.
- Optionally simplify formulas via logic minimization (Barbiero et al., 2021).
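A minimal sketch of the appraisal synthesis workflow, assuming the per-dimension raw scores have already been computed; the weights, templates, top-$k$ choice, and sentence composition are illustrative rather than the paper's exact configuration.

```python
from typing import Dict

# Illustrative natural-language templates, one per appraisal dimension.
TEMPLATES: Dict[str, str] = {
    "relevance": "it aligns closely with your stated goals (relevance: {score:.2f})",
    "implications": "it compares favorably with the alternatives (implications: {score:.2f})",
    "coping_potential": "it stays within your control and constraints (coping potential: {score:.2f})",
    "normative_significance": "it conforms to your standards and rules (normative significance: {score:.2f})",
}

def synthesize_pae(decision: str, raw_scores: Dict[str, float],
                   weights: Dict[str, float], k: int = 3) -> str:
    """Weight the raw appraisal scores, keep the top-k dimensions,
    instantiate their templates, and concatenate into one explanation."""
    weighted = {dim: weights.get(dim, 1.0) * s for dim, s in raw_scores.items()}
    top = sorted(weighted, key=weighted.get, reverse=True)[:k]
    fragments = [TEMPLATES[dim].format(score=raw_scores[dim]) for dim in top]
    return f"I recommend {decision} because " + ", and ".join(fragments) + "."

# Hypothetical scores for the meal-recommendation scenario.
scores = {"relevance": 0.88, "implications": 0.70,
          "coping_potential": 0.95, "normative_significance": 1.00}
print(synthesize_pae("Grilled Chicken & Quinoa Salad", scores,
                     weights={"normative_significance": 1.2}))
```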
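A toy sketch of the semantic reasoning pipeline's generalization step: terms mapped from highly attributed features are pushed up a small hand-made ontology DAG as long as the more general term still discriminates the target class. This is a deliberately simplified stand-in for the Selective Staircase/Ancestry algorithms and does not compute GenQ.

```python
import networkx as nx

# Toy ontology DAG: edges point from a child term to its parent (more general) term.
ontology = nx.DiGraph([
    ("DNA repair", "DNA metabolic process"),
    ("DNA replication", "DNA metabolic process"),
    ("DNA metabolic process", "nucleic acid metabolic process"),
    ("ubiquitin protein ligase binding", "enzyme binding"),
])

def generalize(terms_for_class, terms_for_other_classes):
    """Replace class-associated terms by their parents whenever the parent
    still discriminates the class (i.e., is not used by other classes)."""
    generalized = set(terms_for_class)
    changed = True
    while changed:
        changed = False
        for term in list(generalized):
            for parent in ontology.successors(term):
                if parent not in terms_for_other_classes:
                    generalized.discard(term)
                    generalized.add(parent)
                    changed = True
    return generalized

# Hypothetical per-class term attributions (e.g., SHAP-ranked genes mapped to GO terms).
class_a = {"DNA repair", "DNA replication"}
class_b = {"nucleic acid metabolic process"}   # another class already uses this ancestor
print(generalize(class_a, class_b))
# -> {'DNA metabolic process'}: both terms collapse to one parent, but do not
#    generalize further because the grandparent would also cover class_b.
```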
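A compact sketch of the entropy-regularized head and the truth-table read-off. The attention construction (softmax over absolute per-class weights), the importance threshold, and the penalty weight are plausible assumptions consistent with the description above, not a verbatim reproduction of the reference implementation.

```python
import torch
import torch.nn as nn

class EntropyConceptClassifier(nn.Module):
    """Linear head over Boolean concept activations with a per-class
    entropy penalty that concentrates weight mass on few concepts."""
    def __init__(self, n_concepts: int, n_classes: int):
        super().__init__()
        self.linear = nn.Linear(n_concepts, n_classes)

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        return self.linear(concepts)

    def entropy_penalty(self) -> torch.Tensor:
        # Per-class attention over concepts; low entropy = peaked distribution.
        alpha = torch.softmax(self.linear.weight.abs(), dim=1)
        return -(alpha * torch.log(alpha + 1e-12)).sum(dim=1).mean()

def extract_dnf(model, concepts: torch.Tensor, target_class: int,
                threshold: float = 0.5):
    """Read an empirical DNF for one class: each sample predicted as the class
    contributes a minterm over the retained (high-attention) concepts."""
    with torch.no_grad():
        alpha = torch.softmax(model.linear.weight.abs(), dim=1)[target_class]
        kept = (alpha > alpha.mean()).nonzero(as_tuple=True)[0]   # important concepts
        preds = model(concepts).argmax(dim=1)
        minterms = set()
        for x, y in zip(concepts, preds):
            if y.item() == target_class:
                minterms.add(tuple((x[kept] > threshold).int().tolist()))
    # Each minterm is a 0/1 tuple over the kept concepts; the OR of minterms is the DNF.
    return kept.tolist(), sorted(minterms)

# Hypothetical training objective combining classification loss and the regularizer:
#   loss = cross_entropy(model(c), y) + 0.1 * model.entropy_penalty()
```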
4. Illustrative Case Studies
Recent literature demonstrates PAE construction in diverse domains:
- Appraisal-based Example: In a meal recommendation scenario, user input specifies constraints (“hungry, in a hurry, 15 minutes”). The system decision is “Grilled Chicken & Quinoa Salad.” The appraisal scores are highest for normative significance, urgency, and relevance. The PAE: “I recommend Grilled Chicken & Quinoa Salad because it takes only about 12 minutes to prepare (urgency: 0.95), aligns closely with your health goals (relevance: 0.88), and conforms to your dietary standards (normative significance: 1.00)” (Somarathna et al., 2 Aug 2025).
- Semantic Reasoning Example: In multi-class gene expression classification (Breast A data), raw top gene attributions are mapped to GO terms (“DNA repair,” “ubiquitin protein ligase binding,” etc.). Selective Staircase generalizes these to a compact conjunction of terms describing mechanisms for each subtype, immediately interpretable by molecular biologists (Perdih et al., 2021).
- Logic Extraction Example: For ICU mortality prediction (MIMIC-II), the extracted FOL rule, a short formula over interpretable clinical concepts, isolates a minimal clinical signature for likely recovery; such a rule is directly executable against a patient record, as sketched below (Barbiero et al., 2021).
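To make the executability of such rules concrete, the snippet below evaluates a hypothetical DNF rule against a binarized patient record; the concept names and the rule itself are invented for illustration and are not the rule reported for MIMIC-II.

```python
# Hypothetical DNF rule over binarized clinical concepts (invented for illustration).
def predicted_recovery(record: dict) -> bool:
    """Return True if the (made-up) extracted rule fires for this record."""
    return (record["normal_blood_pressure"] and not record["ventilated"]) or \
           (record["young_age"] and record["stable_heart_rate"])

patient = {"normal_blood_pressure": True, "ventilated": False,
           "young_age": False, "stable_heart_rate": True}
print(predicted_recovery(patient))   # True: the first disjunct fires
```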
5. Cognitive and Domain Alignment
The central property of PAEs is their strong cognitive and/or domain alignment:
- Appraisal-based PAEs are explicitly constructed to mirror human evaluative steps grounded in psychological theory, rendering the explanations both context-sensitive and naturally intelligible (Somarathna et al., 2 Aug 2025).
- Semantic PAEs, by construction, abstract raw model features to higher-order domain concepts via ontological reasoning, making them immediately actionable in biological or clinical interpretation while avoiding domain-opaque technical detail (Perdih et al., 2021).
- Logic-based PAEs offer formal, sparse, and verifiable rules that expose the classifier’s reliance on interpretable concepts, facilitating domain expert audit and regulatory compliance (Barbiero et al., 2021).
This alignment is crucial for high-stakes applications where technical fidelity alone is insufficient: user trust, oversight, and meaningful downstream adaptation require explanations that “make sense” within a given domain or cognitive frame.
6. Empirical Evaluation and Limitations
Published works report evidence on the usability, compactness, and informativeness of PAEs:
- Appraisal-condition explanations increase perceived transparency and trust in qualitative pilot feedback, and are judged more context-aware and emotionally resonant than LLM-generated baselines. However, full-scale quantitative evaluation and task-level A/B testing remain outstanding (Somarathna et al., 2 Aug 2025).
- Semantic reasoning approaches demonstrate that explanation sets from ReEx models are smaller (e.g., 20–40 ontology terms vs. 150 for naïve mapping), more general (GenQ gain +10–26%), and class-discriminative, validated across multiple gene-expression data sets (Perdih et al., 2021).
- Entropy-based logic extraction methods produce FOL explanations with high fidelity (>90% on held-out clinical data) and minimal length (typically 3–5 literals) that are straightforward for domain experts to understand, edit, and formally verify (Barbiero et al., 2021).
Limitations include dependence on the quality of concept vocabularies, ontologies, or mappings; subjective thresholding or weighting hyperparameters; possible loss of technical fidelity for the sake of interpretability; and the lack of large-scale user studies in some cases.
7. Prospects and Extensions
PAE methodologies are readily extensible to other domains (e.g., cyber-security, financial risk, clinical terminologies) and support hybridization (e.g., combining symbolic reasoning with graph embeddings). Open challenges include automating hyperparameter selection, expanding to non-DAG knowledge graphs, and quantifying downstream human-in-the-loop utility. As the uptake of PAEs grows, rigorous frameworks for evaluation and formal verification are expected to co-evolve with their application in regulatory and human-centered AI settings (Somarathna et al., 2 Aug 2025, Perdih et al., 2021, Barbiero et al., 2021).