Controllable vs Uncontrollable Failure Attribution
- Controllable versus uncontrollable failure attribution distinguishes failures that can be mitigated through feasible interventions from those driven by immutable or exogenous factors.
- The A2P and CAFA frameworks employ causal inference and counterfactual reasoning to pinpoint actionable inputs and optimize intervention policies in complex systems.
- Empirical evaluations demonstrate enhanced accuracy and transparency in identifying and addressing failures, informing effective decision-making in fields like healthcare and multi-agent systems.
Controllable versus uncontrollable failure attribution refers to the systematic distinction between failures that could be prevented or mitigated via feasible interventions, and those that derive from factors fundamentally outside the scope of intervention. This distinction is central to debugging complex systems, interpreting predictive models in high-stakes domains such as medicine and public health, and operationalizing causal reasoning over multi-agent interactions. Methodologically, attributions of failure or risk can be partitioned along the axis of controllability, guiding both research on causal inference and the design of actionable, transparent decision-support tools.
1. Formal Definitions and Conceptual Foundations
In automated decision processes, machine learning, and multi-agent systems, the notion of controllability underpins the ability to identify where and how a system’s undesirable outcome could have been avoided via feasible changes. Two primary categorizations emerge:
- Controllable failures: Adverse outcomes that would have been averted by a feasible, localized intervention (e.g., correcting an action, changing a policy, or modifying a controllable input).
- Uncontrollable failures: Outcomes impervious to any such localized or practical intervention—arising from immutable features (age, genetic background), exogenous noise, or factors outside the modeled environment.
In feature attribution contexts, controllability demarcates the set of input features that an actor or policy-maker can in principle adjust (controllable features, here denoted $X_c$) from those which cannot be changed in practice (uncontrollable features, $X_u$) (Kovvuri et al., 2022). In causal sequential decision settings, controllability is assessed by whether single-step (or, more generally, minimal) interventions suffice to avert failure, as verified through counterfactual reasoning (West et al., 12 Sep 2025).
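To make the partition concrete, here is a minimal sketch; the feature names and the choice of which are controllable are hypothetical illustrations, not taken from either paper:

```python
# Minimal sketch: partitioning an input's features into controllable (X_c)
# and uncontrollable (X_u) index sets. Names are hypothetical examples.
FEATURES = ["age", "sex", "bmi", "smoking_status", "chemo_regimen"]

# A priori domain knowledge supplies the partition (see the limitations
# discussion: this choice can shift as technology evolves).
CONTROLLABLE = {"bmi", "smoking_status", "chemo_regimen"}

def partition(features):
    """Split feature names into (controllable, uncontrollable) index sets."""
    xc = [i for i, f in enumerate(features) if f in CONTROLLABLE]
    xu = [i for i, f in enumerate(features) if f not in CONTROLLABLE]
    return xc, xu

xc, xu = partition(FEATURES)
```

Downstream methods such as CAFA consume exactly this kind of index split.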
2. Causal Frameworks and Mathematical Formalism
Multi-Agent Trajectory Models
In multi-agent systems, a trajectory is defined as
$$\tau = (s_0, a_0, s_1, a_1, \ldots, s_T),$$
where $s_t$ is the global state at time $t$ and $a_t$ is the action taken. State transitions follow the structural equation
$$s_{t+1} = f(s_t, a_t, u_t),$$
with $u_t$ drawn from a distribution of unobserved exogenous factors (knowledge gaps, misperceptions, etc.).
The outcome function $Y(\tau) \in \{0, 1\}$ indicates success or failure. Interventions are formalized using Pearl's do-operator: the trajectory after an intervention at time $t$ is denoted $\tau_{\mathrm{do}(a_t = a')}$, and system evolution proceeds as
$$s_{k+1} = f(s_k, a_k, u_k), \quad k \ge t,$$
holding the exogenous factors $u_k$ fixed at their abducted values.
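Under toy assumptions (a scalar state, an illustrative transition $f$, and Gaussian exogenous noise), the structural-equation rollout and a do-intervention that reuses the same exogenous noise can be sketched as:

```python
import random

# Sketch of the trajectory model s_{t+1} = f(s_t, a_t, u_t); the transition
# f and the policy below are toy stand-ins, not from the paper.
def f(s, a, u):
    return s + a + u  # illustrative structural equation

def rollout(policy, T, seed, do=None):
    """Roll out a trajectory. `do` = (t, a_prime) forces action a_prime at
    step t (Pearl's do-operator) while reusing the same exogenous noise
    sequence u_t, i.e. a counterfactual under fixed exogenous factors."""
    rng = random.Random(seed)  # fixing the seed fixes the u_t sequence
    s, traj = 0.0, []
    for t in range(T):
        a = policy(s)
        if do is not None and do[0] == t:
            a = do[1]  # intervention: do(a_t = a_prime)
        u = rng.gauss(0.0, 1.0)
        traj.append((s, a))
        s = f(s, a, u)
    return traj, s

policy = lambda s: -0.5 * s                        # toy feedback policy
_, y_factual = rollout(policy, 5, seed=0)
_, y_counter = rollout(policy, 5, seed=0, do=(2, 1.0))  # counterfactual at t=2
```

Sharing the seed between the factual and counterfactual rollouts is what implements "same exogenous factors, different action" in this toy setting.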
Feature Attribution Models
Given an input vector $x \in \mathbb{R}^d$ with index set $N = \{1, \ldots, d\}$, controllable and uncontrollable features are defined respectively as those variables which can and cannot be altered by intervention. Traditional model-agnostic attributions (e.g., SHAP, LIME) do not distinguish between these sets, treating all inputs uniformly (Kovvuri et al., 2022).
3. Methodological Approaches for Attribution
Abduct–Act–Predict (A2P) Scaffolding
A2P is a causal inference framework for failure attribution in multi-agent conversational systems (West et al., 12 Sep 2025), operationalized as follows:
- Abduction: For a candidate failure step $t$, infer the most plausible hidden cause $u^*$ via posterior estimation:
$$u^* = \arg\max_u P(u \mid \tau).$$
- Action: Identify a minimal corrective intervention $a'$ targeting $u^*$:
$$a' = \arg\min_a d(a, a_t),$$
where $d(\cdot, \cdot)$ is an action-distance metric.
- Prediction: Simulate the outcome trajectory post-intervention and estimate $P(Y = 1 \mid \mathrm{do}(a_t = a'), u^*)$. If this probability exceeds a threshold $\delta$, declare the failure at $t$ controllable.
If no intervention at any step $t$ achieves $P(Y = 1 \mid \mathrm{do}(a_t = a'), u^*) > \delta$, the failure is deemed uncontrollable.
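The three steps above can be sketched as a loop over candidate failure steps; `simulate`, `candidate_causes`, and `interventions` are hypothetical stand-ins for the LLM-driven components described in the paper:

```python
# Sketch of the Abduct-Act-Predict routine. DELTA is the heuristic
# controllability threshold; all callables are toy stand-ins.
DELTA = 0.5

def a2p(trajectory, simulate, candidate_causes, interventions, delta=DELTA):
    """Return (step, cause, action) for the first controllable failure found,
    or None if every candidate step resists intervention (uncontrollable)."""
    for t in range(len(trajectory)):
        # Abduction: pick the most plausible hidden cause u* at step t.
        scored = [(p, u) for u, p in candidate_causes(trajectory, t)]
        if not scored:
            continue
        _, u_star = max(scored)
        # Action: try minimal interventions first (sorted by action distance).
        for a_prime in sorted(interventions(t), key=lambda a: a["distance"]):
            # Prediction: estimate P(Y=1 | do(a_t = a'), u*).
            p_success = simulate(trajectory, t, a_prime, u_star)
            if p_success > delta:
                return t, u_star, a_prime
    return None  # uncontrollable: no single-step fix clears the threshold

# Toy usage: step 1 is fixable by swapping in a "retry" action.
traj = ["ask", "bad_tool_call", "fail"]
causes = lambda tr, t: [("misread_spec", 0.9)] if t == 1 else []
acts = lambda t: [{"name": "retry", "distance": 1}]
sim = lambda tr, t, a, u: 0.9 if (t == 1 and a["name"] == "retry") else 0.1
result = a2p(traj, sim, causes, acts)
```

Returning `None` when the loop exhausts all steps mirrors the paper's criterion for declaring a failure uncontrollable.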
Controllable fActor Feature Attribution (CAFA)
CAFA partitions input features into a controllable set $X_c$ and an uncontrollable set $X_u$ and produces an attribution vector $\phi \in \mathbb{R}^d$ with the key constraint $\phi_j = 0$ for all $j \in X_u$, reflecting only the influence of the controllable features (Kovvuri et al., 2022). CAFA proceeds through:
- Selective perturbation: Only features in $X_c$ are perturbed around the input $x$ within a threshold $\epsilon$, with $x_j$ fixed for all $j \in X_u$.
- Surrogate modeling: Fit a high-capacity surrogate $g$ (e.g., random forest) on the perturbed dataset $D$.
- Global-for-local SHAP explanation: Compute attributions using the SHAP method on $g$, guaranteeing uncontrollable features are assigned zero attribution due to their invariance in $D$.
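A library-free sketch of the pipeline follows; note that the surrogate-plus-SHAP steps are replaced here by a simple mean absolute output-change importance, so this illustrates only the selective-perturbation step and the resulting zero-attribution guarantee for uncontrollable features, not the paper's exact method:

```python
import random

def cafa_attributions(model, x, controllable, eps=0.5, n=500, seed=0):
    """Sketch of CAFA-style attribution: perturb only controllable features
    and score each by the mean absolute change in model output. Because
    uncontrollable features are never perturbed, their attribution is
    exactly zero by construction (the CAFA constraint phi_j = 0, j in X_u)."""
    rng = random.Random(seed)
    phi = [0.0] * len(x)
    for _ in range(n):
        z = list(x)
        j = rng.choice(controllable)      # selective perturbation in X_c only
        z[j] += rng.uniform(-eps, eps)    # x_j for j in X_u stays fixed
        phi[j] += abs(model(z) - model(x))
    k = n / len(controllable)             # approx. perturbations per feature
    return [v / k for v in phi]

# Toy model: feature 0 is uncontrollable (e.g., age); features 1-2 controllable.
model = lambda z: 2.0 * z[0] + 1.0 * z[1] + 0.0 * z[2]
phi = cafa_attributions(model, [1.0, 1.0, 1.0], controllable=[1, 2])
```

Even though feature 0 has the largest model coefficient, it receives zero attribution, which is the behavior CAFA enforces for $X_u$.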
4. Empirical Evaluation and Case Studies
Multi-Agent Failure Attribution
On the WhoWhen benchmark for multi-agent dialogue failures, A2P achieves markedly higher step-level failure attribution accuracy than the baseline on the Algorithm-Generated subset, with a comparable gain on the more complex Hand-Crafted subset (West et al., 12 Sep 2025). This demonstrates robust operationalization of controllable versus uncontrollable failure attribution in sequential decision settings.
Feature Attribution in Medical and Public Health Applications
CAFA was evaluated on the Simulacrum lung cancer and UCI breast-cancer datasets (Kovvuri et al., 2022). In both cases, CAFA suppresses attributions to uncontrollable features such as age or sex while its attributions on controllable features remain strongly Pearson-correlated with standard SHAP values, and it highlights controllable features (e.g., chemotherapy regimen, BMI). Applied to UK COVID-19 intervention data, CAFA attributed zero importance to uncontrollable indicators (cases, fatalities) and prioritized actionable intervention policies (e.g., public gathering bans, venue restrictions), consistent with established transmission risk factors.
5. Implications for Decision-Making and System Design
Partitioning failure attributions along controllability axes underpins actionable recommendations and root-cause analysis:
- In medicine, the separation identifies which aspects of a risk profile or treatment outcome are potentially modifiable, shaping clinical intervention strategies.
- In public health, the framework isolates policy levers (e.g., targeted restrictions) with maximal marginal effect, excluding immutable demographic or environmental variables from action planning.
- In multi-agent and conversational AI, rigorous counterfactual analysis determines whether error remediation is feasible at the step/action level or if system/environmental limitations preclude prevention.
A clear delineation channels operational focus toward interventions that are expected to yield improvements, and avoids misplaced emphasis on intractable sources of failure.
6. Limitations and Open Questions
Several constraints and unresolved issues pertain to controllable vs. uncontrollable failure attribution:
- Partitioning features into controllable and uncontrollable sets generally relies on a priori domain knowledge and may be nontrivial, especially as the set of "controllable" variables can evolve with technological advances (e.g., gene therapy potentially rendering genetic factors controllable) (Kovvuri et al., 2022).
- The CAFA algorithm’s dependence on local perturbations and surrogate modeling may not extend straightforwardly to regression tasks or imbalance-heavy datasets. SHAP’s computational and statistical limitations (feature dependence, cost) are inherited by CAFA.
- In A2P, the fidelity of predictive counterfactual simulations is bounded by the capabilities of the LLM, and the threshold $\delta$ for declaring controllability is heuristic (West et al., 12 Sep 2025).
- Single-step or localized interventions are assumed; genuinely uncontrollable failures or those requiring coordinated multi-step interventions may be underrepresented.
- Neither method quantifies the dominance of uncontrollable factors in the original model prediction—a possible direction for further risk decomposition (Kovvuri et al., 2022).
- External, unmodeled exogenous factors may render some failures undiagnosable via language or feature-based approaches, motivating integration with external monitoring or simulation (West et al., 12 Sep 2025).
7. Comparative Summary Table
| Aspect | CAFA Approach (Kovvuri et al., 2022) | A2P Scaffolding (West et al., 12 Sep 2025) |
|---|---|---|
| Domain | Predictive modeling, XAI, feature attribution | Multi-agent sequential decision, dialogue agents |
| Core notion of controllability | Partition of input features (controllable/uncontrollable) | Success probability under local action intervention |
| Method | Selective feature perturbation, SHAP-based attribution | Abduct–Act–Predict counterfactual routine |
| Output | Attribution vector zeroed for uncontrollable features | Step/action labeled as controllable/uncontrollable failure |
| Illustrative application | Medicine, public health (cancer, COVID-19 policy) | Root-cause error localization in dialogue systems |
Both CAFA and A2P provide rigorous methodological frameworks for distinguishing controllable from uncontrollable attributions, enhancing transparency and verifiability in automated decision-support and failure analysis across scientific and engineering domains.