Utility-Driven Self-Healing
- Utility-driven self-healing is a paradigm that defines explicit utility functions to drive autonomous restoration in both software and material systems.
- The approach integrates pattern-based rule adaptations with reinforcement learning in Markov Decision Processes to balance service quality, resource costs, and fault recovery.
- Empirical evaluations show scalable, real-time planning and optimal reward accumulation, outperforming static rule-based methods under dynamic failure conditions.
Utility-driven self-healing is a paradigm for the autonomous restoration of system functionality that frames adaptation and recovery as an optimization of explicitly defined utility functions. In contrast to purely rule-based or heuristic approaches, utility-driven self-healing leverages quantitative trade-offs between service quality, resource use, and architectural or material constraints. This paradigm is realized in both software architectures and material systems, unifying methodologies from pattern-based runtime adaptation and reinforcement learning within Markov Decision Processes (MDPs). Key contributions span both incrementally optimal, scalable mechanisms for fault recovery in large dynamic architectures and adaptive, fine-grained actuation in self-healing materials.
1. Foundational Utility Function Formalisms
At the core of utility-driven self-healing lies a compositional utility function. For a graph-based runtime system model $G$, the overall utility is defined as a sum over matches of domain-specific patterns within the system graph:

$$U(G) = \sum_{P \in \mathcal{P}} \sum_{m \in M_P(G)} U_P(m)$$

where $\mathcal{P}$ is the set of patterns, $M_P(G)$ denotes the set of matches for pattern $P$ in $G$, and $U_P$ is a real-valued sub-function encoding the utility, positive or negative, of that match. Positive patterns ($P^+$) encapsulate beneficial architectural fragments or healthy states, whereas negative patterns ($P^-$) identify faults, failures, or undesired structural configurations, each incurring utility loss. Adaptation acts to incrementally remove negative pattern matches and establish positive ones, with utility updates performed locally for efficient runtime management (Ghahremani et al., 2018, Ghahremani et al., 2020).
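The compositional utility and its local update can be sketched as follows. This is a minimal illustration, assuming pattern matches have already been found by some matcher; the `Match` type, the pattern names, and the utility values are hypothetical and not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    pattern: str    # hypothetical pattern name
    element: str    # model element the match is anchored at
    utility: float  # > 0 for positive patterns, < 0 for negative patterns


def overall_utility(matches):
    """Overall utility: the sum of the sub-utilities of all pattern matches."""
    return sum(m.utility for m in matches)


def apply_repair(matches, total, removed, added=()):
    """Update the utility locally: only removed and added matches are touched."""
    removed_set = set(removed)
    remaining = [m for m in matches if m not in removed_set]
    remaining.extend(added)
    new_total = (total
                 - sum(m.utility for m in removed)
                 + sum(m.utility for m in added))
    return remaining, new_total


matches = [
    Match("HealthyComponent", "c1", 10.0),
    Match("HealthyComponent", "c2", 10.0),
    Match("CrashedComponent", "c3", -5.0),  # negative pattern: a fault
]
total = overall_utility(matches)

# Repairing c3 removes the negative match and establishes a positive one.
fault = matches[2]
matches, total = apply_repair(
    matches, total,
    removed=[fault],
    added=[Match("HealthyComponent", "c3", 10.0)],
)
```

The incremental total agrees with a full recomputation via `overall_utility(matches)`, but only the matches affected by the repair are visited.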
2. Rule-Based and Pattern-Driven Adaptation in Software Architectures
In large-scale, dynamically reconfigurable architectures, adaptation decisions are cast as applications of local graph transformation rules $r$, where the left-hand side of $r$ extends a negative pattern $P^-$. Each rule application $(r, m)$, at a match $m$, is annotated not only with its local utility impact:

$$\Delta U(r, m) = U(G') - U(G)$$

where $U$ is the overall utility and $G'$ is the model obtained by applying $r$ at $m$ to the model $G$, but also with its estimated execution cost $c(r, m)$. The planning logic proceeds incrementally: for every newly detected issue (negative pattern match), the applicable rules are ranked by their expected utility improvement per unit cost ($\Delta U(r, m) / c(r, m)$) and selected for execution in descending ratio order. This maximizes the accumulated reward (the area under the utility curve $U(t)$ over time), both per issue and in global progression, without a computationally expensive combinatorial optimization step (Ghahremani et al., 2018, Ghahremani et al., 2020).
Pattern-based matching and incremental annotation across cycles reduce the adaptation and planning effort per cycle, supporting real-time operation in large architectures with many simultaneous faults. Global optimality of utility and reward is preserved under the assumptions that rule effects are strictly local and patterns are non-overlapping.
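The ranking step described above can be sketched as a greedy planner. This is an illustrative sketch under stated assumptions, not the authors' implementation: `RuleApplication`, the rule names, and all gains and costs are hypothetical, and costs are assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleApplication:
    rule: str            # hypothetical rule name
    issue: str           # negative pattern match the rule would resolve
    utility_gain: float  # estimated local utility improvement
    cost: float          # estimated execution cost, assumed > 0

    @property
    def ratio(self):
        return self.utility_gain / self.cost


def plan(candidates):
    """Keep the best rule per issue, then order issues by gain-per-cost."""
    best = {}
    for cand in candidates:
        current = best.get(cand.issue)
        # Prefer the higher ratio; break ties by the lower cost.
        if current is None or (cand.ratio, -cand.cost) > (current.ratio, -current.cost):
            best[cand.issue] = cand
    return sorted(best.values(), key=lambda c: c.ratio, reverse=True)


candidates = [
    RuleApplication("restart", "c3", utility_gain=15.0, cost=3.0),
    RuleApplication("redeploy", "c3", utility_gain=15.0, cost=10.0),
    RuleApplication("restart", "c7", utility_gain=8.0, cost=1.0),
    RuleApplication("replace", "c7", utility_gain=8.0, cost=4.0),
]
schedule = plan(candidates)
```

Because each issue is handled independently and the final step is a sort, the planner never enumerates combinations of repairs. This relies on the locality and non-overlap assumptions stated above.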
3. Utility Optimization via Reinforcement Learning in Self-Healing Materials
In the context of autonomous self-healing material systems, utility-driven control is formulated as a Markov Decision Process (MDP) with explicit modeling of structural health, damage indicators, and finite healing-agent supply:
- State $s_t$, capturing integrity $I_t$, damage metrics $D_t$, and healing-agent status $H_t$.
- Action space $\mathcal{A}$: discrete options (e.g., "chemical-high", "thermal-med", "none") or continuous normalized dosages ($a_t \in [0, 1]$).
- Transition dynamics combine stochastic damage (via local stress Laplacian) and healing efficacy (Bernoulli success, Beta-distributed effect fractions).
- Utility (reward) function expresses the weighted change in structural integrity offset by resource consumption:

$$r_t = w_I \, (I_{t+1} - I_t) - w_c \, c(a_t)$$

where $w_I$ and $w_c$ are weighting coefficients and $c(a_t)$ is the healing agent consumed by action $a_t$.
This allows for the application of RL algorithms, including Q-learning, DQN, and TD3, to autonomously discover optimal self-healing policies that balance longevity and resource constraints.
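As a rough illustration of this formulation, the sketch below pairs a toy environment with tabular Q-learning. Everything specific here is an assumption made for the example: the action names, the damage and success probabilities, the reward weights, and the coarse integer state. A fixed damage probability and a fixed healing effect stand in for the stress-Laplacian damage model and the Beta-distributed effect fractions mentioned above.

```python
import random
from collections import defaultdict

# action -> (healing-agent units used, probability that healing succeeds); hypothetical
ACTIONS = {"none": (0, 0.0), "heal-low": (1, 0.6), "heal-high": (2, 0.9)}
W_INTEGRITY, W_COST = 1.0, 0.1  # assumed reward weights
MAX_INTEGRITY = 4               # integrity is discretized into levels 0..4


def step(state, action, rng):
    """One transition: stochastic damage, then Bernoulli healing."""
    integrity, supply = state
    units, p_success = ACTIONS[action]
    units = min(units, supply)  # cannot spend more agent than is left
    new_integrity = integrity
    if rng.random() < 0.3:      # damage event (assumed probability)
        new_integrity = max(0, new_integrity - 1)
    if units > 0 and rng.random() < p_success:
        new_integrity = min(MAX_INTEGRITY, new_integrity + units)
    # Reward: weighted integrity change minus weighted agent consumption.
    reward = W_INTEGRITY * (new_integrity - integrity) - W_COST * units
    return (new_integrity, supply - units), reward


def q_learning(episodes=500, horizon=20, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = defaultdict(float)
    names = list(ACTIONS)
    for _ in range(episodes):
        state = (MAX_INTEGRITY, 6)  # full integrity, six units of agent
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(names)
            else:
                action = max(names, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action, rng)
            best_next = max(q[(next_state, a)] for a in names)
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = next_state
    return q


q_table = q_learning()
greedy = max(ACTIONS, key=lambda a: q_table[((2, 3), a)])  # greedy action at one state
```

The finite supply is part of the state, so the learned values can reflect the trade-off between healing now and keeping agent in reserve. DQN and TD3 replace the table with function approximators but optimize the same kind of reward signal.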
| RL Algorithm | Action Space | Final Integrity | Avg Reward | Convergence Steps |
|---|---|---|---|---|
| Q-Learning | Discrete | | | |
| DQN | Discrete | | | |
| TD3 | Continuous | | | |
| Heuristic | Discrete | | | N/A |
| Random | Discrete | | | N/A |
Continuous-action policies enable proportional, fine-grained dosing—minimizing overshoot, avoiding step oscillation, and achieving the highest integrity and reward under resource budgets (Chatterjee et al., 24 Nov 2025).
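The overshoot argument can be illustrated with a deliberately simple comparison. This sketch uses hand-written dosing rules rather than learned policies, and assumes one unit of dose restores one unit of integrity deficit; the dose levels and deficit values are hypothetical.

```python
DISCRETE_DOSES = (0.0, 0.5, 1.0)  # hypothetical fixed dose levels


def discrete_dose(deficit):
    """Smallest available dose that covers the deficit (may overshoot)."""
    for dose in DISCRETE_DOSES:
        if dose >= deficit:
            return dose
    return DISCRETE_DOSES[-1]


def continuous_dose(deficit):
    """Proportional dose, clipped to the normalized range [0, 1]."""
    return min(max(deficit, 0.0), 1.0)


def wasted_agent(dose_fn, deficits):
    """Healing agent applied beyond what each deficit required."""
    return sum(max(dose_fn(d) - d, 0.0) for d in deficits)


deficits = [0.1, 0.3, 0.6, 0.0]
waste_discrete = wasted_agent(discrete_dose, deficits)
waste_continuous = wasted_agent(continuous_dose, deficits)
```

With these inputs the discrete rule applies about one full unit of agent more than needed across the four deficits, while the proportional rule wastes none. A learned continuous policy such as TD3 would additionally account for stochastic healing outcomes, which this sketch ignores.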
4. Incremental, Scalable Planning and Execution
The utility-driven approach avoids full global re-optimization via three central mechanisms:
- Pattern localization: Each rule's side-effect and impact are tightly scoped, supported by assumptions that exclude multi-pattern or probabilistic interactions. This enables rapid incremental updates after localized changes.
- Per-issue optimization: For each fault, among all applicable adaptation rules, the one maximizing the ratio of utility gain to execution cost (and then minimizing cost) is selected.
- Ordering by efficiency: When multiple repairs are scheduled, they are prioritized by descending utility-gain-to-cost ratio, maximizing immediate and accumulated reward, independent of the global state size.
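The effect of ordering on accumulated reward can be checked on a small example. The sketch below assumes repairs run sequentially, that a repair's duration is its cost, and that utility rises by the repair's gain when it completes. The repair names and numbers are hypothetical.

```python
from itertools import permutations


def accumulated_reward(repairs, start_utility, horizon):
    """Area under the utility curve when repairs run one after another.

    repairs: sequence of (name, utility_gain, duration) tuples.
    """
    t, utility, reward = 0.0, start_utility, 0.0
    for _name, gain, duration in repairs:
        step = min(duration, horizon - t)
        reward += utility * step  # utility is flat while the repair runs
        t += step
        if t >= horizon:
            return reward
        utility += gain           # the repair takes effect on completion
    return reward + utility * (horizon - t)


repairs = [("A", 8.0, 1.0), ("B", 15.0, 3.0), ("C", 4.0, 2.0)]

# Order by descending gain-per-cost ratio: A (8.0), B (5.0), C (2.0).
by_ratio = sorted(repairs, key=lambda r: r[1] / r[2], reverse=True)
reward_by_ratio = accumulated_reward(by_ratio, start_utility=10.0, horizon=10.0)

# Brute-force check over all orderings (feasible only for tiny inputs).
best_order = max(permutations(repairs),
                 key=lambda order: accumulated_reward(order, 10.0, 10.0))
```

In this instance the ratio ordering yields a reward of 278, and the exhaustive search finds no better ordering. The sort achieves this without enumerating permutations, which is what keeps planning cost independent of the global state size.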
Empirical evaluations against constraint-solver-based planning and static rule assignments show that the utility-driven approach consistently achieves both optimality and real-time scalability. Planning time remains close to that of static baselines and is faster than solver-based methods at large scales. Across diverse failure profiles and real-world traces, reward performance is strictly higher or equivalent compared to alternatives, except when global solvers are too slow to respond to rapid failure bursts (Ghahremani et al., 2020, Ghahremani et al., 2018).
5. Performance, Robustness, and Comparative Insights
Experimental results in both architectural and material system domains confirm the advantages of utility-driven self-healing:
- Speed: RL-based controllers for materials (TD3) converge to maximal integrity quickly, outperforming both heuristic and discrete controllers. Pattern-based adaptation in architectures exhibits linear scaling in planning time, even under large failure bursts.
- Reward Accumulation: The highest area under the utility curve (cumulative utility over time) is achieved, reliably outpacing static or delayed solver strategies.
- Optimality: Under structural assumptions, local per-issue optimization and orderings guarantee global utility optimality post-repair and optimal reward accumulation over time.
- Stability: Near-zero variance in final integrity for continuous-action material controllers and stable utility recovery trajectories in large architectures, even with bursty and heterogeneous failure traces.
Solver-based approaches, while theoretically globally optimal, display reward deficits when planning times exceed the interval between failure events. Static approaches yield sub-optimal recovery and reward loss due to non-adaptive rule selection and ordering (Ghahremani et al., 2020, Ghahremani et al., 2018, Chatterjee et al., 24 Nov 2025).
6. Limitations, Assumptions, and Research Directions
Current utility-driven self-healing approaches assume determinism, strictly local rule effects (removal or resolution of single negative patterns), and constant-bounded pattern size. Scenarios with probabilistic rule efficacy, overlapping or cascading pattern interactions, and richer concurrency models remain open challenges. Material RL controllers operate on reduced-order surrogate models rather than high-fidelity multi-physics FEMs, motivating work on integrating simulation pretraining and transfer to physical environments (Chatterjee et al., 24 Nov 2025).
Ongoing research directions include:
- Learning utility sub-functions at runtime via machine learning.
- Generalization to probabilistic, side-effecting, or concurrent adaptation rules.
- Application to broader classes of cyber-physical, biological, and engineered systems.
- Integration with high-fidelity material simulators and online data for adaptive self-healing in real-world composites.
7. Cross-Domain Synthesis and Applications
Utility-driven self-healing provides a unifying formalism for adaptive, optimal recovery in both software and material domains. The translation of pattern-based adaptation, utility quantification, and incremental impact computation across these domains enables robust, context-aware sustainability in large-scale, fault-prone systems. The paradigm supports lightweight, embedded deployment for real-time control, as well as scalable management of software architectures at cloud and IoT scales.
Papers by Ghahremani et al. (Ghahremani et al., 2018, Ghahremani et al., 2020) and Chatterjee et al. (Chatterjee et al., 24 Nov 2025) collectively demonstrate that formalizing self-healing as a dynamic, utility-maximizing process with localized, impact-aware adaptation makes it possible to combine optimality, scalability, and adaptability, laying the groundwork for future autonomous material and software systems.