Sustainability-Aware Evaluation Framework
- Sustainability-aware evaluation frameworks are integrated models that quantify multi-dimensional impacts, including environmental, social, economic, and technical factors.
- They employ advanced non-compensatory metrics and MCDA techniques to ensure balanced performance without masking deficits in any single domain.
- These frameworks have been applied in healthcare, federated learning, AI modeling, and IoT processes to enable responsible, data-driven decision-making.
A sustainability-aware evaluation framework is an integrated set of architecture, metrics, and operational procedures designed to rigorously assess, compare, and optimize technical systems, algorithms, business processes, or organizational workflows by quantifying sustainability performance across environmental, social, economic, and technical dimensions. These frameworks go beyond conventional accuracy- or performance-only benchmarks by introducing multidimensional metrics, non-compensatory scoring models, and explicit trade-off analysis, thus facilitating responsible decision-making in high-impact domains, including healthcare, AI systems, federated learning, code generation, business process management, and software engineering.
1. Foundations and Conceptual Architecture
Sustainability-aware evaluation frameworks are grounded in the recognition that performance maximization alone is insufficient when adverse impacts can propagate across ecological, societal, and financial domains. Early frameworks in healthcare, such as SSP-AHP (Wątróbski et al., 2023), established hierarchical architectures organized into domains, sub-domains, and indicators. Domain granularity enables transparent mapping of objectives (e.g., equity, quality, responsiveness, financial coverage, adaptability), each further decomposed into clusters of measurable indicators (e.g., cataract surgeries per capita, AMI mortality, life expectancy).
A key innovation is the adoption of non-compensatory paradigms—such as the “strong sustainability” principle—which preclude unlimited trade-off between criteria. For example, the SSP-AHP introduces a sustainability coefficient and penalizes deviation from balanced performance, operationalizing the “no-substitution” rule: systems with single-domain excellence are penalized if they underperform elsewhere.
The architecture of such frameworks typically includes:
- Identification and hierarchical organization of sustainability targets.
- Selection of standardized, comparable indicators.
- Integration of multi-criteria decision analysis (MCDA), often via AHP or its extensions to fuzzy sets or non-compensatory logic.
- Data normalization and benchmark procedures.
- Aggregation and ranking algorithms informed by sectoral priorities.
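As a minimal sketch of steps like these, the snippet below rolls pre-normalized indicator values up into domain scores and a weighted overall score; the domain names, indicator values, and weights are illustrative, not taken from any cited framework:

```python
# Sketch of the hierarchy: indicators roll up into domains, domains into a
# final score. Indicator values are assumed already normalized to [0, 1];
# names and weights are illustrative.

hierarchy = {
    "social": {"weight": 0.40,
               "indicators": {"equity": 0.8, "responsiveness": 0.6}},
    "financial": {"weight": 0.35,
                  "indicators": {"coverage": 0.9}},
    "technical": {"weight": 0.25,
                  "indicators": {"adaptability": 0.5, "quality": 0.7}},
}

def domain_score(domain):
    """Unweighted mean of a domain's normalized indicators."""
    vals = domain["indicators"].values()
    return sum(vals) / len(vals)

def overall_score(hierarchy):
    """Weighted sum of domain scores using domain-level weights."""
    return sum(d["weight"] * domain_score(d) for d in hierarchy.values())

print(round(overall_score(hierarchy), 3))
```

In a real deployment the unweighted domain mean would itself be replaced by indicator-level weights elicited via AHP or a similar MCDA procedure.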
2. Metric Design and Mathematical Formalization
Sustainability-aware frameworks are characterized by explicit formalization of metrics tailored to both impact quantification and trade-off analysis. Mathematical definitions are central to these designs, enabling reproducibility and cross-domain comparability.
Example: SSP-AHP (Wątróbski et al., 2023)
Scores are computed via a weighted aggregation after compensation reduction, schematically of the form
$S_i = \sum_j w_j \, r_{ij} - s \cdot d_i$
where $r_{ij}$ is the normalized value of indicator $j$ for alternative $i$, $w_j$ the criterion weight, $s$ the sustainability coefficient, and $d_i$ the deviation of alternative $i$ from balanced performance.
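The no-substitution rule can be operationalized in code along these lines; the spread-based penalty and its coefficient below are an illustrative sketch, not the published SSP-AHP formula:

```python
# Sketch of non-compensatory scoring: a plain weighted sum is reduced by a
# penalty proportional to the spread across criteria, so single-criterion
# excellence cannot fully compensate for deficits elsewhere.
# The penalty form and coefficient are illustrative.

def sustainability_score(normalized, weights, s=0.5):
    """Weighted sum minus s * (max - min) spread penalty."""
    base = sum(r * w for r, w in zip(normalized, weights))
    deviation = max(normalized) - min(normalized)  # imbalance across criteria
    return base - s * deviation

balanced = [0.7, 0.7, 0.7]   # uniformly good across all criteria
lopsided = [1.0, 1.0, 0.1]   # excellent on two criteria, weak on one
w = [1 / 3, 1 / 3, 1 / 3]

# Both have the same weighted mean, but the lopsided profile is penalized.
print(sustainability_score(balanced, w))
print(sustainability_score(lopsided, w))
```

With $s = 0$ the score degenerates to an ordinary compensatory weighted sum; raising $s$ tightens the balance requirement.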
Federated Learning: GreenDFL (Feng et al., 27 Feb 2025)
For decentralized learning, sustainability metrics are computed across nodes:
$C = \sum_{k=1}^K CI_k \cdot E_k$
where $CI_k$ is the regional grid carbon intensity at node $k$ and $E_k$ its energy consumption, decomposed by phase (training, communication, aggregation) using hardware profiling and network topology.
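A per-node carbon tally in this style can be sketched as follows (node energies and grid intensities are hypothetical):

```python
# Sketch of per-node carbon accounting in the style of C = sum_k CI_k * E_k:
# energy is tallied per phase (training, communication, aggregation) at each
# node, then weighted by that node's regional grid carbon intensity.
# All values are hypothetical.

nodes = [
    # energy in kWh per phase; carbon_intensity in kgCO2/kWh
    {"energy": {"training": 12.0, "communication": 0.4, "aggregation": 0.1},
     "carbon_intensity": 0.35},
    {"energy": {"training": 9.5, "communication": 0.3, "aggregation": 0.1},
     "carbon_intensity": 0.12},
]

def total_carbon(nodes):
    """C = sum over nodes of (grid intensity * total node energy)."""
    return sum(n["carbon_intensity"] * sum(n["energy"].values()) for n in nodes)

def training_share(nodes):
    """Fraction of total energy spent on local training across all nodes."""
    train = sum(n["energy"]["training"] for n in nodes)
    total = sum(sum(n["energy"].values()) for n in nodes)
    return train / total

print(f"C = {total_carbon(nodes):.2f} kgCO2")
print(f"training share = {training_share(nodes):.1%}")
```

The phase decomposition is what makes node-selection strategies actionable: placing training-heavy nodes in low-intensity regions reduces $C$ directly.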
Algorithmic Sustainability: FMS/ASC (Li et al., 24 Aug 2025)
A two-dimensional approach combines an energy-efficiency decay transform with performance scores, schematically
$\mathrm{FMS} = P \cdot e^{-\lambda E}$
where $P$ is normalized performance, $E$ is normalized energy penalized exponentially with decay rate $\lambda$, and ASC integrates performance over normalized energy-consumption checkpoints.
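A numerical sketch of the decay transform and the checkpoint integration follows; the decay rate, checkpoint values, and the trapezoidal rule are illustrative choices, not the published FMS/ASC definitions:

```python
import math

# Sketch of a two-dimensional performance/energy rating: normalized
# performance P is discounted by an exponential decay in normalized energy E,
# and an area-style score integrates performance over energy checkpoints.

def fms(p, e, lam=2.0):
    """Performance discounted by an exponential energy penalty."""
    return p * math.exp(-lam * e)

def asc(perf_at_checkpoints, energy_checkpoints):
    """Trapezoidal integral of performance over the normalized energy budget."""
    area = 0.0
    for i in range(1, len(energy_checkpoints)):
        de = energy_checkpoints[i] - energy_checkpoints[i - 1]
        area += 0.5 * (perf_at_checkpoints[i] + perf_at_checkpoints[i - 1]) * de
    return area

# A model that reaches high accuracy early in its energy budget scores well.
energy = [0.0, 0.25, 0.5, 0.75, 1.0]
early = [0.0, 0.7, 0.85, 0.9, 0.92]   # fast riser
late = [0.0, 0.2, 0.5, 0.8, 0.92]     # slow riser, same final accuracy
print(asc(early, energy), ">", asc(late, energy))
```

Note that the two curves end at the same accuracy, yet the area score separates them, which is precisely the trend information a single final-accuracy benchmark discards.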
AI Model Assessment: RAISE (Nguyen et al., 21 Oct 2025)
Raw metrics for explainability, fairness, robustness, and sustainability are normalized to a common $[0,1]$ scale, e.g. via min–max scaling
$\hat{m} = (m - m_{\min}) / (m_{\max} - m_{\min})$
with sustainability specifically measured via parameter counts, FLOPs, MACs, and estimated CO₂ emissions.
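Such normalization must distinguish benefit-type metrics (higher is better, e.g. fairness) from cost-type metrics (lower is better, e.g. FLOPs or CO₂ emissions); a minimal sketch with hypothetical model names and values:

```python
# Sketch of metric normalization for multi-dimensional model assessment:
# benefit-type and cost-type metrics are both mapped to [0, 1] so that
# higher is uniformly better. Model names and values are hypothetical.

def normalize(values, higher_is_better=True):
    """Min-max scale to [0, 1]; invert cost-type metrics."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread: neutral score
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]

models = ["m1", "m2", "m3"]
fairness = [0.91, 0.80, 0.85]   # benefit-type: higher is better
co2_kg = [12.0, 3.0, 7.5]       # cost-type: lower is better

for name, f, c in zip(models,
                      normalize(fairness),
                      normalize(co2_kg, higher_is_better=False)):
    print(name, round(f, 2), round(c, 2))
```

After this step every dimension lives on the same scale, so downstream weighted aggregation cannot be silently dominated by a metric with large raw magnitude.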
3. Algorithms, Scoring, and Aggregation Methods
A hallmark of sustainability-aware evaluation is the formulation of robust, non-linear, and multi-objective aggregation algorithms. These may employ:
- Analytic Hierarchy Process (AHP) or its strong sustainability/fuzzy variants.
- Non-compensatory logic (e.g., adjustable penalty coefficients).
- Pareto frontier construction for multi-objective nondominance (Liu et al., 18 Oct 2025).
- Scalarization and trend-aware regression for trade-off detection (e.g., BRACE’s CIRC and OTER for code generation (Mehditabar et al., 10 Nov 2025)).
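Pareto-frontier construction over a performance/energy pair can be sketched as follows (candidate models and their values are hypothetical):

```python
# Sketch of Pareto-frontier construction for two objectives: maximize
# performance, minimize energy. A candidate is dominated if some other
# candidate is at least as good on both objectives and strictly better
# on at least one.

def pareto_front(candidates):
    """Return the non-dominated (name, perf, energy) tuples."""
    front = []
    for name, perf, energy in candidates:
        dominated = any(
            (p >= perf and e <= energy) and (p > perf or e < energy)
            for _, p, e in candidates
        )
        if not dominated:
            front.append((name, perf, energy))
    return front

candidates = [
    ("model_a", 0.92, 8.0),   # most accurate but energy-hungry
    ("model_b", 0.90, 3.0),   # near-best accuracy, much cheaper
    ("model_c", 0.85, 3.5),   # dominated by model_b
    ("model_d", 0.80, 1.0),   # cheapest, lowest accuracy
]
print([name for name, _, _ in pareto_front(candidates)])
```

Only the dominated candidate drops out; the frontier itself still requires a scalarization or stakeholder weighting to pick a single deployment choice.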
In multi-agent and agentic AI settings, sustainability scores are computed via real-time metrics logging, MCDA-weighted aggregation, and normalization, schematically $S = \sum_i w_i \hat{x}_i$ over normalized energy, carbon, and water metrics $\hat{x}_i$ with weights $w_i$ summing to one (Gosmar et al., 10 Nov 2025).
Trade-off management is further informed by scenario-dependent weighting (e.g., environmental vs social priorities) and sensitivity analysis to ensure robust ranking under changing stakeholder emphases.
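A minimal sensitivity sweep of this kind checks whether the top-ranked alternative is stable as stakeholder weights shift (the systems and their scores are hypothetical):

```python
# Sketch of scenario-dependent weighting and sensitivity analysis: sweep the
# relative weight of environmental vs. social criteria and check whether the
# top-ranked system changes. Scores are hypothetical, pre-normalized values.

systems = {
    "sys_a": {"environmental": 0.9, "social": 0.5},
    "sys_b": {"environmental": 0.5, "social": 0.9},
}

def rank(systems, w_env):
    """Order system names by weighted score for a given environmental weight."""
    w_soc = 1.0 - w_env
    score = lambda s: w_env * s["environmental"] + w_soc * s["social"]
    return sorted(systems, key=lambda k: score(systems[k]), reverse=True)

# Sweep stakeholder emphasis from social-first to environment-first.
for w_env in (0.2, 0.5, 0.8):
    print(f"w_env={w_env}: top = {rank(systems, w_env)[0]}")
```

A rank reversal inside the plausible weight range, as between the two extremes here, signals that the recommendation is scenario-dependent and should be reported as such rather than as a single winner.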
4. Application Areas and Empirical Evidence
Sustainability-aware frameworks have seen diverse applications and validation across multiple domains:
- Healthcare Systems: SSP-AHP benchmarking finds that Nordic countries deliver balanced sustainability across social and financial dimensions, while Central and Eastern European systems are penalized under strong-sustainability logic (Wątróbski et al., 2023).
- Federated Learning: GreenDFL demonstrates that local training energy dominates total impact (>99%), with novel algorithms enabling aggregation/node selection strategies that reduce carbon emissions by up to 30% (Feng et al., 27 Feb 2025).
- Code and AI Models: BRACE establishes interpretable 1–5 rating scales (CIRC/OTER) for joint functional and energy efficiency across LLMs, finding that parameter count does not reliably predict “sustainable” performance (Mehditabar et al., 10 Nov 2025).
- Supply Chain Document Intelligence: Agentic AI workflows dramatically reduce energy (up to 90%), carbon (97%), and water usage (98%) relative to manual processes, with multi-agent validation workflows yielding consistent savings and governance integration (Gosmar et al., 10 Nov 2025).
- IoT-enhanced Business Processes: Methodologies incorporate sensor data, customized metrics (MCFI, CFID), and BPMN/IoT device integration to analyze and improve operational sustainability, validated in tourism and healthcare case studies (Bosch et al., 7 Aug 2025).
- Software Engineering Requirements: SEER leverages LLM-driven taxonomy mapping and requirement optimization for early-phase sustainability assurance; coverage rates of 60–90% against expert-identified sustainability requirements (SRs) indicate a robust, scalable pipeline (Roy et al., 10 Oct 2025).
5. Comparative Analysis and Best Practices
Frameworks exhibit design diversity but recurring methodological best practices:
- Use non-compensatory scoring to avoid “single-pillar excellence masking overall unsustainability” (Wątróbski et al., 2023).
- Prioritize instrumented data collection: energy, emissions, water, throughput, and hardware attributes.
- Benchmark with reproducible, open-source protocols and enable sensitivity analysis across trade-off parameters (Liu et al., 18 Oct 2025).
- Integrate scenario normalization and stakeholder-weighted MCDA aggregation for ESG governance and reporting (Gosmar et al., 10 Nov 2025).
- Employ dual objective rating systems (accuracy/energy), combining static and trend-aware methods to suit deployment context (Mehditabar et al., 10 Nov 2025).
- Automate recommendation and assignment of indicators via ML/NLP for report analysis and regulatory mapping (Hillebrand et al., 2023).
- Document cross-domain conflict mapping and resolution for holistic sustainability analysis, with transparent calculation of risk ratios (Chakrabarti, 12 Dec 2025; Chakrabarti, 15 Dec 2025).
6. Limitations, Challenges, and Prospects
Although sustainability-aware frameworks provide significant operational and governance advancements, several limitations persist:
- Metric calibration is hardware- and domain-dependent, requiring periodic tuning for accurate cross-sectoral comparison.
- Real-time adaptation (e.g., via AI-assisted weighting updates, dynamic scenario feedback) is only partially implemented and remains a research frontier (Farahdel et al., 2024).
- Data access and instrumentation challenges—especially for supply chain or embedded systems—may limit coverage or fidelity.
- Methodological convergence (statistical vs ML/NLP vs process-oriented frameworks) poses integration complexity for large-scale or multi-organizational adoption.
Future methodological extensions include hybrid MCDA approaches, integration with life-cycle assessments, reinforcement learning–based metric adaptation, and expansion to complex supply chains and edge computing scenarios under real-time carbon-aware scheduling (Farahdel et al., 2024, Nguyen et al., 21 Oct 2025, Paramanayakam et al., 29 Apr 2025).
7. Cross-Domain Synthesis and Implementation Guidance
Taken together, the surveyed frameworks demonstrate that sustainability-aware evaluation is most effective when:
- Built on clear domain-specific hierarchies of goals, sub-goals, and indicators, with robust normalization and reference benchmarking.
- Instrumented to capture and aggregate real technical, social, and environmental trade-offs, using multicriteria logic that precludes masking deficits in key dimensions.
- Operationalized within adaptable software platforms for real-time analysis and reporting, with governance-ready outputs.
- Designed to empower evidence-based selection, responsible deployment, and continuous improvement toward established sustainability objectives.
For practitioners, following standardized, documented steps for domain analysis, metric selection, weighting, aggregation, and feedback iteration ensures that sustainability becomes an actionable and managed property, not merely an aspirational target. This approach aligns technical evaluation with global sustainability standards and regulatory compliance, enabling scalable, transparent, and accountable decision-making in high-impact domains.