
ML-Based Arch-Level Power Models

Updated 14 December 2025
  • ML-based architecture-level power models are data-driven frameworks that predict power consumption by leveraging high-level structural metrics and dynamic event statistics.
  • They employ hybrid formulations that combine analytical methods with ML corrections to factorize hardware-dependent effects and workload-specific variations for improved accuracy.
  • Recent advancements such as few-shot learning and cross-architecture transfer enable precise early-stage power estimation, supporting design space exploration and energy-aware optimization.

ML-based architecture-level power models are data-driven predictive frameworks that estimate CPU, GPU, or FPGA power consumption from high-level structural and event features, often combined with limited ground-truth measurements. These models address the shortcomings of traditional analytical approaches (e.g., McPAT, Wattch), which fail to capture real implementation variation and workload dependence, by training machine learning algorithms on representative microarchitectural and runtime datasets. Recent advances focus on cross-architecture generalization, few-shot learning, and integration with design space exploration, enabling accurate early-stage power estimation with minimal labeled data, especially in realistic scenarios where comprehensive datasets are unavailable.

1. Component and Feature Partitioning Strategies

ML-based power modeling at the architecture level begins with fine-grained partitioning of the design into physically and behaviorally distinct blocks. FirePower (Zhang et al., 2024) decomposes an out-of-order CPU core into 21 “power-friendly” components spanning frontend (e.g., BPTAGE, IFU, ICache), execution (ROB, ISU, FU Pool, Regfile), memory access (LSU, DCache, TLB), and control/routing logic. ArchPower (Zhang et al., 7 Dec 2025) similarly divides BOOM and XiangShan RISC-V designs into 11 logical units (e.g., BP, IFU, ICache, RNU), with each component’s power further stratified into combinational, sequential, SRAM, and clock sub-groups.

Feature sets for ML models aggregate:

  • Hardware (static) parameters: pipeline widths, queue depths, issue/FU widths, cache associativity, register counts, TLB entries, fetch buffer sizes.
  • Event (dynamic) statistics: performance simulator outputs (gem5, Vivado HLS, NVML counters), including cache misses, branch mispredictions, instruction types/mix, register-file accesses, and cycle rates.

FPGA-focused frameworks (HLSDataset (Wei et al., 2023), HL-Pow (Lin et al., 2020)) incorporate resource estimates (#LUT, #FF, #BRAM, #DSP, clock period), HLS operator stats, and toggle-counts from IR-level simulations, with features extracted from HLS reports and IR activity-tracking pipelines.
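The two feature families above can be combined into a single per-sample vector. A minimal sketch, assuming illustrative feature names (not the exact feature sets used by FirePower or ArchPower), in which raw event counts are normalized to per-cycle rates:

```python
# Sketch: assembling an architecture-level feature vector from static
# hardware parameters and dynamic event statistics. Feature names are
# illustrative, not the published feature sets.

def build_feature_vector(hw_params, event_stats, cycles):
    """Concatenate static hardware features with per-cycle event rates."""
    features = dict(hw_params)  # e.g., fetch_width, rob_entries, dcache_assoc
    for name, count in event_stats.items():
        # Normalize raw event counts to per-cycle rates so samples with
        # different run lengths are comparable.
        features[f"{name}_rate"] = count / cycles
    return features

hw = {"fetch_width": 4, "rob_entries": 96, "dcache_assoc": 8}
events = {"dcache_miss": 12_000, "branch_mispred": 3_500}
vec = build_feature_vector(hw, events, cycles=1_000_000)
```

Normalizing by cycle count is one common convention; rates per committed instruction are an equally plausible choice.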

2. Model Formulations: Hybridization and Factorization

Modern ML-based models advance beyond monolithic regressors by using factorized component-level representations that distinguish generalizable hardware-dependent effects from architecture-specific workload variance. FirePower (Zhang et al., 2024) expresses per-component instantaneous power as:

$$P_i = F^{\,i}_{\text{hw}}(H_i;\theta^{\,i}_{\text{hw}}) \times F^{\,i}_{\text{event}}(H_i, E_i;\theta^{\,i}_{\text{evt}})$$

where $F^{\,i}_{\text{hw}}$ models cross-architecture (hardware-scale) knowledge and $F^{\,i}_{\text{event}}$ captures workload-dependent event-driven variation. PANDA (Zhang et al., 2023) unifies analytical and ML paradigms at the component level:

$$P_{\mathrm{PANDA}}^i = F_{\mathrm{ml}}^i(C_i, E_i) \times F_{\mathrm{res}}^i(C_i)$$

with $F_{\mathrm{res}}^i$ as the resource function (e.g., cache scaling law) and $F_{\mathrm{ml}}^i$ applying a data-driven correction. This formulation strictly subsumes both purely analytical and purely ML models.
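The factorization can be sketched in a few lines. The concrete functions below are illustrative stand-ins (a square-root cache scaling law and a fixed linear correction), not PANDA's actual fitted models:

```python
import math

# Sketch of PANDA-style factorization: per-component power is an analytical
# resource function F_res(C) scaled by a data-driven correction F_ml(C, E).
# Functions and coefficients below are illustrative stand-ins.

def f_res_cache(size_kb):
    """Analytical resource function: power grows roughly with sqrt(size)."""
    return 0.05 * math.sqrt(size_kb)   # watts; illustrative coefficient

def f_ml_correction(size_kb, miss_rate, weights):
    """Multiplicative correction around 1.0 (a fixed linear form standing
    in for a trained regressor)."""
    w0, w1, w2 = weights
    return w0 + w1 * miss_rate + w2 * (size_kb / 1024)

def component_power(size_kb, miss_rate, weights):
    return f_ml_correction(size_kb, miss_rate, weights) * f_res_cache(size_kb)

p = component_power(size_kb=64, miss_rate=0.02, weights=(1.0, 2.0, 0.1))
```

Keeping the analytical term explicit means the model degrades gracefully: with no training data the correction can default to 1.0 and the prediction falls back to the pure resource function.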

SRAM/clock/logic “power group” decoupling in AutoPower (Zhang et al., 17 Aug 2025) further decomposes total power into sub-models linked to structural metrics, e.g., clock power partitioned into gated/ungated/cell components, each predicted via ML regressors given minimal (few-shot) training data.
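The additive power-group decomposition can be sketched as follows; coefficients and metric names are illustrative assumptions, not AutoPower's fitted values:

```python
# Sketch of AutoPower-style power-group decoupling: total power is a sum of
# group sub-models. Here the clock tree is split into gated / ungated / cell
# terms, each a simple function of structural metrics. All coefficients are
# illustrative.

def clock_power(n_flops, gating_ratio, coeffs):
    """Clock power = gated + ungated + clock-cell components."""
    c_gated, c_ungated, c_cell = coeffs
    gated = c_gated * n_flops * gating_ratio            # clock-gated sinks
    ungated = c_ungated * n_flops * (1 - gating_ratio)  # free-running sinks
    cell = c_cell * n_flops                             # clock-cell overhead
    return gated + ungated + cell

p_clk = clock_power(n_flops=10_000, gating_ratio=0.8,
                    coeffs=(1e-6, 5e-6, 2e-7))
```

Because each term is tied to a distinct structural metric, the sub-models can be fitted independently from very few labeled designs, which is what makes the few-shot regime workable.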

3. Training Methodologies, Feature Selection, and Dataset Construction

Training these models requires realistic datasets linking architectural features to ground-truth power measurements. ArchPower (Zhang et al., 7 Dec 2025) and FirePower (Zhang et al., 2024) use parametrized RTL implementations of out-of-order RISC-V cores, synthesized to gate-level netlists and simulated with commercial EDA tools (Design Compiler, PrimePower). Event traces are generated in parallel via gem5 on matched configurations, yielding a comprehensive feature-label dataset (101 features, 200 samples in ArchPower).

Feature selection is typically automated. Random Forest-based importance metrics reliably select a minimal counter set for power prediction across architectures (ARM, Intel; (Chen et al., 2020, Chen et al., 2017)). XGBoost hyperparameters are generally fixed (e.g., max_depth=6, n_estimators≈100), with regularization applied to leaf weights or regression coefficients. For HLS flows, HLSDataset (Wei et al., 2023) and HL-Pow (Lin et al., 2020) include thousands of design points spanning broad pragma/parameter spaces, annotated with resource utilization and simulated power.
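The selection step amounts to ranking features by an importance score and keeping the top-k. The published flows use Random Forest importances; in this dependency-free sketch a plain absolute Pearson correlation with the power label stands in for that score:

```python
# Sketch of automated feature selection: rank features by an importance
# score and keep the top-k. |Pearson correlation| with the label stands in
# here for the Random Forest importances used in the published flows.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def select_features(samples, labels, k):
    """samples: list of dicts (feature name -> value); labels: power values."""
    names = samples[0].keys()
    scores = {n: abs(pearson([s[n] for s in samples], labels)) for n in names}
    return sorted(scores, key=scores.get, reverse=True)[:k]

samples = [{"miss_rate": 0.01, "noise": 5.0},
           {"miss_rate": 0.03, "noise": 1.0},
           {"miss_rate": 0.05, "noise": 4.0}]
picked = select_features(samples, labels=[1.0, 3.0, 5.0], k=1)
```

Tree-based importances additionally capture nonlinear and interaction effects, which is why the papers prefer them over simple correlation filters.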

4. Few-Shot and Cross-Architecture Transfer Mechanisms

A central research objective is robust generalization with minimal data from the target architecture. FirePower (Zhang et al., 2024) implements a retrain/reuse policy per component: if one hardware parameter dominates (feature importance ≥ 0.95), the hardware model is retrained with a 1-D linear fit on the target; otherwise it is transferred directly from the known architecture. The event model adapts scale mismatches via XGBoost with labels normalized to the hardware baseline. AutoPower (Zhang et al., 17 Aug 2025) fits block geometries and clock hardware parameters using only two known designs per architecture, then transfers the geometrical mapping and trains event-sensitive regressors across similar workloads.
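The retrain/reuse decision can be sketched directly from the description above. The importance values and sample data are hypothetical, and the 1-D fit is plain closed-form least squares:

```python
# Sketch of a FirePower-style per-component retrain/reuse policy: if one
# hardware parameter dominates (importance >= 0.95), refit a 1-D linear
# model on the few target-architecture samples; otherwise reuse the model
# transferred from the known architecture. All data here is hypothetical.

def fit_line(xs, ys):
    """Closed-form 1-D least squares: y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def adapt_component(importances, target_xs, target_ys, transferred_model):
    if max(importances.values()) >= 0.95:      # one parameter dominates
        a, b = fit_line(target_xs, target_ys)
        return lambda x: a * x + b             # retrained 1-D model
    return transferred_model                   # reuse source-arch model

model = adapt_component({"rob_entries": 0.97, "others": 0.03},
                        target_xs=[32, 96], target_ys=[0.10, 0.26],
                        transferred_model=lambda x: 0.0)
```

With only two target samples, anything richer than a 1-D linear fit would overfit, which is the rationale for restricting retraining to the dominant parameter.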

PANDA (Zhang et al., 2023) demonstrates unification of analytic and component-wise ML corrections, supporting transfer not only across device configurations, but also across technology nodes (with resource scaling), achieving lower MAPE in cross-technology prediction scenarios.

5. Empirical Evaluation and Accuracy Benchmarks

Model performance is quantified using mean absolute percentage error (MAPE), Pearson correlation coefficient (R), and/or coefficient of determination ($R^2$). Representative results:

| Model | Training Data | MAPE (%) | Correlation (R / $R^2$) | Platform |
|---|---|---|---|---|
| FirePower | 2 configs | 5.8 | 0.98 | BOOM ↔ XiangShan |
| AutoPower | 2 configs | 4.36 | 0.96 | BOOM |
| PANDA | 10 configs | 8.0 | 0.99 | BOOM |
| ArchPower McPAT-Calib | 2 configs | 9.29 | 0.87 | BOOM |
| HLSDataset GNN | 9k designs | 3.89–9.43 | — | FPGA ZU9EG/XC7V585T |
| HL-Pow CNN/GBDT | 11k designs | 4.67–4.78 | — | FPGA ZCU102 |
| CNN-inference RF | 200–300 pts | 5.03 | 0.9561 | NVIDIA V100S |

Across platforms, component-level and power-group-decoupled models using XGBoost, GNNs, or hybrid approaches consistently outperform classical analytic and monolithic black-box ML models, especially under few-shot regimes.
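For concreteness, the two headline metrics above are, in plain Python:

```python
# The evaluation metrics used above: mean absolute percentage error (MAPE)
# and the Pearson correlation coefficient R, on toy predicted/true values.

def mape(pred, true):
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)

def pearson_r(pred, true):
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    st = sum((t - mt) ** 2 for t in true) ** 0.5
    return cov / (sp * st)

err = mape([1.1, 1.9, 3.2], [1.0, 2.0, 3.0])
r = pearson_r([1.1, 1.9, 3.2], [1.0, 2.0, 3.0])
```

Note the metrics answer different questions: a model can rank configurations correctly (high R) while being biased in absolute terms (high MAPE), so the papers report both.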

6. Application Domains and Model Extensions

ML-based architecture-level power models enable a variety of practical workflows:

  • Design space exploration (DSE): HL-Pow (Lin et al., 2020) and HLSDataset (Wei et al., 2023) embed power prediction in DSE heuristics, rapidly estimating the Pareto frontier for latency/power using analytic sampling strategies (e.g., providing >2× reduction in evaluation budget).
  • Power-aware optimization: PANDA (Zhang et al., 2023) guides performance optimization given power constraints, selecting configurations that maximize throughput within energy caps.
  • Cross-technology prediction: PANDA supports extrapolation across process nodes via analytic scaling and ML adaptation.
  • Heterogeneous and multicore platforms: Recent work (MuMMI (Wu et al., 2020), CrossArchitectural (Chen et al., 2020)) generalizes methodology for cloud-edge data centers and HPC systems, using selected hardware counters and composite models (linear + SVR) to deliver sub-10% error on ARM, Intel, Blue Gene/Q.
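The DSE use case reduces to filtering predicted (latency, power) points down to the Pareto-optimal set. A minimal sketch with hypothetical candidate configurations (not the sampling heuristics of HL-Pow or HLSDataset):

```python
# Sketch of how a power predictor plugs into DSE: given predicted
# (latency, power) pairs for candidate configurations, keep only the
# Pareto-optimal points (no other point is at least as good in both
# objectives). Candidate values are hypothetical.

def pareto_front(points):
    """points: list of (latency, power); both objectives minimized."""
    front = []
    for lat, pwr in points:
        dominated = any(l <= lat and p <= pwr and (l, p) != (lat, pwr)
                        for l, p in points)
        if not dominated:
            front.append((lat, pwr))
    return sorted(front)

candidates = [(10, 5.0), (12, 4.0), (11, 6.0), (9, 7.0), (12, 4.5)]
front = pareto_front(candidates)
```

Because the power values come from a fast ML predictor rather than full synthesis, the front over thousands of candidates can be estimated before any implementation runs, which is where the reported >2× evaluation-budget savings come from.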

Extensions contemplated include online adaptation to process/voltage/temperature variation, support for accelerators (e.g., CNN/MAC arrays), and symbolic regression for automating resource function discovery.

7. Limitations, Challenges, and Future Directions

Despite substantial progress, some limitations persist:

  • Dataset breadth: Most open-source datasets (ArchPower (Zhang et al., 7 Dec 2025), HLSDataset (Wei et al., 2023)) remain limited in microarchitecture and workload diversity.
  • Event coverage and accuracy: Dynamic features extracted from simulators depend on representative workloads. Unseen data patterns may degrade power prediction.
  • Device-specific retraining: Most frameworks (especially in FPGAs/HLS) require retraining per technology/family due to variations in fabric, voltage rails, or SRAM mapping logic.
  • Transfer to radically different architectures: Models built on BOOM-style out-of-order cores or specific GPU families may require new resource functions or retraining for new design types (in-order cores, vector extensions, VLIW).
  • Automated feature learning: Feature extraction and resource-function identification still rely largely on manual engineering, though there is a marked shift towards unsupervised and symbolic regression approaches.
  • Cross-PVT/corner adaptation: Proposals exist to bring meta-learning or Bayesian variants to bear for robust estimation under process, voltage, and temperature variation.

A plausible implication is that continued growth in dataset availability and cross-disciplinary collaboration (hardware/ML) will drive further generalization, pushing models closer to high-fidelity, "one-shot" prediction with minimal designer intervention.

References

  • FirePower: Towards a Foundation with Generalizable Knowledge for Architecture-Level Power Modeling (Zhang et al., 2024)
  • PANDA: Architecture-Level Power Evaluation by Unifying Analytical and Machine Learning Solutions (Zhang et al., 2023)
  • ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design (Zhang et al., 7 Dec 2025)
  • HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis (Wei et al., 2023)
  • HL-Pow: A Learning-Based Power Modeling Framework for High-Level Synthesis (Lin et al., 2020)
  • AutoPower: Automated Few-Shot Architecture-Level Power Modeling by Power Group Decoupling (Zhang et al., 17 Aug 2025)
  • Cross Architectural Power Modelling (Chen et al., 2020)
  • Power Modelling for Heterogeneous Cloud-Edge Data Centers (Chen et al., 2017)
  • Performance and Power Modeling and Prediction Using MuMMI and Ten Machine Learning Methods (Wu et al., 2020)
  • Machine Learning aided Computer Architecture Design for CNN Inferencing Systems (Metz, 2023)
