NeuralPower Framework

Updated 4 March 2026

NeuralPower Frameworks are neural network-based systems that predict CNN energy, runtime, and power to guide design decisions in both deep learning and IC PDN synthesis.
They leverage layer-wise sparse polynomial regression with Lasso regularization to generate accurate, interpretable models of CNN components and deliver energy-accuracy trade-offs using the Energy-Precision Ratio.
The framework also integrates CNN-driven PDN synthesis to assign optimal grid templates, ensuring compliance with IR drop and electromigration constraints while reducing routing resource usage.

NeuralPower Frameworks are a class of neural network-driven predictive and optimization systems for estimating power, runtime, and energy in convolutional neural networks (CNNs) and for synthesizing power delivery networks (PDNs). They originate from two research threads: system-level energy profiling and architectural trade-off analysis in deep learning (Cai et al., 2017), and PDN grid synthesis in integrated circuit (IC) design (Chhabria et al., 2021). Both provide model- and region-specific predictions that guide energy-accuracy trade-offs and resource allocation while keeping physical and operational constraints satisfied.

1. NeuralPower for CNN Energy Profiling and Prediction

NeuralPower, as defined in (Cai et al., 2017), is a predictive framework based on layer-wise sparse polynomial regression, specifically developed to estimate the serving energy consumption, power, and runtime for CNN inference on GPU platforms before model training. For each CNN layer $l$ (convolutional, pooling, or fully connected), two distinct regression models are fitted: one for inference runtime $r_l$ and one for average power $p_l$ . The models are constructed as follows:

Runtime Model:

$r_l = \sum_{j=1}^{J_r} \alpha_{j}^{(l)} \prod_{i=1}^{D} \left(f_i^{(l)}\right)^{q_{ij}} + \sum_{s=1}^{S_r} \beta_{s}^{(l)} \mathcal{F}_s^{(l)} + \epsilon_r^{(l)},$

where $\{f_i^{(l)}\}$ are "raw" features (batch size, tensor dimensions, kernel size, stride, etc.), and $\{\mathcal{F}_s^{(l)}\}$ are "special" features (total FLOPs, memory access counts).

Power Model:

$p_l = \sum_{j=1}^{J_p} \gamma_{j}^{(l)} \prod_{i=1}^{D'} \left(\tilde{f}_i^{(l)}\right)^{m_{ij}} + \sum_{t=1}^{S_p} \delta_t^{(l)} \tilde{\mathcal{F}}_t^{(l)} + \epsilon_p^{(l)},$

where $\tilde{f}_i^{(l)}$ includes both the original features and their logarithms (to model power saturation effects).
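The monomial terms $\prod_i f_i^{q_{ij}}$ in both models can be enumerated mechanically. The following is a minimal sketch, assuming every monomial up to a chosen total degree is a candidate term and that log features are taken with the natural logarithm. The function names `monomial_terms` and `with_log_features`, the feature names, and the numeric values are hypothetical and not taken from Cai et al. (2017).

```python
from itertools import combinations_with_replacement
from math import log, prod


def monomial_terms(features, max_degree):
    """Return {term_name: value} for every monomial of total degree 1..max_degree."""
    names = sorted(features)
    terms = {}
    for degree in range(1, max_degree + 1):
        # combinations_with_replacement allows repeated factors, e.g. kernel*kernel
        for combo in combinations_with_replacement(names, degree):
            terms["*".join(combo)] = prod(features[n] for n in combo)
    return terms


def with_log_features(features):
    """Add the natural log of each (strictly positive) raw feature, as in the power model."""
    augmented = dict(features)
    for name, value in features.items():
        augmented[f"log_{name}"] = log(value)
    return augmented


# Illustrative raw features for one convolutional layer (values are made up).
layer = {"batch": 16, "kernel": 3, "stride": 1}

runtime_terms = monomial_terms(layer, max_degree=3)                   # runtime candidates
power_terms = monomial_terms(with_log_features(layer), max_degree=2)  # power candidates
```

The candidate set grows quickly with the number of features and the degree, which is why a sparsity-inducing fit is needed to keep the final model small.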

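Model sparsity and interpretability are achieved through Lasso ( $\ell_1$ ) regularization, which drives most coefficients to zero and keeps a reduced subset of nonzero terms (typically 20–75 features per model). The polynomial degree is chosen by cross-validation: the runtime degree $K_r$ goes up to 3 for convolutional layers, while degree 2 ( $K_p=2$ ) is used for power models and fully connected layers.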
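To illustrate how $\ell_1$ regularization removes terms, here is a small self-contained sketch of Lasso fitted by cyclic coordinate descent on the objective $\frac{1}{2n}\|y - Xw\|_2^2 + \lambda\|w\|_1$. This is a generic textbook solver written for illustration, not the implementation used by the authors; the function names and the toy data are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iters):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes w[j]'s contribution.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
    return w


# Toy data: the target depends only on the first column (y = 2 * x0).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
w = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy problem the irrelevant second coefficient stays at exactly zero, while the first is shrunk below its least-squares value of 2. That shrinkage is the bias Lasso trades for sparsity, and it is one reason the regularization strength is normally chosen by cross-validation.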

Aggregating predictions across $L$ layers produces network-level metrics:

$\hat T_{\rm total} = \sum_{l=1}^L \hat r_l, \quad \hat P_{\rm avg} = \frac{\sum_{l=1}^L \hat p_l \hat r_l}{\sum_{l=1}^L \hat r_l}, \quad \hat E_{\rm CNN} = \sum_{l=1}^L \hat p_l \hat r_l$
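The aggregation is plain arithmetic over per-layer predictions. A minimal sketch follows, assuming runtimes in seconds and powers in watts; the function name and the three-layer numbers are illustrative only.

```python
def aggregate_network(layer_runtimes_s, layer_powers_w):
    """Combine per-layer predictions into total runtime, average power, and energy."""
    if len(layer_runtimes_s) != len(layer_powers_w):
        raise ValueError("need one runtime and one power value per layer")
    total_runtime = sum(layer_runtimes_s)
    # Energy is the sum of power * time over layers (joules for W and s).
    energy = sum(p * r for p, r in zip(layer_powers_w, layer_runtimes_s))
    # Average power is the runtime-weighted mean of layer powers.
    avg_power = energy / total_runtime if total_runtime > 0 else 0.0
    return total_runtime, avg_power, energy


# Three hypothetical layers.
total_runtime, avg_power, energy = aggregate_network(
    [0.002, 0.001, 0.001],   # predicted runtimes in seconds
    [100.0, 50.0, 150.0],    # predicted average powers in watts
)
```

Because $\hat P_{\rm avg}$ is weighted by runtime, a slow layer influences the network-level power more than a fast layer with the same power draw.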

2. The Energy-Precision Ratio Metric

To enable principled energy-accuracy trade-off analysis when performing architecture search or hyperparameter optimization, NeuralPower introduces the Energy-Precision Ratio (EPR, also denoted as $M_\alpha$ ):

$M_{\alpha} = (\mathrm{Error})^{\alpha} \times \mathrm{EPI}, \qquad \mathrm{EPI} = \frac{\hat E_{\rm CNN}}{N_{\mathrm{inferred}}},$

where "Error" typically refers to Top-1 or Top-5 classification error and $\alpha > 0$ is tunable. Lower $M_\alpha$ corresponds to more favorable accuracy versus energy profiles.
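A short sketch of the metric shows how $\alpha$ shifts the preferred design. The two candidate networks and their numbers are invented for illustration, and the function name is an assumption.

```python
def energy_precision_ratio(error, energy_joules, n_images, alpha=1.0):
    """M_alpha = Error^alpha * EPI, where EPI is energy per inferred image."""
    epi = energy_joules / n_images
    return (error ** alpha) * epi


# Hypothetical candidates: A is more accurate, B is more energy-efficient.
candidate_a = {"error": 0.10, "energy_joules": 50.0, "n_images": 100}
candidate_b = {"error": 0.20, "energy_joules": 20.0, "n_images": 100}

m1_a = energy_precision_ratio(**candidate_a, alpha=1.0)
m1_b = energy_precision_ratio(**candidate_b, alpha=1.0)
m2_a = energy_precision_ratio(**candidate_a, alpha=2.0)
m2_b = energy_precision_ratio(**candidate_b, alpha=2.0)

best_at_alpha_1 = "A" if m1_a < m1_b else "B"
best_at_alpha_2 = "A" if m2_a < m2_b else "B"
```

With $\alpha = 1$ the lower-energy candidate B scores better, while $\alpha = 2$ penalizes error more heavily and favors the more accurate candidate A.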

3. CNN Power Delivery Network Synthesis with NeuralPower

In the context of IC design, NeuralPower (the editor's term here for the OpeNPDN work) refers to a CNN-based framework for synthesizing PDN grids that satisfy static IR drop and electromigration (EM) constraints while minimizing routing resource consumption (Chhabria et al., 2021). The framework partitions grid synthesis into two stages, floorplanning and placement, each with its own CNN:

Floorplan Stage ("FP-CNN"): Consumes block-level current, congestion, macro/blockage, and C4 bump maps to assign one of $|T|=8$ pruned, nondominated PDN templates to each region.
Placement Stage ("PL-CNN"): Refines region templates using fine-grained, cell-level current and congestion distributions plus prior assignment; applies small perturbations to balance IR/EM slack and routing demand.

Each region is represented as a tensor that stacks up to five heatmap channels (current, congestion, macro mask, C4 distance, template ID), which is fed into modified LeNet-style CNNs with $\approx$ 90M MACs and $\approx$ 18M parameters per inference.

4. PDN Template Definition and Selection

PDN templates $T_i$ are parameterized by stripe pitch $p_\ell$ and width $w_\ell$ on each metal layer $\ell$. For example, in a 65 nm LP technology, templates vary the stripe density on M4, M7, and M8, yielding 27 raw candidates. These candidates are then Pareto-pruned by equivalent resistance $R_i$ and routing utilization $U_i$:

$T_j$ dominates $T_i$ if $R_j \leq R_i$ and $U_j \leq U_i$, with at least one inequality strict.
The $|T|=8$ nondominated templates that remain balance resistance against routing cost.
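The dominance rule translates directly into a filter. The sketch below assumes each template is summarized by an (equivalent resistance, utilization) pair where lower is better for both, and it requires one strict inequality so that identical templates do not eliminate each other. Template names and values are hypothetical.

```python
def dominates(a, b):
    """True if template a is no worse than b in both metrics and strictly better in one."""
    _, r_a, u_a = a
    _, r_b, u_b = b
    return r_a <= r_b and u_a <= u_b and (r_a < r_b or u_a < u_b)


def pareto_prune(templates):
    """Keep only templates that no other template dominates."""
    return [t for t in templates
            if not any(dominates(other, t) for other in templates)]


# (name, equivalent resistance, routing utilization), illustrative values only.
candidates = [
    ("sparse", 3.0, 0.10),
    ("medium", 2.0, 0.20),
    ("dense", 1.0, 0.40),
    ("wasteful", 2.5, 0.30),  # worse than "medium" on both metrics
]
kept = pareto_prune(candidates)
kept_names = [name for name, _, _ in kept]
```

The quadratic scan is adequate for a few dozen raw candidates such as the 27 mentioned above.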

Each region's assignment to a template ensures maximum $\Delta V_r \leq V_{th}$ and $j_r \leq j_{\rm EM, max}$ . All templates are precharacterized for legal operation under worst-case conditions, such that online grid tiling never violates IR or EM due to template selection alone.

5. Training, Transfer Learning, and Evaluation

NeuralPower for PDN design is first trained on synthetic data and then adapted to real designs through transfer learning:

Synthetic Dataset Generation: Gaussian field-based random current maps, routability maps, macro/blockage/C4 layouts, and template labeling via a simulated annealing optimizer, yielding 9,000–12,250 training samples per tech node.
CNN Training: Cross-entropy loss, Adam optimizer, dropout 0.3; synthetic test accuracy $\approx97\%$ .
Transfer Learning: CNN convolutional and pooling layers are frozen post-synthetic training; only fully connected layers are reinitialized and trained on limited real-circuit data (e.g., 116 to 241 labeled regions), achieving 90–95% accuracy on real designs.

On OpenROAD testcases (40k–500k cells, up to 225 regions), NeuralPower achieves:

0.9–2.7% reduced track usage in high-congestion regions (≈1,300 tracks saved)
Uniform IR drop within 9–11.8 mV on a 12 mV budget
EM compliance everywhere ( $j_{\mathrm{norm}} \leq 1$ )
Statistical parity in PDN quality with simulated annealing, yet $20\times$ – $800\times$ lower run time

6. Practical Guidance and Best Practices

For the original NeuralPower system (Cai et al., 2017), model training on new GPU/framework combinations entails collecting $\sim$ 1,000 convolutional, 200 pooling, and 100 fully-connected samples (including power and runtime), fitting sparse polynomial models via Lasso in $\approx$ 30 minutes. Once trained, models enable immediate power/runtime predictions for arbitrary networks, eliminating the need for physical compilation or network execution during design iterations.

Best practices for accuracy and interpretability include:

Inclusion of both raw and $\log$ features for power models
Degree-3 polynomials for convolutional runtime and degree-2 for power/fully connected layers
Incorporation of FLOPs and memory accesses as special features to capture root bottlenecks
Cross-validation of both polynomial degree and Lasso regularization strength for optimal sparsity

For the PDN synthesis case (Chhabria et al., 2021), practitioners are advised to:

Retrain per technology node or bump-pitch specification
Consider current limitations: static IR drop only, fixed region size, and no dynamic IR drop or advanced 3D-IC/ESD modeling
Treat integration with timing-driven flows and online retraining in response to ECO changes as next research steps

7. Limitations and Future Directions

Current NeuralPower frameworks assume static conditions (e.g., steady-state IR drop) and require retraining for technology migration or design style changes. In PDN synthesis, the model does not yet accommodate dynamic IR, leakage, temperature effects, multi-voltage domains, decap modeling, or 3D-IC TSV structures. Future directions include dynamic power and noise modeling, tighter integration with placement/routing, and enabling large-scale transfer learning for diverse process-voltage-temperature (PVT) spaces (Cai et al., 2017, Chhabria et al., 2021).


References (2)

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks (2017)

OpeNPDN: A Neural-network-based Framework for Power Delivery Network Synthesis (2021)
