
NeuralPower Framework

Updated 4 March 2026
  • NeuralPower Frameworks are neural network-based systems that predict CNN energy, runtime, and power to guide design decisions in both deep learning and IC PDN synthesis.
  • They leverage layer-wise sparse polynomial regression with Lasso regularization to generate accurate, interpretable models of CNN components and deliver energy-accuracy trade-offs using the Energy-Precision Ratio.
  • The framework also integrates CNN-driven PDN synthesis to assign optimal grid templates, ensuring compliance with IR drop and electromigration constraints while reducing routing resource usage.

NeuralPower Frameworks represent a class of neural network-driven predictive and optimization systems for the estimation and design of power, runtime, and energy in convolutional neural networks (CNNs) and for power delivery network (PDN) synthesis. Originating from two primary research threads, one for system-level energy profiling and architectural trade-off analysis in deep learning (Cai et al., 2017), and another for PDN grid synthesis in integrated circuit (IC) design (Chhabria et al., 2021), NeuralPower frameworks provide model- and region-specific predictions that guide design choices for energy-accuracy trade-offs and resource allocation while ensuring physical and operational constraints are met.

1. NeuralPower for CNN Energy Profiling and Prediction

NeuralPower, as defined in (Cai et al., 2017), is a predictive framework based on layer-wise sparse polynomial regression, specifically developed to estimate the serving energy consumption, power, and runtime of CNN inference on GPU platforms before model training. For each CNN layer $l$ (convolutional, pooling, or fully connected), two distinct regression models are fitted: one for inference runtime $r_l$ and one for average power $p_l$. The models are constructed as follows:

  • Runtime Model:

$$r_l = \sum_{j=1}^{J_r} \alpha_{j}^{(l)} \prod_{i=1}^{D} \left(f_i^{(l)}\right)^{q_{ij}} + \sum_{s=1}^{S_r} \beta_{s}^{(l)} \mathcal{F}_s^{(l)} + \epsilon_r^{(l)},$$

where $\{f_i^{(l)}\}$ are "raw" features (batch size, tensor dimensions, kernel size, stride, etc.), and $\{\mathcal{F}_s^{(l)}\}$ are "special" features (total FLOPs, memory access counts).

  • Power Model:

$$p_l = \sum_{j=1}^{J_p} \gamma_{j}^{(l)} \prod_{i=1}^{D'} \left(\tilde{f}_i^{(l)}\right)^{m_{ij}} + \sum_{t=1}^{S_p} \delta_t^{(l)} \tilde{\mathcal{F}}_t^{(l)} + \epsilon_p^{(l)},$$

where the $\tilde{f}_i^{(l)}$ include both the original features and their logarithms (to model power saturation effects).
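As a minimal sketch of how the monomial terms $\prod_i f_i^{q_{ij}}$ appearing in both models could be enumerated, the Python snippet below expands a feature vector into all monomials up to a chosen degree and builds the log-augmented inputs used by the power model. The function names, the example feature values, and the choice to leave out the constant term are assumptions made for illustration; they are not taken from Cai et al.

```python
from itertools import product
from math import log, prod


def exponent_vectors(num_features, max_degree):
    """All exponent tuples q with 0 < sum(q) <= max_degree.

    The all-zero tuple (a constant term) is skipped here by assumption.
    """
    return [
        q
        for q in product(range(max_degree + 1), repeat=num_features)
        if 0 < sum(q) <= max_degree
    ]


def monomial_terms(features, max_degree):
    """Map each exponent tuple q to the value of prod_i features[i] ** q[i]."""
    return {
        q: prod(f ** e for f, e in zip(features, q))
        for q in exponent_vectors(len(features), max_degree)
    }


def with_log_features(features):
    """Power-model inputs: the raw features followed by their natural logs."""
    return list(features) + [log(f) for f in features]


# Hypothetical raw features for one layer: (batch size, kernel size).
raw = [2.0, 3.0]
terms = monomial_terms(raw, max_degree=2)
power_inputs = with_log_features(raw)
```

Each candidate term then becomes one column of the regression design matrix, and the regularized fit decides which columns keep a nonzero coefficient.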

Model sparsity and interpretability are achieved through Lasso ($\ell_1$) regularization, which selects a reduced subset of nonzero terms (typically 20–75 features per model). The polynomial degree is chosen by cross-validation ($K_r$ up to 3 for convolutional runtime; $K_p = 2$ for power/FC).
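To illustrate how $\ell_1$ regularization drives coefficients exactly to zero, here is a small sketch of Lasso fitted by cyclic coordinate descent with soft-thresholding. The solver choice, the objective scaling $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \lambda \lVert w \rVert_1$, the omission of an intercept, and the toy data are all assumptions for illustration; the text above does not specify which solver was used.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Toy data: the target depends only on the first column (y = 2 * x0).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weights = lasso_coordinate_descent(X, y, lam=0.1)
```

On this toy problem the irrelevant second coefficient is set exactly to zero while the first is shrunk slightly below its true value of 2, which is the behavior that yields sparse, interpretable models.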

Aggregating predictions across $L$ layers produces network-level metrics:

$$\hat T_{\rm total} = \sum_{l=1}^{L} \hat r_l, \quad \hat P_{\rm avg} = \frac{\sum_{l=1}^{L} \hat p_l \hat r_l}{\sum_{l=1}^{L} \hat r_l}, \quad \hat E_{\rm CNN} = \sum_{l=1}^{L} \hat p_l \hat r_l$$
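The aggregation is simple arithmetic, sketched below for a hypothetical three-layer network. The function name and the per-layer numbers (runtimes in milliseconds, powers in watts, so energy comes out in millijoules) are made up for illustration.

```python
def aggregate_network_metrics(layer_runtimes, layer_powers):
    """Combine per-layer predictions into network-level metrics.

    Returns (total runtime, runtime-weighted average power, total energy).
    """
    total_runtime = sum(layer_runtimes)
    total_energy = sum(p * r for p, r in zip(layer_powers, layer_runtimes))
    average_power = total_energy / total_runtime
    return total_runtime, average_power, total_energy


# Hypothetical predictions: runtimes in ms, powers in W.
runtimes_ms = [2.0, 1.0, 1.0]
powers_w = [100.0, 60.0, 80.0]
t_total, p_avg, e_cnn = aggregate_network_metrics(runtimes_ms, powers_w)
```

Note that the average power is weighted by runtime, so a long-running layer influences it more than a short one with the same power draw.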

2. The Energy-Precision Ratio Metric

To enable principled energy-accuracy trade-off analysis when performing architecture search or hyperparameter optimization, NeuralPower introduces the Energy-Precision Ratio (EPR, also denoted as $M_\alpha$):

$$M_{\alpha} = (\mathrm{Error})^{\alpha} \times \mathrm{EPI}, \qquad \mathrm{EPI} = \frac{\hat E_{\rm CNN}}{N_{\mathrm{inferred}}},$$

where "Error" typically refers to Top-1 or Top-5 classification error and $\alpha > 0$ is tunable. Lower $M_\alpha$ corresponds to more favorable accuracy versus energy profiles.
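A short sketch can show how the choice of $\alpha$ changes which model the metric prefers. The two candidate models and their error and energy-per-inference values below are hypothetical, and the function name is an assumption.

```python
def energy_precision_ratio(error, energy_per_inference, alpha):
    """M_alpha = error**alpha * EPI; lower is better."""
    return (error ** alpha) * energy_per_inference


# Hypothetical candidates: (classification error, energy per inference in J).
accurate_model = (0.10, 2.0)  # lower error, higher energy
frugal_model = (0.20, 0.8)    # higher error, lower energy

# With alpha = 1 the frugal model scores lower (better).
m1_accurate = energy_precision_ratio(*accurate_model, alpha=1)
m1_frugal = energy_precision_ratio(*frugal_model, alpha=1)

# With alpha = 2 the error term weighs more and the ranking flips.
m2_accurate = energy_precision_ratio(*accurate_model, alpha=2)
m2_frugal = energy_precision_ratio(*frugal_model, alpha=2)
```

Because the error is raised to the power $\alpha$, larger values of $\alpha$ penalize inaccurate models more heavily relative to their energy savings.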

3. CNN Power Delivery Network Synthesis with NeuralPower

In the context of IC design, NeuralPower [*Editor's term] refers to a CNN-based framework for synthesizing PDN grids that satisfy static IR drop and electromigration (EM) constraints while minimizing routing resource consumption (Chhabria et al., 2021). The framework partitions grid synthesis into two stages—floorplanning and placement—employing separate CNNs:

  • Floorplan Stage ("FP-CNN"): Consumes block-level current, congestion, macro/blockage, and C4 bump maps to assign one of $|T|=8$ pruned, nondominated PDN templates to each region.
  • Placement Stage ("PL-CNN"): Refines region templates using fine-grained, cell-level current and congestion distributions plus prior assignment; applies small perturbations to balance IR/EM slack and routing demand.

Each region is treated as a tensor assembling up to five multi-channel heatmaps (current, congestion, macro mask, C4 distance, template ID), input into modified LeNet-style CNNs with $\approx 90$M MACs and $\approx 18$M parameters per inference.

4. PDN Template Definition and Selection

PDN templates $T_i$ are parameterized by stripe pitch $p_\ell$ and width $w_\ell$ on each metal layer $\ell$. For example, in 65nm LP, templates vary the density on M4, M7, M8 (yielding 27 raw candidates) but are Pareto-pruned by equivalent resistance $R_i$ and utilization $U_i$:

  • $T_j$ dominates $T_i$ if $R_j \leq R_i$ and $U_j \leq U_i$.
  • $|T|=8$ nondominated templates balance resistance and routing cost.
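The pruning rule above can be sketched as a plain Pareto filter over (resistance, utilization) pairs. The template names and values are hypothetical, and the extra requirement that a dominating template differ in at least one metric (so identical templates do not eliminate each other) is an assumption added here, not something the text states.

```python
def dominates(a, b):
    """True if template a is no worse than b in both metrics and not identical.

    a and b are (equivalent_resistance, utilization) tuples; lower is better.
    """
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(templates):
    """Keep only templates that no other template dominates."""
    return {
        name: metrics
        for name, metrics in templates.items()
        if not any(dominates(other, metrics) for other in templates.values())
    }


# Hypothetical templates: name -> (equivalent resistance, utilization).
candidates = {
    "T1": (1.0, 0.5),
    "T2": (2.0, 0.3),
    "T3": (2.5, 0.6),  # worse than T1 in both metrics
    "T4": (3.0, 0.2),
}
front = pareto_front(candidates)
```

Here T3 is dropped because T1 has both lower resistance and lower utilization, while T1, T2, and T4 each trade one metric against the other and therefore survive.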

Each region's assignment to a template ensures maximum $\Delta V_r \leq V_{th}$ and $j_r \leq j_{\rm EM, max}$. All templates are precharacterized for legal operation under worst-case conditions, such that online grid tiling never violates IR or EM due to template selection alone.

5. Training, Transfer Learning, and Evaluation

NeuralPower for PDN design uses a two-stage training protocol:

  • Synthetic Dataset Generation: Gaussian field-based random current maps, routability maps, macro/blockage/C4 layouts, and template labeling via a simulated annealing optimizer, yielding 9,000–12,250 training samples per tech node.
  • CNN Training: Cross-entropy loss, Adam optimizer, dropout 0.3; synthetic test accuracy $\approx 97\%$.
  • Transfer Learning: CNN convolutional and pooling layers are frozen post-synthetic training; only fully connected layers are reinitialized and trained on limited real-circuit data (e.g., 116 to 241 labeled regions), achieving 90–95% accuracy on real designs.

On OpenROAD testcases (40k–500k cells, up to 225 regions), NeuralPower achieves:

  • 0.9–2.7% reduced track usage in high-congestion regions (≈1,300 tracks saved)
  • Uniform IR drop within 9–11.8 mV on a 12 mV budget
  • EM compliance everywhere ($j_{\mathrm{norm}} \leq 1$)
  • Statistical parity in PDN quality with simulated annealing, yet $20\times$–$800\times$ lower run time

6. Practical Guidance and Best Practices

For the original NeuralPower system (Cai et al., 2017), model training on a new GPU/framework combination entails collecting roughly 1,000 convolutional, 200 pooling, and 100 fully connected samples (including power and runtime) and fitting sparse polynomial models via Lasso in about 30 minutes. Once trained, the models enable immediate power/runtime predictions for arbitrary networks, eliminating the need for physical compilation or network execution during design iterations.

Best practices for accuracy and interpretability include:

  • Inclusion of both raw and $\log$ features for power models
  • Degree-3 polynomials for convolutional runtime and degree-2 for power/fully connected layers
  • Incorporation of FLOPs and memory accesses as special features to capture root bottlenecks
  • Cross-validation of both polynomial degree and Lasso regularization strength to reach good sparsity

For the PDN synthesis case (Chhabria et al., 2021), practitioners are advised to:

  • Retrain per technology node or bump-pitch specification
  • Consider current limitations: static IR only, fixed region size, and no modeling of dynamic IR drop, advanced 3D-IC structures, or ESD
  • As next research steps, extend the architecture to integrate with timing-driven flows and to support online retraining during ECO flows

7. Limitations and Future Directions

Current NeuralPower frameworks assume static conditions (e.g., steady-state IR drop) and require retraining for technology migration or design style changes. In PDN synthesis, the model does not yet accommodate dynamic IR, leakage, temperature effects, multi-voltage domains, decap modeling, or 3D-IC TSV structures. Future directions include dynamic power and noise modeling, tighter integration with placement/routing, and enabling large-scale transfer learning for diverse process-voltage-temperature (PVT) spaces (Cai et al., 2017, Chhabria et al., 2021).
