NeuralPower Framework
- NeuralPower Frameworks are neural network-based systems that predict CNN energy, runtime, and power to guide design decisions in both deep learning and IC PDN synthesis.
- They leverage layer-wise sparse polynomial regression with Lasso regularization to generate accurate, interpretable models of CNN components and deliver energy-accuracy trade-offs using the Energy-Precision Ratio.
- The framework also integrates CNN-driven PDN synthesis to assign optimal grid templates, ensuring compliance with IR drop and electromigration constraints while reducing routing resource usage.
NeuralPower Frameworks represent a class of neural network-driven predictive and optimization systems for the estimation and design of power, runtime, and energy in convolutional neural networks (CNNs) and for power delivery network (PDN) synthesis. Originating from two primary research threads, one for system-level energy profiling and architectural trade-off analysis in deep learning (Cai et al., 2017), and another for PDN grid synthesis in integrated circuit (IC) design (Chhabria et al., 2021), NeuralPower frameworks provide model- and region-specific predictions that guide design choices for energy-accuracy trade-offs and resource allocation while ensuring physical and operational constraints are met.
1. NeuralPower for CNN Energy Profiling and Prediction
NeuralPower, as defined by Cai et al. (2017), is a predictive framework based on layer-wise sparse polynomial regression, developed to estimate the serving energy consumption, power, and runtime of CNN inference on GPU platforms before the model is trained. For each CNN layer type (convolutional, pooling, or fully connected), two distinct regression models are fitted: one for inference runtime $\hat{T}$ and one for average power $\hat{P}$. The models are constructed as follows:
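- Runtime Model:
$$\hat{T}(x_T) = \sum_j c_j \prod_{i=1}^{D_T} x_i^{q_{ij}} + \sum_s c'_s F_s(x_T)$$
where $x_T \in \mathbb{R}^{D_T}$ are "raw" features (batch size, tensor dimensions, kernel size, stride, etc.), $q_{ij}$ are non-negative integer exponents with $\sum_i q_{ij} \le K_T$ for polynomial degree $K_T$, and $F_s$ are "special" features (total FLOPs, memory access counts).
- Power Model:
$$\hat{P}(x_P) = \sum_j z_j \prod_{i=1}^{D_P} x_i^{m_{ij}} + \sum_k z'_k F_k(x_P)$$
where $x_P \in \mathbb{R}^{D_P}$ includes both the original features and their logarithms (to model power saturation effects), and the exponents satisfy $\sum_i m_{ij} \le K_P$ for polynomial degree $K_P$.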
Model sparsity and interpretability are achieved through Lasso ($\ell_1$) regularization, which selects a reduced subset of nonzero terms (typically 20–75 features per model). The polynomial degree is chosen by cross-validation (up to 3 for convolutional runtime; 2 for power and fully connected layers).
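As a minimal sketch of this fitting step, the following uses only the Python standard library: it expands raw features into monomials, standardizes them, and fits a Lasso model by cyclic coordinate descent. Everything here is an illustrative assumption rather than the authors' tooling: the three features (batch, kernel, channels), the synthetic runtime formula, the regularization strength `lam=0.1`, and all function names.

```python
import itertools
import math

def monomials(n_features, max_degree):
    """Index tuples for every monomial of total degree 1..max_degree."""
    return [t for d in range(1, max_degree + 1)
            for t in itertools.combinations_with_replacement(range(n_features), d)]

def expand(row, terms):
    """Evaluate each monomial on one raw feature vector."""
    return [math.prod(row[i] for i in t) for t in terms]

def standardize(X):
    """Center each column and scale it to unit variance (constant columns become 0)."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    Z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return Z, means, stds

def soft_threshold(rho, lam):
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(Z, y, lam, sweeps=200):
    """Cyclic coordinate descent for (1/2n)*||y - b - Zw||^2 + lam*||w||_1.
    Assumes centered columns, so the intercept b is simply mean(y)."""
    n, p = len(Z), len(Z[0])
    b = sum(y) / n
    w = [0.0] * p
    resid = [yi - b for yi in y]
    col_sq = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != w[j]:
                delta = new - w[j]
                for i in range(n):
                    resid[i] -= delta * Z[i][j]
                w[j] = new
    return b, w, resid

# Hypothetical profiling samples: (batch size, kernel size, channels).
raw = [(b, k, c) for b in (1, 2, 4, 8) for k in (1, 3, 5) for c in (16, 32)]
# Synthetic "measured" runtime, invented for illustration only.
runtime = [0.5 * b * k * k + 2.0 for b, k, c in raw]

terms = monomials(n_features=3, max_degree=3)
Z, means, stds = standardize([expand(r, terms) for r in raw])
b0, w, resid = lasso_cd(Z, runtime, lam=0.1)

selected = [(terms[j], w[j]) for j in range(len(w)) if w[j] != 0.0]
mse = sum(r * r for r in resid) / len(resid)
baseline_mse = sum((t - b0) ** 2 for t in runtime) / len(runtime)
print(f"{len(selected)} of {len(terms)} terms kept, MSE {mse:.4f} vs {baseline_mse:.4f}")
```

Raising `lam` drives more coefficients to exactly zero, which is the mechanism behind the sparse, interpretable models described above. In practice both `lam` and the polynomial degree would be chosen by cross-validation rather than fixed by hand.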
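Aggregating the per-layer predictions $\hat{T}_n$ and $\hat{P}_n$ over the $N$ layers of a network produces network-level metrics:
$$\hat{T}_{total} = \sum_{n=1}^{N} \hat{T}_n, \qquad \hat{P}_{avg} = \frac{\sum_{n=1}^{N} \hat{P}_n \hat{T}_n}{\sum_{n=1}^{N} \hat{T}_n}, \qquad \hat{E}_{total} = \hat{T}_{total} \cdot \hat{P}_{avg} = \sum_{n=1}^{N} \hat{P}_n \hat{T}_n$$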
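The aggregation can be sketched in a few lines, assuming total runtime is the sum of layer runtimes, total energy is the sum of per-layer power times runtime, and average power is their ratio. The `LayerPrediction` type and the numeric values are hypothetical and serve only as an example.

```python
from dataclasses import dataclass

@dataclass
class LayerPrediction:
    name: str
    runtime_s: float  # predicted layer runtime in seconds
    power_w: float    # predicted average layer power in watts

def aggregate(layers):
    """Return (total runtime [s], average power [W], total energy [J])."""
    total_runtime = sum(l.runtime_s for l in layers)
    total_energy = sum(l.runtime_s * l.power_w for l in layers)
    avg_power = total_energy / total_runtime  # runtime-weighted mean power
    return total_runtime, avg_power, total_energy

# Invented per-layer predictions for a three-layer network.
layers = [
    LayerPrediction("conv1", 0.004, 150.0),
    LayerPrediction("pool1", 0.001, 100.0),
    LayerPrediction("fc1", 0.005, 120.0),
]
total_runtime, avg_power, total_energy = aggregate(layers)
print(total_runtime, avg_power, total_energy)
```

Note that the average power is weighted by runtime, so a short, power-hungry layer contributes less to it than a long one with the same power draw.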
2. The Energy-Precision Ratio Metric
To enable principled energy-accuracy trade-off analysis when performing architecture search or hyperparameter optimization, NeuralPower introduces the Energy-Precision Ratio (EPR, also denoted as $M$):
$$M = \text{Error}^{\alpha} \cdot \text{EPI}$$
where "Error" typically refers to Top-1 or Top-5 classification error, EPI is the energy consumed per image, and the exponent $\alpha$ is tunable. Lower $M$ corresponds to a more favorable accuracy-versus-energy profile.
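A short sketch of how the metric might be used to rank candidates, assuming the EPR takes the form of classification error raised to a tunable exponent, multiplied by energy per image. The two candidate networks and their numbers are invented for illustration.

```python
def energy_precision_ratio(error, energy_per_image_j, alpha=1.0):
    """EPR = error**alpha * energy per image; lower is better."""
    if not 0.0 <= error <= 1.0:
        raise ValueError("error must be a fraction in [0, 1]")
    return (error ** alpha) * energy_per_image_j

# Hypothetical candidates: (classification error, energy per image in joules).
candidates = {
    "A": (0.10, 2.0),  # more accurate, more energy
    "B": (0.20, 0.8),  # less accurate, less energy
}

def best(alpha):
    """Name of the candidate with the lowest EPR for a given exponent."""
    return min(candidates,
               key=lambda n: energy_precision_ratio(*candidates[n], alpha=alpha))

print(best(alpha=1.0), best(alpha=4.0))
```

With a small exponent the cheaper network B wins, while a larger exponent weights accuracy more heavily and favors network A. This is the sense in which the exponent tunes the trade-off.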
3. CNN Power Delivery Network Synthesis with NeuralPower
In the context of IC design, NeuralPower [*Editor's term] refers to a CNN-based framework for synthesizing PDN grids that satisfy static IR drop and electromigration (EM) constraints while minimizing routing resource consumption (Chhabria et al., 2021). The framework partitions grid synthesis into two stages—floorplanning and placement—employing separate CNNs:
- Floorplan Stage ("FP-CNN"): Consumes block-level current, congestion, macro/blockage, and C4 bump maps to assign one of the pruned, nondominated PDN templates to each region.
- Placement Stage ("PL-CNN"): Refines region templates using fine-grained, cell-level current and congestion distributions plus prior assignment; applies small perturbations to balance IR/EM slack and routing demand.
Each region is represented as a tensor that stacks up to five heatmap channels (current, congestion, macro mask, C4 distance, template ID). This tensor is fed into modified LeNet-style CNNs with 18M parameters, requiring 90M MACs per inference.
4. PDN Template Definition and Selection
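PDN templates are parameterized by stripe pitch $p_\ell$ and width $w_\ell$ on each metal layer $\ell$. For example, in 65nm LP, templates vary the stripe density on M4, M7, and M8 (yielding 27 raw candidates) and are then Pareto-pruned by equivalent resistance $R_{eq}$ and routing utilization $U$:
- Template $t_a$ dominates template $t_b$ if $R_{eq}(t_a) \le R_{eq}(t_b)$ and $U(t_a) \le U(t_b)$, with at least one inequality strict.
- The nondominated templates that remain balance resistance against routing cost.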
Each region's assignment to a template ensures that the maximum IR drop and the maximum current density stay within the IR and EM limits. All templates are precharacterized for legal operation under worst-case conditions, so that online grid tiling never violates IR or EM constraints because of template selection alone.
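The pruning step can be sketched as a standard Pareto filter, assuming one template dominates another when it is no worse in both equivalent resistance and routing utilization and strictly better in at least one. The `Template` type, the template names, and the numeric values are hypothetical.

```python
from typing import NamedTuple

class Template(NamedTuple):
    name: str
    resistance: float   # equivalent grid resistance (arbitrary units)
    utilization: float  # fraction of routing tracks consumed by the PDN

def dominates(a, b):
    """True if a is no worse than b on both metrics and better on at least one."""
    no_worse = a.resistance <= b.resistance and a.utilization <= b.utilization
    better = a.resistance < b.resistance or a.utilization < b.utilization
    return no_worse and better

def pareto_prune(templates):
    """Keep only templates that no other template dominates."""
    return [t for t in templates
            if not any(dominates(other, t) for other in templates)]

# Invented candidates: denser grids lower resistance but use more tracks.
candidates = [
    Template("T1", 1.0, 0.30),
    Template("T2", 2.0, 0.20),
    Template("T3", 2.5, 0.25),  # worse than T2 on both metrics
    Template("T4", 4.0, 0.10),
]
kept = pareto_prune(candidates)
print([t.name for t in kept])
```

Only the surviving templates need to be offered as output classes, which keeps the classification problem small without discarding any useful trade-off point.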
5. Training, Transfer Learning, and Evaluation
NeuralPower for PDN design uses a training protocol that first trains on synthetic data and then applies transfer learning to real circuits:
- Synthetic Dataset Generation: Gaussian field-based random current maps, routability maps, macro/blockage/C4 layouts, and template labeling via a simulated annealing optimizer, yielding 9,000–12,250 training samples per tech node.
- CNN Training: Cross-entropy loss, Adam optimizer, dropout 0.3; accuracy is evaluated on a held-out synthetic test set.
- Transfer Learning: CNN convolutional and pooling layers are frozen post-synthetic training; only fully connected layers are reinitialized and trained on limited real-circuit data (e.g., 116 to 241 labeled regions), achieving 90–95% accuracy on real designs.
On OpenROAD testcases (40k–500k cells, up to 225 regions), NeuralPower achieves:
- 0.9–2.7% reduced track usage in high-congestion regions (≈1,300 tracks saved)
- Uniform IR drop within 9–11.8 mV on a 12 mV budget
- EM compliance everywhere
- Statistical parity in PDN quality with simulated annealing, at lower run time
6. Practical Guidance and Best Practices
For the original NeuralPower system (Cai et al., 2017), model training on new GPU/framework combinations entails collecting 1,000 convolutional, 200 pooling, and 100 fully-connected samples (including power and runtime), fitting sparse polynomial models via Lasso in 30 minutes. Once trained, models enable immediate power/runtime predictions for arbitrary networks, eliminating the need for physical compilation or network execution during design iterations.
Best practices for accuracy and interpretability include:
- Inclusion of both raw and logarithmic features for power models
- Degree-3 polynomials for convolutional runtime and degree-2 for power/fully connected layers
- Incorporation of FLOPs and memory accesses as special features to capture root bottlenecks
- Cross-validation of both polynomial degree and Lasso regularization strength to obtain a sparse model without sacrificing accuracy
For the PDN synthesis case (Chhabria et al., 2021), practitioners are advised to:
- Retrain per technology node or bump-pitch specification
- Consider current limitations: static IR drop only, fixed region size, and no dynamic IR drop or advanced 3D-IC/ESD modeling
- Treat integration with timing-driven flows and online retraining in response to ECO changes as next research steps
7. Limitations and Future Directions
Current NeuralPower frameworks assume static conditions (e.g., steady-state IR drop) and require retraining for technology migration or design style changes. In PDN synthesis, the model does not yet accommodate dynamic IR drop, leakage, temperature effects, multi-voltage domains, decap modeling, or 3D-IC TSV structures. Future directions include dynamic power and noise modeling, tighter integration with placement and routing, and large-scale transfer learning across diverse process-voltage-temperature (PVT) spaces (Cai et al., 2017; Chhabria et al., 2021).