Energy-Accuracy Co-Optimized Weight Selection Algorithm
- Energy-accuracy co-optimized weight selection algorithms are methods that integrate energy models with weight quantization to optimize deep neural network deployment.
- They use techniques such as pruning, clustering, and dynamic programming to achieve substantial energy savings while ensuring only minor accuracy loss.
- Implementations across varied architectures, including systolic arrays and mixed-signal PIMs, report energy reductions of 45–90% with minimal degradation in performance.
An energy-accuracy co-optimized weight selection algorithm is an algorithm–hardware co-design methodology for deep neural networks (DNNs) that systematically selects or constrains the weight set to minimize energy consumption under a user-specified accuracy target (or, conversely, to maximize accuracy under an explicit energy constraint). Energy-accuracy co-optimization explicitly incorporates analytically calibrated, layer- or weight-specific energy models into the process of weight quantization, pruning, clustering, or structural transformation, yielding Pareto-optimal trade-offs appropriate for deployment on accelerators, mixed-signal PIM arrays, or systolic architectures. Algorithmic strategies range from fine-grained greedy selection, layer-wise scheduling, and cross-layer dynamic programming, to convex relaxations and knapsack-type projections. Contemporary state-of-the-art frameworks report reductions of 45–90% in energy-related hardware cost with accuracy penalties of roughly 1–3% or less across a variety of network topologies and platforms.
1. Formal Problem Statement and Objective
The general co-optimization problem is to minimize total energy across a neural network as a function of the chosen weight set $W$ (possibly also activations, codebooks, or section mappings), subject to an accuracy constraint:

$$\min_{W \in \mathcal{C}} \; E(W) \quad \text{s.t.} \quad \mathrm{Acc}(W) \ge \mathrm{Acc}_{\text{target}} - \epsilon .$$

Alternatively, one may set an energy budget $E_{\text{budget}}$ and aim to minimize the loss:

$$\min_{W \in \mathcal{C}} \; \mathcal{L}(W) \quad \text{s.t.} \quad E(W) \le E_{\text{budget}} .$$

In practice, these objectives are instantiated with (i) explicit per-layer or per-weight hardware energy models, (ii) data-driven or structurally parameterized accuracy estimators, and (iii) algorithmic schedules for traversing the configuration space (Fang et al., 21 Nov 2025, Caro et al., 2023, Farias et al., 15 Oct 2024, Petri et al., 2023).
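The following minimal sketch illustrates the two dual formulations over an explicit set of candidate weight configurations. The `energy_model` and `accuracy_estimator` callables are hypothetical stand-ins for the calibrated hardware models and accuracy estimators described in the sections below, and the exhaustive scan is purely illustrative (real frameworks traverse the configuration space with the schedules of Sections 3 and 4).

```python
# Minimal sketch of the two dual formulations; all names are illustrative.
from typing import Any, Callable, Iterable, Optional


def min_energy_under_accuracy(configs: Iterable[Any],
                              energy_model: Callable[[Any], float],
                              accuracy_estimator: Callable[[Any], float],
                              acc_target: float) -> Optional[Any]:
    """Pick the lowest-energy weight configuration whose estimated
    accuracy still meets the user-specified target."""
    feasible = [c for c in configs if accuracy_estimator(c) >= acc_target]
    return min(feasible, key=energy_model) if feasible else None


def max_accuracy_under_energy(configs: Iterable[Any],
                              energy_model: Callable[[Any], float],
                              accuracy_estimator: Callable[[Any], float],
                              energy_budget: float) -> Optional[Any]:
    """Dual formulation: maximize estimated accuracy subject to an
    explicit energy budget."""
    feasible = [c for c in configs if energy_model(c) <= energy_budget]
    return max(feasible, key=accuracy_estimator) if feasible else None
```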
2. Layer- and Weight-Level Energy Modeling
Advanced frameworks build detailed energy models that capture the dominant contributors to inference energy (a minimal per-layer model combining these terms is sketched after this list). These include:
- MAC switching energy: In weight-stationary systolic arrays, energy per multiply–accumulate operation is parameterized by both the particular weight encoding and the observed partial-sum and activation transitions. Fine-grained gate-level simulation and clustering—e.g., MSB/Hamming distance grouping—yield per-weight, per-layer energy tables (Fang et al., 21 Nov 2025, Petri et al., 2023).
- Memory hierarchy models: DRAM, SRAM, and register-file energy per access is estimated from microarchitectural data, with the count of accesses directly dependent on the weight representation and sparsity (Caro et al., 2023, Farias et al., 15 Oct 2024, Chen et al., 2021).
- Peripheral and ADC cost: For compute-in-memory (CIM) and mixed-signal arrays, the energy and area of ADCs (often >80% of total cost) are modeled on a per-conversion basis, motivating algorithmic strategies such as weight sectioning and ADC bit-depth reconfiguration (Farias et al., 15 Oct 2024, Behnam et al., 2022).
- Data movement and hybrid digital–analog metrics: Frameworks such as HybridAC assign weights to either analog crossbars or digital cores based on sensitivity, and model overall energy as a sum of analog MAC, ADC, data movement, and digital MAC terms (Behnam et al., 2022).
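As a concrete illustration, the sketch below combines the terms above into a single per-layer energy estimate. The per-weight-code MAC energy table, memory-access energies, and ADC cost are placeholder assumptions standing in for values obtained from gate-level simulation and microarchitectural characterization.

```python
import numpy as np


def layer_energy(weights_q: np.ndarray,        # integer weight codes for one layer
                 mac_energy_table: dict,       # weight code -> avg. MAC energy (pJ)
                 macs_per_weight: int,         # MAC ops each weight participates in
                 e_dram: float, dram_accesses: int,
                 e_sram: float, sram_accesses: int,
                 e_adc: float = 0.0, adc_conversions: int = 0) -> float:
    """Sum the dominant contributors for one layer: MAC switching energy
    (looked up per weight code), memory-hierarchy traffic, and, for CIM
    deployments, ADC conversion cost. All constants are placeholders."""
    e_mac = sum(mac_energy_table[int(w)] for w in weights_q.ravel()) * macs_per_weight
    e_mem = e_dram * dram_accesses + e_sram * sram_accesses
    return e_mac + e_mem + e_adc * adc_conversions
```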
3. Algorithmic Methodologies for Weight Selection
Multiple algorithmic paradigms co-optimize weights and energy:
- Greedy or backward elimination: Layer-wise selection ranks candidate weight codes by energy gain divided by marginal accuracy loss, iteratively pruning candidates until the global accuracy-drop budget is reached (Fang et al., 21 Nov 2025, Caro et al., 2023).
- Sparse projection and knapsack: When energy can be written as a linear function of sparsity or codebook size, the Euclidean projection onto the energy budget reduces to a 0/1 knapsack problem over the vectorized weights, solved greedily by a profit-density sort (Yang et al., 2018); input masking for activation sparsity augments this approach (a minimal sketch of the projection follows this list).
- Clustering and quantization: K-means clustering produces per-layer or global codebooks, reducing DRAM and memory-access energy while exposing a quantization–energy–accuracy frontier (Caro et al., 2023, Chen et al., 2021).
- Sensitivity scoring and hybrid mapping: For mixed-signal PIM, Hessian-based estimates of per-weight or per-channel loss sensitivity identify a minimal subset of "high-impact" weights to be mapped to robust—and energy-costlier—digital compute (Behnam et al., 2022).
- Structural decomposition: SmartDeal factorizes weights into sparse, power-2 quantized coefficient matrices and small dense bases, enabling aggressive reduction of expensive memory traffic with trivial on-the-fly shift/add computation (Chen et al., 2021).
- Sectioning and bit-sliced partitioning: In SWS, sorting weights by magnitude and grouping the smallest into low-precision crossbar sections allows the vast majority of MACs to be computed using minimal ADC resources, achieving near-maximal energy reduction (Farias et al., 15 Oct 2024).
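A minimal sketch of the knapsack-style projection mentioned above: assuming a per-weight energy cost array, the Euclidean projection onto an energy budget is approximated by keeping weights in decreasing order of squared magnitude per unit cost (profit density) until the budget is exhausted. The exact formulation in (Yang et al., 2018) additionally couples costs across layers and incorporates input masking.

```python
import numpy as np


def energy_constrained_projection(w: np.ndarray,
                                  cost: np.ndarray,
                                  energy_budget: float) -> np.ndarray:
    """Greedy profit-density approximation of the Euclidean projection onto
    an energy budget: keeping weight i 'profits' w_i**2 and 'costs' cost_i,
    so weights are retained in decreasing order of w_i**2 / cost_i until the
    budget is exhausted; the remainder are zeroed (pruned)."""
    w_flat, c_flat = w.ravel(), cost.ravel()
    order = np.argsort(-(w_flat ** 2) / np.maximum(c_flat, 1e-12))
    keep = np.zeros_like(w_flat, dtype=bool)
    spent = 0.0
    for i in order:
        if spent + c_flat[i] <= energy_budget:
            keep[i] = True
            spent += c_flat[i]
    return (w_flat * keep).reshape(w.shape)
```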
4. Optimization Scheduling and Layer-Wise Strategies
Optimal allocation of energy/accuracy budget across layers is critical:
- Layer-wise prioritization: Layers are ranked by their pre-compression energy share, and higher-impact layers are compressed more aggressively, subject to global accuracy bounds (Fang et al., 21 Nov 2025).
- Greedy dynamic programming: At each iteration, the configuration of layer quantization levels, cluster sizes, or candidate sets is greedily adjusted to maximize energy savings per unit accuracy drop, producing a near-optimal configuration in $O(L)$ steps, where $L$ is the number of layers (Caro et al., 2023); a minimal greedy schedule along these lines is sketched after this list.
- Per-layer block-circulant tuning: For block-circulant compression, a small grid search over block sizes per layer, followed by a lightweight DP or greedy search, yields the optimal trade-off between accuracy degradation and energy (Wang et al., 2018).
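The greedy cross-layer schedule referenced above can be sketched as follows. The `energy_of` and `accuracy_of` callables are hypothetical evaluators of a full per-layer configuration assignment, and each per-layer option list is assumed to be ordered from most to least precise.

```python
def greedy_layer_schedule(layers, configs_per_layer, energy_of, accuracy_of,
                          acc_floor):
    """Repeatedly apply the single per-layer configuration change with the
    best energy-saving-per-accuracy-drop ratio, stopping when no move keeps
    estimated accuracy above acc_floor."""
    # Start every layer at its most precise (most expensive) option.
    state = {l: configs_per_layer[l][0] for l in layers}
    while True:
        best_state, best_ratio = None, 0.0
        e_now, a_now = energy_of(state), accuracy_of(state)
        for l in layers:
            opts = configs_per_layer[l]
            idx = opts.index(state[l])
            if idx + 1 >= len(opts):          # layer already at cheapest option
                continue
            trial = {**state, l: opts[idx + 1]}
            a_trial = accuracy_of(trial)
            if a_trial < acc_floor:           # move violates the accuracy bound
                continue
            ratio = (e_now - energy_of(trial)) / max(a_now - a_trial, 1e-9)
            if ratio > best_ratio:
                best_state, best_ratio = trial, ratio
        if best_state is None:                # no admissible move remains
            return state
        state = best_state
```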
5. Integration with Hardware and System Architectures
Hardware-aware algorithmic design is key:
- Systolic arrays: Weight code selection is synchronized with MAC-level switching and delay models, enabling not only energy reduction but also safe voltage scaling for further power gains (Fang et al., 21 Nov 2025, Petri et al., 2023).
- Mixed-signal and CIM arrays: Algorithms such as SWS, HybridAC, and weight sectioning reorganize weights, ADC allocation, and peripheral activity to mitigate the architectural bottlenecks unique to CIM (e.g., ADCs, current leakage, crossbar row/column utilization) (Farias et al., 15 Oct 2024, Behnam et al., 2022); a toy sensitivity-driven partitioning along these lines is sketched after this list.
- Memory–computation synergy: Decomposition-based approaches like SmartDeal design the software transformation of weights to maximize on-chip caching and computation reuse, with hardware engines specifically built to reassemble sparse, low-bit weights with minimal memory reads (Chen et al., 2021).
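As a toy illustration of sensitivity-driven hybrid mapping, the sketch below assigns the most loss-sensitive fraction of weights to robust digital compute and leaves the remainder on energy-cheap analog crossbars. The sensitivity scores (e.g., Hessian-based, as in Section 3) and the digital fraction are assumed inputs, not part of any specific published framework.

```python
import numpy as np


def hybrid_partition(sensitivity: np.ndarray, digital_fraction: float) -> np.ndarray:
    """Return a boolean mask: True where a weight should be mapped to a
    digital core, False where it can remain on the analog crossbar."""
    n_digital = int(round(digital_fraction * sensitivity.size))
    order = np.argsort(-sensitivity.ravel())          # most sensitive first
    mask = np.zeros(sensitivity.size, dtype=bool)
    mask[order[:n_digital]] = True
    return mask.reshape(sensitivity.shape)
```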
6. Empirical Trade-offs and Quantitative Outcomes
Representative energy–accuracy trade-off outcomes include:
| Method / Network | Energy / Cost Reduction | Accuracy Drop | Reference |
|---|---|---|---|
| SWS on BERT (90% sparse) | 89.5% ADC energy | <0.1% | (Farias et al., 15 Oct 2024) |
| Layer-wise selection (ResNet) | 58.6% systolic-array energy | ~3% | (Fang et al., 21 Nov 2025) |
| Clustering (YOLOv3, 5-bit) | 57% memory energy | ~1.2% mAP | (Caro et al., 2023) |
| Incremental training (LeNet) | 95.5% average energy | 0.60 pp | (Tann et al., 2016) |
| Dropback (ResNet18) | 11.7× fewer weight accesses | ~0% | (Golub et al., 2018) |
| PowerPruning (ResNet-20) | 50.9% power | 3% | (Petri et al., 2023) |
| HybridAC (ResNet18) | 52% PIM energy | 0.2% | (Behnam et al., 2022) |
| SmartDeal (ResNet50) | 2.44× ASIC energy | 0.82% | (Chen et al., 2021) |
These approaches achieve energy-efficiency gains primarily by reducing DRAM/SRAM accesses, optimizing MAC circuit activity, using ADCs efficiently, and minimizing unnecessary switching or high-delay transitions, with carefully controlled impact on model fidelity.
7. Architectural and Implementation Considerations
- Controller and overhead: Weight selection algorithms incur negligible hardware overhead (e.g., for margin controllers, multiplexers for weight permutation, or Huffman decoders), as reported across ASIC, FPGA, and GPU platforms (Tann et al., 2016, Farias et al., 15 Oct 2024, Chen et al., 2021).
- Granularity: Methods such as incremental training and coarse-to-fine networks allow fine runtime control over energy/accuracy at inference by dynamically selecting which subnetworks or channel sets to activate per sample (Tann et al., 2016, Jayakodi et al., 2019).
- Non-retraining schemes: HybridAC and certain clustering-based techniques require no retraining—weight/channel selection is performed post hoc given sensitivity and energy profiling (Behnam et al., 2022, Caro et al., 2023).
- Compatibility and extensibility: Most frameworks integrate seamlessly with quantization-aware training, sparse activation schemes, and knowledge distillation for further gain under strict hardware constraints (Yang et al., 2018, Chen et al., 2021, Fang et al., 21 Nov 2025).
References
- (Fang et al., 21 Nov 2025) Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration
- (Farias et al., 15 Oct 2024) Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
- (Caro et al., 2023) At-Scale Evaluation of Weight Clustering to Enable Energy-Efficient Object Detection
- (Petri et al., 2023) PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration
- (Behnam et al., 2022) An Algorithm-Hardware Co-design Framework to Overcome Imperfections of Mixed-signal DNN Accelerators
- (Chen et al., 2021) SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training
- (Golub et al., 2018) Full deep neural network training on a pruned weight budget
- (Yang et al., 2018) Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- (Wang et al., 2018) Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
- (Tann et al., 2016) Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off
- (Jayakodi et al., 2019) Trading-off Accuracy and Energy of Deep Inference on Embedded Systems: A Co-Design Approach