
MACPruning: Efficient DNN Operation Reduction

Updated 20 January 2026
  • MACPruning is defined as the process of reducing multiply–accumulate operations in deep neural networks through structured, unstructured, and dynamic pruning methods.
  • It employs techniques like channel/weight removal and input-adaptive skipping to optimize computational efficiency and energy use on resource-constrained devices.
  • MACPruning can drastically increase side-channel attack complexity, though improper implementations may leak control-flow details that weaken security.

Multiply–Accumulate Pruning (MACPruning) refers to techniques that aim to reduce the number of multiply–accumulate (MAC) operations performed by deep neural networks, either to improve computational and energy efficiency or to provide security benefits against side-channel attacks. MACPruning encompasses both structured and unstructured pruning regimes as well as dynamic, input-adaptive approaches, often manipulating the execution path of DNN inference to optimize for resource constraints or to resist model parameter extraction.

1. Definition and Core Principles

MACPruning is defined as the process of reducing the number of MAC operations in neural network inference by removing, skipping, or adapting execution of certain weights, activations, or operations. The motivation for MACPruning arises from the high computational cost and energy requirements of DNN inference, especially on resource-limited devices such as MCUs and edge AI chips, as well as from the vulnerability of DNN parameters to side-channel analysis (SCA) attacks in such settings (Neth et al., 10 Jul 2025, Ding et al., 20 Feb 2025).

MACPruning can be achieved via:

  • Structured pruning: Coarse-grained removal of entire channels, filters, or blocks, typically determined at training or compile-time.
  • Unstructured pruning: Fine-grained removal or dynamic skipping of individual scalar connections, possibly at inference time, often input-adaptive.
  • Random dynamic operation pruning: Stochastic pruning of operations based on random or importance-aware maps to increase SCA resilience.

The decision criteria may involve magnitude-based heuristics, importance learned through gradient signals, resource constraints (e.g., FLOPs or latency budgets), or deliberate randomness for security.
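As a concrete illustration of the simplest of these criteria, the following NumPy sketch applies magnitude-based unstructured pruning: the smallest-magnitude weights are zeroed, and each zeroed weight corresponds to a MAC that can be skipped. This is an illustrative sketch only (the function name and interface are hypothetical, not from the cited papers).

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Each zeroed weight corresponds to a MAC that can be skipped
    entirely at inference time.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)               # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W_pruned = magnitude_prune(W, 0.5)
print("MACs skipped:", np.count_nonzero(W_pruned == 0))  # half of 64
```

Gradient-based or resource-constrained criteria replace the magnitude threshold with a learned importance score or a budget-driven solver, but the skip mechanism is the same.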

2. Dynamic Inference-Time MACPruning and Security Applications

A class of MACPruning methods targets security—for instance, countering DNN model extraction via SCA such as Differential Electromagnetic Analysis (DEMA). MACPruning, as described by Ding et al., adopts an inference-time, randomized, input-dependent approach (Ding et al., 20 Feb 2025). The key insight is that DNNs are robust to missing inputs, allowing some fraction of input pixels (and their corresponding first-layer MACs) to be randomly skipped without significant accuracy loss.

The method introduces two primary mechanisms:

  • Random Pixel Activation Map (RPAM): Independently drops pixels with probability $1-p$, thereby skipping their MACs.
  • Importance-Aware Pixel Activation Map (IaPAM): Learns a critical subset of input pixels (by optimizing a binary mask via gradient descent) that are always retained, while non-critical pixels are dropped randomly.

Formally, the randomized execution is

$$\text{process}_i = \begin{cases} 1 & \text{if } \delta_i = 1 \\ r_i \sim \mathrm{Bernoulli}(p) & \text{if } \delta_i = 0 \end{cases}$$

where $\delta_i$ denotes the importance of pixel $i$ and $p$ is the retention probability for non-important pixels.

Randomized skipping induces desynchronization in the timing and intermediate values of MACs, exponentially increasing the number of SCA traces required for successful model extraction—a result directly quantified through SNR-based analysis and practical attacks (Ding et al., 20 Feb 2025).
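The RPAM/IaPAM execution rule above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions (the function name and shapes are hypothetical); it also evaluates the decaying probability $p_j = p \cdot \max(p, 1-p)^{j-1}$ that the $j$-th MAC lands in its nominal position, which drives the desynchronization effect.

```python
import numpy as np

def rpam_iapam_mask(importance, p, rng):
    """Per-pixel execution decision: important pixels (delta_i = 1) always
    execute their MACs; the rest execute with probability p (Bernoulli)."""
    bernoulli = rng.random(importance.shape) < p
    return np.where(importance == 1, True, bernoulli)

rng = np.random.default_rng(1)
delta = np.zeros(10_000, dtype=int)
delta[:1000] = 1                      # e.g. 10% of pixels marked important
mask = rpam_iapam_mask(delta, p=0.5, rng=rng)
print(mask[:1000].all())              # important pixels always execute: True
print(mask[1000:].mean())             # non-important retained at rate near p

# Probability that the j-th MAC occurs at its nominal position decays
# geometrically, so later weights require exponentially more SCA traces.
p = 0.5
p_j = [p * max(p, 1 - p) ** (j - 1) for j in range(1, 5)]
print(p_j)                            # [0.5, 0.25, 0.125, 0.0625]
```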

3. MACPruning as a Countermeasure: Claims and Vulnerabilities

Initially, MACPruning-based SCA countermeasures were believed to provide exponential growth in attack complexity relative to the number of skipped MACs (Ding et al., 20 Feb 2025). Empirical results confirmed that, with $p=0.5$, DEMA attacks beyond the first few weights of a neuron required $10^6$ traces for recovery, compared to $10^4$ for the unprotected baseline, while incurring only a 1–3% accuracy drop and <1% runtime overhead.

However, recent studies identified fundamental vulnerabilities in MACPruning's standard implementation. Specifically, the branching logic used to decide whether to execute a MAC leaves observable control-flow patterns in power and timing traces (e.g., conditional branches in ARM Cortex-M4 assembly) (Casalino et al., 13 Jan 2026). An attacker can reconstruct the importance mask and align traces accordingly, effectively neutralizing the desynchronization and recovering critical parameters with near-baseline efficiency. Furthermore, certain microarchitectural effects allow even some non-important weights to be extracted. In tested scenarios, 96% of important and 100% of some non-important weights were recovered within the original trace budget (Casalino et al., 13 Jan 2026).

4. Structured and Unstructured MACPruning for Resource Optimization

MACPruning also encompasses mechanisms to optimize for latency, energy, and memory on embedded devices. Techniques include:

  • Unstructured inference-time pruning (e.g., UnIT): Dynamically skips individual MACs when their product can be proven below a calibrated threshold, replacing full multiplications with absolute-value comparisons and lightweight division approximations (bit-shifts, binary search, float exponent comparison) (Neth et al., 10 Jul 2025). Thresholds may be determined per-layer or per-group during a calibration pass. The architecture is designed to minimize divisions (which are costly on MCUs) by exploiting operand reuse and grouping.
  • Structured resource-aware pruning: Classic channel/filter pruning, e.g., via adaptive magnitude thresholds or cost-constrained optimization, generally decided during training or compile-time and then fixed for inference (Humble et al., 2022, Wang et al., 2019). Such strategies may employ global knapsack solvers or differentiable relaxations to fit MAC count or FLOPs constraints.

Empirical evidence demonstrates that unstructured, input-aware methods can achieve 11.02%–82.03% MAC reduction, 27.30%–84.19% lower latency, and up to 84.38% energy savings while maintaining bounded accuracy loss (0.48%–7.0%), outperforming most static pruning configurations, especially under domain shift (Neth et al., 10 Jul 2025).
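For the structured side, a minimal NumPy sketch of L1-norm channel pruning is shown below. It is illustrative only (the helper name and `keep_ratio` parameter are hypothetical, and it is not the SMCP or knapsack formulation from the cited papers): removing an output filter removes all of its MACs at every output position, which is what makes structured pruning directly hardware-friendly.

```python
import numpy as np

def prune_channels_by_l1(conv_weight, keep_ratio):
    """Keep the output filters with the largest L1 norm.

    conv_weight: (out_channels, in_channels, kH, kW). Dropping a filter
    removes all of its MACs for every output pixel (structured pruning).
    """
    norms = np.abs(conv_weight).sum(axis=(1, 2, 3))   # L1 norm per filter
    n_keep = max(1, int(keep_ratio * conv_weight.shape[0]))
    keep = np.argsort(norms)[-n_keep:]                # indices of filters kept
    return conv_weight[np.sort(keep)]

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 32, 3, 3))
W_small = prune_channels_by_l1(W, keep_ratio=0.5)
print(W_small.shape)   # (32, 32, 3, 3): half the filters, half the MACs
```

Cost-constrained variants replace the fixed `keep_ratio` with a global solver that allocates the MAC or FLOPs budget across layers.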

5. Algorithms, Mathematical Formulation, and Pseudocode

The mathematical characterization of MACPruning in security and efficiency contexts involves:

  • Random pruning probability: For RPAM/IaPAM, the probability that the $j$-th MAC occurs in its nominal position is $p_j = p \cdot \max(p, 1-p)^{j-1}$, driving the SCA resistance factor $R = p^{-2} \max(p, 1-p)^{-2(j-1)}$ (Ding et al., 20 Feb 2025).
  • Importance masking and training objective: For IaPAM, mask variables $\{m_{i,j,c}\}$ are trained via

$$\mathcal{L} = L_{CE}\left(f\left(\sigma(m) \odot X\right), y\right) + \alpha \left|\frac{\mathrm{nnz}(\sigma(m))}{HWC} - q\right|$$

with $q$ the critical pixel fraction. After training, the top-$q$% are fixed, and the remainder are dropped with adjusted probability.

  • UnIT's MAC-free pruning: All candidate MACs are pre-filtered using the per-connection rule $|X \cdot W| \leq T$, transformed to $|W| \leq T/|X|$ in linear layers or $|X| \leq T/|W|$ in convolutional layers. Fast division is achieved via bit-shift, binary search, or exponent comparison, with division amortized over groups of connections (Neth et al., 10 Jul 2025). Pseudocode for a linear layer:
    for each input activation X_i in layer l:
        τ_i = precomputed_threshold_l[i]    # T_l / |X_i| (approximated)
        for each outgoing weight W_i_j:
            if abs(W_i_j) <= τ_i:           # pruning check
                continue                    # skip MAC
            else:
                O_j += X_i * W_i_j          # perform MAC
  • Control-flow side-channel leakage: As shown in (Casalino et al., 13 Jan 2026), standard branch-based pruning code exposes IaPAM and execution patterns via timing/power signatures, enabling trace alignment attacks.
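The linear-layer pseudocode above can be fleshed out as a runnable NumPy sketch. This is a simplified reconstruction, not the UnIT implementation: the calibrated per-layer threshold is taken as a fixed constant `T`, the bit-shift/exponent division tricks are omitted, and skipping is modeled by zeroing the masked products rather than an actual branch in an MCU kernel.

```python
import numpy as np

def unit_linear(X, W, T):
    """Linear layer that skips MACs whose product is provably below T.

    For each input X_i, tau_i = T / |X_i|; any weight with |W_ij| <= tau_i
    satisfies |X_i * W_ij| <= T and is skipped (zeroed here; a real MCU
    kernel would skip the multiply-accumulate instruction itself).
    """
    eps = 1e-12
    tau = T / (np.abs(X) + eps)              # per-input threshold tau_i
    skip = np.abs(W) <= tau[:, None]         # pruning check per (i, j)
    O = (np.where(skip, 0.0, W) * X[:, None]).sum(axis=0)
    return O, skip.mean()                    # output, fraction of MACs skipped

rng = np.random.default_rng(3)
X = rng.normal(size=64)
W = rng.normal(size=(64, 16))
exact = X @ W
O, skipped = unit_linear(X, W, T=0.1)
print("fraction skipped:", skipped)
print("max output error:", np.max(np.abs(O - exact)))
```

Each skipped term contributes at most `T` to an output, so the absolute error per output neuron is bounded by `T` times the number of skipped connections feeding it; with `T = 0` no MAC is skipped and the result is exact.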

6. Empirical Results and Comparative Impact

MACPruning, when applied for SCA defense, achieves:

  • Up to a 100-fold increase in the number of DEMA traces required for model extraction with $p=0.5$ (up to $10^6$ traces for certain weights), negligible runtime overhead (<1%), and minimal accuracy drop (1–3%) (Ding et al., 20 Feb 2025).
  • However, when control-flow leakage is present, up to 96% of important weights and 100% of certain non-important weights can still be extracted within standard trace budgets, rendering the method ineffective as a defense in unmodified implementations (Casalino et al., 13 Jan 2026).

For resource-constrained inference:

  • UnIT (Unstructured Inference-Time pruning) achieves 11–82% reduction in MACs, up to 84% reduction in inference time and energy on MCUs, without hardware modification or retraining, and with competitive or superior accuracy retention under distribution shift; this demonstrates practical applicability for low-power neural inference (Neth et al., 10 Jul 2025).
  • Related structured MACPruning approaches, such as SMCP and architecture-aware iterative schemes, yield 37–58% MAC reductions in state-of-the-art vision applications with minimal or no loss in application-level quality (e.g., PSNR/SSIM for SID/EDSR) (Wang et al., 2019, Humble et al., 2022).

7. Limitations, Controversies, and Future Directions

MACPruning's security efficacy is contingent on microarchitectural details and the absence of control-flow dependencies in execution traces. Real-world implementations can inadvertently leak pruning decisions via low-level timing, completely circumventing the intended SCA resistance (Casalino et al., 13 Jan 2026).

For efficiency, selection of pruning thresholds, grouping strategies, or mask fine-tuning requires careful calibration. Overaggressive pruning, particularly of low-compute or semantically critical layers, induces irreversible loss in task quality. The extension to structured block pruning or further integration with automated hyperparameter optimization (e.g., via reinforcement learning) remains open (Wang et al., 2019).

A plausible implication is that side-channel secure MACPruning will require either branchless hardware support, constant-time operation selection, or explicit randomized dummy computation to avoid recoverable control-flow leakage. For resource-efficient deployment, hybrid approaches combining training-time structured pruning and inference-time unstructured dynamic MACPruning appear promising, particularly for adapting to varying input distributions and real-time constraints (Neth et al., 10 Jul 2025, Ding et al., 20 Feb 2025).
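The branchless, constant-time selection mentioned above can be illustrated by contrasting the two forms of a guarded MAC. This sketch only demonstrates the code transformation a C or assembly kernel would apply (arithmetic masking in place of a conditional branch); Python itself gives no real constant-time guarantees, and the function names are hypothetical.

```python
# Branchy version: the `if` creates a data-dependent control-flow path,
# the kind of pattern visible in timing/power traces and exploited against
# branch-based MACPruning implementations.
def mac_branchy(acc, x, w, execute):
    if execute:                          # observable conditional branch
        acc += x * w
    return acc

# Branchless version: always perform the multiply; a 0/1 mask selects the
# result arithmetically, so the executed instruction sequence does not
# depend on the pruning decision.
def mac_branchless(acc, x, w, execute):
    return acc + int(execute) * x * w    # same work on both paths

acc = 0.0
for x, w, e in [(2.0, 3.0, 1), (4.0, 5.0, 0), (1.0, 7.0, 1)]:
    acc = mac_branchless(acc, x, w, e)
print(acc)   # 2*3 + 1*7 = 13.0
```

The alternative mentioned in the text, inserting randomized dummy MACs, trades energy for the same goal: making the observable trace independent of which operations were actually pruned.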
