Filter Packing in Secure Neural Inference
- Filter packing is a secure computation technique that utilizes packed Shamir secret sharing to encode multiple convolution filters simultaneously.
- It exploits parallelism in the output-channel dimension to perform packed inner-products, reducing communication and computational overhead.
- Empirical results on deep networks like VGG16 and AlexNet show significant improvements in throughput and scalability in WAN environments.
Filter packing is a secure computation technique designed to enable high-throughput, communication-efficient, and scalable neural network inference using multi-party computation (MPC) over packed Shamir secret sharing (PSS). The filter packing approach exploits parallelism in the output-channel dimension of convolutions, allowing multiple filters' computations to be performed in a packed manner, thus amortizing the overhead associated with secure computation and enabling efficient large-scale inference even in wide-area network (WAN) environments (Zhang et al., 19 Jan 2026).
1. Theoretical Foundations: Packed Shamir Secret Sharing
Packed Shamir Secret Sharing (PSS) generalizes classical Shamir secret sharing by encoding multiple secrets within a single polynomial. Over a field $\mathbb{F}_p$ where $p$ is a Mersenne prime, and with $n$ parties, sharing $k$ secrets $x_1, \dots, x_k$ is accomplished by:
- Fixing $k$ public positions (distinct from the parties' evaluation points $\alpha_1, \dots, \alpha_n$), conventionally $-1, \dots, -k$.
- Sampling a random polynomial $f$ of degree at most $d$ that satisfies $f(-i) = x_i$ for $i = 1, \dots, k$.
- The remaining $d + 1 - k$ degrees of freedom of $f$ are filled with random padding, so $k$ evaluations of $f$ correspond to the secrets and the rest of the polynomial is uniformly random.
- Each party $P_i$ receives $f(\alpha_i)$ as their share.
- Reconstruction requires $d + 1$ shares and employs polynomial interpolation to recover $f$, and hence $x_1, \dots, x_k$.
PSS possesses crucial properties:
- Linear homomorphism: $[\mathbf{x}]_d + [\mathbf{y}]_d = [\mathbf{x} + \mathbf{y}]_d$, so linear operations are performed locally on shares.
- Multiplicative compatibility across degrees (Franklin–Yung): if the degrees satisfy $d_1 + d_2 < n$, the share-wise product of a degree-$d_1$ and a degree-$d_2$ PSS yields a degree-$(d_1 + d_2)$ PSS of the coordinate-wise product of the packed secrets, computed locally.
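The sharing procedure and both homomorphisms can be sketched in plain Python. The prime, party count, degrees, and evaluation points below are illustrative toy assumptions, not parameters fixed by the paper:

```python
# Toy sketch of packed Shamir secret sharing (PSS) over a small
# Mersenne-prime field. All parameters here are illustrative assumptions.
import random

P = 2**13 - 1   # 8191, a small Mersenne prime
N = 7           # number of parties; party i holds f(i)
K = 2           # secrets packed per sharing
D = 2           # sharing degree; products have degree 2D, which must be < N

def inv(a):
    return pow(a, P - 2, P)  # field inverse via Fermat's little theorem

def interpolate(pts, x):
    """Evaluate at x the unique polynomial through `pts` (Lagrange, mod P)."""
    acc = 0
    for xi, yi in pts:
        num = den = 1
        for xj, _ in pts:
            if xj != xi:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        acc = (acc + yi * num * inv(den)) % P
    return acc

def share(secrets, deg=D):
    """PSS-share K secrets at public positions -1..-K via a degree-deg poly."""
    pts = [((-(i + 1)) % P, s % P) for i, s in enumerate(secrets)]
    # remaining degrees of freedom are random padding at fresh positions
    pts += [((-(K + j + 1)) % P, random.randrange(P)) for j in range(deg + 1 - K)]
    return [interpolate(pts, i) for i in range(1, N + 1)]

def reconstruct(shares, deg=D):
    """Recover the K packed secrets from the first deg+1 shares."""
    pts = list(zip(range(1, deg + 2), shares))
    return [interpolate(pts, (-(i + 1)) % P) for i in range(K)]

x, y = [3, 5], [10, 20]
sx, sy = share(x), share(y)

# Linear homomorphism: share-wise addition encodes the element-wise sum.
s_add = [(a + b) % P for a, b in zip(sx, sy)]
assert reconstruct(s_add) == [13, 25]

# Franklin–Yung: the share-wise product is a degree-2D sharing of x ∘ y.
s_mul = [(a * b) % P for a, b in zip(sx, sy)]
assert reconstruct(s_mul, deg=2 * D) == [30, 100]
```

Note that reconstructing the product requires $2D + 1$ shares, which is why protocols interleave degree-reduction (resharing) steps after multiplications.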
2. Filter Packing Concept and Parallel Convolution
Filter packing targets convolution operations, packing across the output-channel (filter) dimension. Specifically, consider $k$ output channels, with each convolution filter having shape $C \times R \times S$ (input channels × height × width):
- For each spatial location $(c, r, s)$ in the filter banks, form a $k$-vector $\mathbf{w}_{c,r,s}$ that collects the weights at that position across the $k$ filters.
- The vector $\mathbf{w}_{c,r,s}$ is shared as a single PSS $[\mathbf{w}_{c,r,s}]$.
- The corresponding input pixel $x_{c,r,s}$ is duplicated $k$ times to form $\mathbf{x}_{c,r,s} = (x_{c,r,s}, \dots, x_{c,r,s})$, which is similarly PSS-shared.
Parallel convolution proceeds as follows:
- The $k$ output channels' inner products are handled as one packed inner-product: $[\mathbf{z}] = \sum_{c,r,s} [\mathbf{w}_{c,r,s}] \cdot [\mathbf{x}_{c,r,s}]$.
- The degree-doubled outputs are reshared with a packed-summing trick, yielding PSS shares of all $k$ convolution outputs in one round.
- Zero padding is handled by packing zeros in the corresponding input vectors.
This packing methodology significantly amortizes the operation costs across multiple filters, directly benefiting deep CNN architectures.
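The packing layout can be illustrated in cleartext (no secret sharing), showing that a single packed accumulation reproduces $k$ independent per-filter inner products at one output location. All shapes and variable names below are illustrative:

```python
# Cleartext sketch of the filter-packing data layout: at one output
# location, the k filters' inner products are computed as one running sum
# of "packed" element-wise products. Shapes and values are illustrative.
import random

K = 4                      # output channels packed together
C, R, S = 3, 2, 2          # filter shape: channels x height x width

filters = [[[[random.randrange(10) for _ in range(S)]
             for _ in range(R)] for _ in range(C)] for _ in range(K)]
patch   =  [[[random.randrange(10) for _ in range(S)]
             for _ in range(R)] for _ in range(C)]

# Packed layout: one K-vector of weights per (c, r, s) position, with the
# matching input pixel duplicated K times.
packed = []
for c in range(C):
    for r in range(R):
        for s in range(S):
            w_vec = [filters[f][c][r][s] for f in range(K)]  # across filters
            x_vec = [patch[c][r][s]] * K                     # duplicated pixel
            packed.append((w_vec, x_vec))

# One "packed inner product": element-wise multiply, then sum over positions.
out = [0] * K
for w_vec, x_vec in packed:
    out = [o + w * x for o, (w, x) in zip(out, zip(w_vec, x_vec))]

# Matches K independent convolutions evaluated at this output location.
expected = [sum(filters[f][c][r][s] * patch[c][r][s]
                for c in range(C) for r in range(R) for s in range(S))
            for f in range(K)]
assert out == expected
```

In the secure protocol each `w_vec` and `x_vec` would be a single PSS sharing, so the whole loop costs one packed multiplication per filter position rather than $k$ separate ones.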
3. Integration with Secure Vector–Matrix Multiplication
The filter packing approach is tightly integrated with a communication-efficient protocol for vector–matrix multiplication over PSS, underpinned by vector–matrix multiplication-friendly random share tuples (VM-RandTuples). For block size $k$ and output dimension $m$:
- Each VM-RandTuple is a set of correlated random PSS sharings used to mask intermediate products and to reshare them at the target degree and packing.
- An oracle supplies such tuples offline.
- Online, with the vector and matrix held as PSS-shared $k$-blocks, parties perform local packed multiplications plus randomization with the tuple.
- Sharing the resultant masked value and reconstructing it permits resumming, repacking, and reduction to the desired $k$-packed output.
- This enables one-round online evaluation:
- This enables one-round online evaluation:
| Protocol | Offline Communication | Online Communication | Online Rounds |
|---|---|---|---|
| Packed vector–matrix multiplication | fields/party | fields/party | 1 |
Under this construction, convolution becomes a special case of vector–matrix multiplication.
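The block-wise access pattern that this protocol secures can be sketched in cleartext; dimensions and variable names below are illustrative assumptions:

```python
# Cleartext sketch of block-wise vector-matrix multiplication, the access
# pattern that VM-RandTuples make secure: the length-w vector is processed
# in k-sized packed blocks. Dimensions here are illustrative.
K = 3            # packing size
W, M = 6, 4      # vector length and output dimension (W divisible by K)

v = list(range(1, W + 1))                       # input vector
mat = [[(i * M + j) % 7 for j in range(M)] for i in range(W)]

out = [0] * M
for b in range(0, W, K):                        # one packed block at a time
    v_block = v[b:b + K]                        # would be one PSS sharing
    for j in range(M):
        col_block = [mat[b + i][j] for i in range(K)]
        # packed inner product of the block with the matching column slice
        out[j] += sum(x * y for x, y in zip(v_block, col_block))

expected = [sum(v[i] * mat[i][j] for i in range(W)) for j in range(M)]
assert out == expected
```

Each block contributes one packed multiplication per output column, which is what amortizes the per-element cost by the packing size.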
4. Extension to Non-Linear Neural Network Operations
Filter packing extends naturally to non-linearities by parallelizing their application on packed data. Standard Shamir-based elementwise protocols for ReLU, DReLU, maxpool, and bitwise less-than are adapted as follows:
- Bitwise Less-Than: Each packed value's $\ell$-bit representation is processed using parallel prefix-ORs in $O(\log \ell)$ rounds of packed multiplications.
- DReLU: Uses the 2's-complement MSB trick: pack randomized mask bits, open the masked values, bit-decompose, and invoke the bitwise less-than subprotocol.
- ReLU and MaxPool: Parallel evaluation with one packed multiplication per ReLU ($\mathrm{ReLU}(x) = \mathrm{DReLU}(x) \cdot x$); maxpool via repeated pairwise max operations within each pooling region.
- All key nonlinear subprotocols (except those involving prefix multiplication) remain one-round, following the same mask, open, reconstruct, and subtract pattern.
This design ensures that the communication and round complexity of non-linear layers enjoys the same amortization benefits as linear layers.
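The log-round prefix-OR pattern used by the bitwise less-than subprotocol can be sketched on plain 0/1 values standing in for secret-shared bits; in the MPC setting, each combining round corresponds to one batch of packed multiplications:

```python
# Sketch of log-round parallel prefix-OR: round t ORs each bit with the
# bit 2^t positions earlier, giving all prefixes in ceil(log2 l) rounds.
# Plain 0/1 integers stand in for secret-shared bits.
import math

def prefix_or(bits):
    """prefix[i] = OR of bits[0..i], computed in O(log l) combining rounds."""
    pre = list(bits)
    step = 1
    while step < len(pre):
        # In MPC each round is one batch of packed multiplications,
        # since a OR b = a + b - a*b over the field.
        pre = [pre[i] if i < step
               else pre[i] + pre[i - step] - pre[i] * pre[i - step]
               for i in range(len(pre))]
        step *= 2
    return pre

bits = [0, 0, 1, 0, 0, 1, 0, 0]
assert prefix_or(bits) == [0, 0, 1, 1, 1, 1, 1, 1]
assert math.ceil(math.log2(len(bits))) == 3   # combining rounds needed
```

Because all positions in a round are independent, the per-round multiplications batch naturally into packed PSS multiplications, which is what preserves the amortization.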
5. Communication, Computation, and Scalability Analysis
By exploiting filter packing, substantial reductions in both communication and computation are observed relative to protocols without packing:
- Let $w \times m$ be the vector–matrix multiplication dimensions and $k$ the packing size.
- Per-party communication for convolutions scales inversely in $k$: both offline and online phases are reduced by a factor of $k$ compared to un-packed (Shamir-only) protocols.
- Empirical reductions cited for deep networks (AlexNet) include up to 5.85× (offline), 11.17× (online), and 6.83× (total) communication (Zhang et al., 19 Jan 2026).
- Experiments over up to 63 Cloud VMs (WAN and LAN) demonstrated, for CIFAR-10/VGG16 with 31 parties:
- Offline communication reduction by 5–6×, online by 10–12×, total by 7×.
- Online phase runtime up to 2.61× faster; total runtime up to 1.75× faster.
- Scalability: successful execution with 63 parties (VGG16) using only 545 MB of communication, where un-packed protocols ran out of memory.
6. Significance and Advancements over Prior Work
The filter packing approach introduced in (Zhang et al., 19 Jan 2026) addresses severe scalability bottlenecks endemic to previous MPC protocols for neural network inference, particularly those relying on ordinary Shamir secret sharing (e.g., Liu et al., USENIX Security'24). Key advancements include:
- Scalability: Enables efficient secure inference among many parties (tested with up to 63 parties).
- Communication Efficiency: Amortizes all operations, linear and non-linear, by a factor of the packing size $k$, dramatically lowering bandwidth requirements.
- High Throughput: One-round parallel convolution and vector-matrix multiplication protocols unlock low WAN latency, essential for real-world deployments.
- Seamless Support for Deep, Wide Networks: Convolutions spanning large output-channel dimensions benefit strongly from packing, facilitating inference on architectures such as VGG16 and AlexNet that induce prohibitive overhead for prior methods.
The design establishes packed secret sharing and filter packing as foundational primitives for modern, large-scale secure multiparty neural network inference. The paradigm is broadly applicable wherever parallelism in output channels or similar dimensions can be exploited in secret-shared computations.