
Filter Packing in Secure Neural Inference

Updated 26 January 2026
  • Filter packing is a secure computation technique that utilizes packed Shamir secret sharing to encode multiple convolution filters simultaneously.
  • It exploits parallelism in the output-channel dimension to perform packed inner-products, reducing communication and computational overhead.
  • Empirical results on deep networks like VGG16 and AlexNet show significant improvements in throughput and scalability in WAN environments.

Filter packing is a secure computation technique designed to enable high-throughput, communication-efficient, and scalable neural network inference using multi-party computation (MPC) over packed Shamir secret sharing (PSS). The filter packing approach exploits parallelism in the output-channel dimension of convolutions, allowing multiple filters' computations to be performed in a packed manner, thus amortizing the overhead associated with secure computation and enabling efficient large-scale inference even in wide-area network (WAN) environments (Zhang et al., 19 Jan 2026).

1. Theoretical Foundations: Packed Shamir Secret Sharing

Packed Shamir Secret Sharing (PSS) generalizes classical Shamir secret sharing by encoding $k$ secrets within a single polynomial. For a field $\mathbb{F}_p$ where $p = 2^\ell - 1$ is a Mersenne prime, and with $n = 2d+1$ parties, sharing $k$ secrets $x_0, \ldots, x_{k-1} \in \mathbb{F}_p$ is accomplished by:

  • Fixing public positions $s_0, \ldots, s_{k-1} \in \mathbb{F}_p$ (distinct from the evaluation points $1, \ldots, n$).
  • Sampling a random polynomial $f(X) = \sum_{j=0}^{t} a_j X^j$ of degree $t$ that satisfies $f(s_i) = x_i$ for $i = 0, \ldots, k-1$.
  • The coefficients $a_0, \ldots, a_t$ are constrained so that $k$ degrees of freedom encode the secrets and the remaining $t - (k-1)$ are random padding.
  • Each party $P_j$ receives $f(j)$ as its share.
  • Reconstruction requires $t+1$ shares and employs polynomial interpolation to recover $x_i = f(s_i)$.

PSS possesses crucial properties:

  • Linear homomorphism: $\llbracket \vec{x} \rrbracket_t + \llbracket \vec{y} \rrbracket_t = \llbracket \vec{x} + \vec{y} \rrbracket_t$, computed locally on shares.
  • Multiplicative compatibility across degrees (Franklin–Yung): if degrees $d_1, d_2$ satisfy $d_1 + d_2 < n$, then multiplying a degree-$d_1$ and a degree-$d_2$ packed sharing share-by-share yields a degree-$(d_1 + d_2)$ packed sharing of the coordinate-wise product of the two secret vectors.
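These two properties can be checked concretely. The sketch below is a toy illustration of PSS over a small Mersenne-prime field; all function names, the choice of secret positions, and the parameter values are illustrative assumptions, not the paper's implementation.

```python
import random

P = 2**13 - 1  # toy Mersenne prime p = 2^l - 1 with l = 13

def _interpolate(points, x, p=P):
    """Lagrange-interpolate the polynomial through `points`, evaluated at x (mod p)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * ((x - xj) % p) % p
                den = den * ((xi - xj) % p) % p
        # den^(p-2) is the modular inverse of den (Fermat's little theorem).
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def secret_positions(k, p=P):
    """Public positions s_0, ..., s_{k-1}, kept distinct from party points 1..n."""
    return [(-(i + 1)) % p for i in range(k)]

def pss_share(secrets, t, n, p=P):
    """Degree-t packed sharing of k secrets among parties at points 1..n."""
    k = len(secrets)
    assert t >= k - 1 and n >= t + 1
    # Pin down a degree-t polynomial via the k secret positions plus
    # t+1-k random anchor points, then hand party j the share f(j).
    anchors = list(zip(secret_positions(k, p), secrets))
    anchors += [(n + 1 + i, random.randrange(p)) for i in range(t + 1 - k)]
    return [_interpolate(anchors, j, p) for j in range(1, n + 1)]

def pss_reconstruct(shares, t, k, p=P):
    """Recover the k secrets from the first t+1 shares."""
    pts = list(zip(range(1, t + 2), shares[: t + 1]))
    return [_interpolate(pts, s, p) for s in secret_positions(k, p)]

n = 7
a = pss_share([2, 3], t=2, n=n)
b = pss_share([5, 4], t=2, n=n)

# Linear homomorphism: adding shares pointwise shares the sum vector.
sums = [(x + y) % P for x, y in zip(a, b)]
print(pss_reconstruct(sums, t=2, k=2))  # -> [7, 7]

# Franklin-Yung: multiplying shares pointwise yields a degree-2t sharing
# of the coordinate-wise product, reconstructible while 2t < n.
prods = [(x * y) % P for x, y in zip(a, b)]
print(pss_reconstruct(prods, t=4, k=2))  # -> [10, 12]
```

Note that the product sharing has degree $2t$, which is why the protocols below take care to reduce degrees (via resharing) after each multiplication layer.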

2. Filter Packing Concept and Parallel Convolution

Filter packing targets convolution operations, packing across the output-channel (filter) dimension. Specifically, consider $k_o$ output channels, with each convolution filter having shape $(f_w \times f_h \times c_i)$:

  • For each spatial location $(\Delta w, \Delta h, c)$ in the filter banks, form a $k$-vector $\vec{w} = (w^{(1)}_{\Delta w, \Delta h, c}, \ldots, w^{(k)}_{\Delta w, \Delta h, c})$ that collects the weights at that position across $k$ filters.
  • The vector $\vec{w}$ is shared as a single PSS $\llbracket \vec{w} \rrbracket_d$.
  • The corresponding input pixel is duplicated $k$ times to form $\vec{x} = (x, \ldots, x)$, which is similarly PSS-shared.

Parallel convolution proceeds as follows:

  • Each of the $k$ per-channel inner products is handled as one packed inner product:

$$\llbracket \vec{y}_{\text{out}} \rrbracket_{2d} = \llbracket \vec{w} \rrbracket_d \cdot \llbracket \vec{x} \rrbracket_d$$

  • The outputs are reshared using a $k$-summing trick, yielding PSS shares of $k$ convolution outputs in one round.
  • Zero padding is handled by packing zeros in the corresponding input vectors.

This packing methodology significantly amortizes the operation costs across multiple filters, directly benefiting deep CNN architectures.
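The data layout behind this amortization can be illustrated in plaintext, with no secret sharing: accumulating one $k$-vector product per filter position reproduces all $k$ output-channel inner products at once. The parameter values below are toy choices, not the paper's configurations.

```python
import random

random.seed(0)
k, fh, fw, ci = 4, 3, 3, 2   # k filters, each of shape fh x fw x ci
filters = [[[[random.randrange(5) for _ in range(ci)]
             for _ in range(fw)] for _ in range(fh)] for _ in range(k)]
patch = [[[random.randrange(5) for _ in range(ci)]
          for _ in range(fw)] for _ in range(fh)]  # one receptive field

# Packed accumulation: at each filter position, multiply the k-vector of
# weights (one per filter) by the input pixel duplicated k times.
acc = [0] * k
for dh in range(fh):
    for dw in range(fw):
        for c in range(ci):
            w_vec = [filters[j][dh][dw][c] for j in range(k)]  # across filters
            x_vec = [patch[dh][dw][c]] * k                     # pixel duplicated
            acc = [a + w * x for a, w, x in zip(acc, w_vec, x_vec)]

# Reference: k independent inner products, one per output channel.
ref = [sum(filters[j][dh][dw][c] * patch[dh][dw][c]
           for dh in range(fh) for dw in range(fw) for c in range(ci))
       for j in range(k)]
print(acc == ref)  # -> True
```

In the secure protocol, `w_vec` and `x_vec` would each be a single PSS sharing, so every iteration of the inner loop costs one packed multiplication instead of $k$ separate ones.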

3. Integration with Secure Vector–Matrix Multiplication

The filter packing approach is tightly integrated with a communication-efficient protocol for vector–matrix multiplication over PSS, underpinned by vector–matrix multiplication-friendly random share tuples (VM-RandTuples). For block size $k$ and output dimension $v$:

  • The tuple is $(\llbracket \vec{r} \rrbracket_{2d} \in \mathbb{F}_p^{kv},\ \llbracket \vec{r}' \rrbracket_d \in \mathbb{F}_p^{v})$, with $r'_i = \sum_{j=ik}^{(i+1)k-1} r_j$.
  • An oracle $\mathcal{F}_{\text{VM-RandTuple}}$ supplies such tuples offline.
  • Online, with PSS-shared blocks $\llbracket \vec{a} \rrbracket_d$ and $\llbracket A \rrbracket_d$, parties perform local packed multiplications plus randomization.
  • Reconstructing the resulting masked $\llbracket \vec{z} \rrbracket_{2d}$ permits resumming, repacking, and reduction to the desired $k$-packed output.
  • This enables one-round online evaluation:

| Protocol | Offline Communication | Online Communication | Online Rounds |
|---|---|---|---|
| $\Pi_{\text{VecMatMult}}$ | $(1+1/k)\cdot \frac{2nv}{n+2k-1}$ fields/party | $(1+1/k)\cdot uv$ fields/party | 1 |

Notably, convolution becomes a special case of matrix multiplication under this construction.
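The correlation in the VM-RandTuple and the resumming step it enables can be sketched in plaintext. This is a toy illustration of the arithmetic only (the masked value is safe to open because $\vec{r}$ is uniformly random); variable names and parameters are illustrative assumptions.

```python
import random

P = 2**13 - 1  # toy Mersenne prime field
k, v = 4, 3    # block size k, output dimension v

z = [random.randrange(P) for _ in range(k * v)]  # stand-in for the degree-2d result
r = [random.randrange(P) for _ in range(k * v)]  # random mask from the tuple
# Tuple correlation: r'_i is the sum of the i-th length-k block of r.
r_prime = [sum(r[i * k:(i + 1) * k]) % P for i in range(v)]

# Parties open z + r (uniformly masked), block-sum the public value,
# and subtract the correlated r' to obtain shares of the k block sums of z.
opened = [(zi + ri) % P for zi, ri in zip(z, r)]
block_sums = [(sum(opened[i * k:(i + 1) * k]) - r_prime[i]) % P for i in range(v)]

expected = [sum(z[i * k:(i + 1) * k]) % P for i in range(v)]
print(block_sums == expected)  # -> True
```

In the actual protocol the subtraction of $r'$ happens on PSS shares, so the result is obtained directly as a fresh degree-$d$ packed sharing, completing the reduction in a single online round.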

4. Extension to Non-Linear Neural Network Operations

Filter packing extends naturally to non-linearities by parallelizing their application on packed data. Standard Shamir-based elementwise protocols for ReLU, DReLU, maxpool, and bitwise less-than are adapted as follows:

  • Bitwise Less-Than: each packed value's $\ell$-bit mask is revealed using parallel prefix-ORs in $O(\log \ell)$ rounds of packed multiplications.
  • DReLU: uses the two's-complement MSB trick, packing randomized mask bits, opening masked values, bit-decomposing, and invoking the $\Pi_{\text{Bitwise-LT}}$ subprotocol.
  • ReLU and MaxPool: parallel evaluation with one packed multiplication per ReLU (ReLU($x$) $=$ DReLU($x$) $\cdot x$); maxpool via repeated pairwise maxima, $O(\log m)$ rounds per pooling region of size $m$.
  • All key nonlinear subprotocols—except those involving prefix multiplication—remain one-round with the same mask, open, reconstruct, and subtract pattern.

This design ensures that the communication and round complexity of non-linear layers enjoy the same amortization benefits as linear layers.

5. Communication, Computation, and Scalability Analysis

By exploiting filter packing, substantial reductions in both communication and computation are observed relative to protocols without packing:

  • Let $u \times v$ be the vector–matrix multiplication dimensions and $k$ the packing size.
  • The communication per party for convolutions is:

$$\text{Parallel Conv:}\quad \approx \frac{u\, m\, \ell}{k}\left(\frac{3n}{n+2k-1} + \ldots\right)\ \text{fields offline},\qquad \frac{2um}{k}\ \text{fields online}$$

Both offline and online phases are reduced by an $O(k)$ factor compared to un-packed (Shamir-only) protocols.
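As a quick sanity check on the amortization, the stated per-party online cost of roughly $2um/k$ field elements implies a $k$-fold cut over the $k=1$ Shamir baseline; the parameter values below are illustrative, not from the paper's benchmarks.

```python
# Toy arithmetic for the O(k) online saving; u, m, k are illustrative.
u, m, k = 1024, 512, 29

unpacked_fields = 2 * u * m      # k = 1: plain Shamir baseline, ~2um fields
packed_fields = 2 * u * m / k    # filter packing with block size k

print(round(unpacked_fields / packed_fields))  # -> 29, a k-fold reduction
```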

  • Empirical reductions cited for deep networks (AlexNet) include up to 5.85× (offline), 11.17× (online), and 6.83× (total) communication (Zhang et al., 19 Jan 2026).
  • Experiments over up to 63 Cloud VMs (WAN and LAN) demonstrated, for CIFAR-10/VGG16 with 31 parties:
    • Offline communication reduction by 5–6×, online by 10–12×, total by ~7×.
    • Online phase runtime up to 2.61× faster; total runtime up to 1.75× faster.
    • Scalability: successful execution with $n=63$, $k=29$ (VGG16), with only 545 MB communication, where un-packed protocols ran out of memory.

6. Significance and Advancements over Prior Work

The filter packing approach introduced in (Zhang et al., 19 Jan 2026) addresses severe scalability bottlenecks endemic to previous MPC protocols for neural network inference, particularly those relying on ordinary Shamir secret sharing (e.g., Liu et al., USENIX Security'24). Key advancements include:

  • Scalability: Enables efficient secure inference among many parties (tested up to $n=63$).
  • Communication Efficiency: Amortizes all operations—linear and non-linear—by a factor of kk, dramatically lowering bandwidth requirements.
  • High Throughput: One-round parallel convolution and vector-matrix multiplication protocols unlock low WAN latency, essential for real-world deployments.
  • Seamless Support for Deep, Wide Networks: Convolutions spanning large output-channel dimensions benefit strongly from packing, facilitating inference on architectures such as VGG16 and AlexNet that induce prohibitive overhead for prior methods.

The design establishes packed secret sharing and filter packing as foundational primitives for modern, large-scale secure multiparty neural network inference. The paradigm is broadly applicable wherever parallelism in output channels or similar dimensions can be exploited in secret-shared computations.

