Filter Packing in Secure Neural Inference
- Filter packing is a secure computation technique that utilizes packed Shamir secret sharing to encode multiple convolution filters simultaneously.
- It exploits parallelism in the output-channel dimension to perform packed inner-products, reducing communication and computational overhead.
- Empirical results on deep networks like VGG16 and AlexNet show significant improvements in throughput and scalability in WAN environments.
Filter packing is a secure computation technique designed to enable high-throughput, communication-efficient, and scalable neural network inference using multi-party computation (MPC) over packed Shamir secret sharing (PSS). The filter packing approach exploits parallelism in the output-channel dimension of convolutions, allowing multiple filters' computations to be performed in a packed manner, thus amortizing the overhead associated with secure computation and enabling efficient large-scale inference even in wide-area network (WAN) environments (Zhang et al., 19 Jan 2026).
1. Theoretical Foundations: Packed Shamir Secret Sharing
Packed Shamir Secret Sharing (PSS) generalizes classical Shamir secret sharing by encoding multiple secrets within a single polynomial. Over a field $\mathbb{F}_p$ where $p$ is a Mersenne prime, and with $n$ parties, sharing $k$ secrets $x_1, \dots, x_k$ is accomplished by:
- Fixing $k$ public positions (distinct from the parties' evaluation points $\alpha_1, \dots, \alpha_n$), conventionally $-1, \dots, -k$.
- Sampling a random polynomial $f$ of degree at most $d$ that satisfies $f(-i) = x_i$ for $i = 1, \dots, k$.
- The remaining $d + 1 - k$ degrees of freedom of $f$ are filled with random padding, so $k$ evaluations of $f$ correspond to the secrets and the rest of the polynomial is uniformly random.
- Each party $P_i$ receives $f(\alpha_i)$ as their share.
- Reconstruction requires $d + 1$ shares and employs polynomial interpolation to recover $f$, and hence $x_1, \dots, x_k$.
PSS possesses crucial properties:
- Linear homomorphism: $[\mathbf{x}]_d + [\mathbf{y}]_d = [\mathbf{x} + \mathbf{y}]_d$, so linear operations are performed locally on shares.
- Multiplicative compatibility across degrees (Franklin–Yung): if the degrees satisfy $d_1 + d_2 < n$, the share-wise product of a degree-$d_1$ and a degree-$d_2$ PSS yields a degree-$(d_1 + d_2)$ PSS of the coordinate-wise product of the packed secrets, computed locally.
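The sharing procedure and both homomorphisms can be sketched in plain Python. The prime, party count, degrees, and evaluation points below are illustrative toy assumptions, not parameters fixed by the paper:

```python
# Toy sketch of packed Shamir secret sharing (PSS) over a small
# Mersenne-prime field. All parameters here are illustrative assumptions.
import random

P = 2**13 - 1   # 8191, a small Mersenne prime
N = 7           # number of parties; party i holds f(i)
K = 2           # secrets packed per sharing
D = 2           # sharing degree; products have degree 2D, which must be < N

def inv(a):
    return pow(a, P - 2, P)  # field inverse via Fermat's little theorem

def interpolate(pts, x):
    """Evaluate at x the unique polynomial through `pts` (Lagrange, mod P)."""
    acc = 0
    for xi, yi in pts:
        num = den = 1
        for xj, _ in pts:
            if xj != xi:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        acc = (acc + yi * num * inv(den)) % P
    return acc

def share(secrets, deg=D):
    """PSS-share K secrets at public positions -1..-K via a degree-deg poly."""
    pts = [((-(i + 1)) % P, s % P) for i, s in enumerate(secrets)]
    # remaining degrees of freedom are random padding at fresh positions
    pts += [((-(K + j + 1)) % P, random.randrange(P)) for j in range(deg + 1 - K)]
    return [interpolate(pts, i) for i in range(1, N + 1)]

def reconstruct(shares, deg=D):
    """Recover the K packed secrets from the first deg+1 shares."""
    pts = list(zip(range(1, deg + 2), shares))
    return [interpolate(pts, (-(i + 1)) % P) for i in range(K)]

x, y = [3, 5], [10, 20]
sx, sy = share(x), share(y)

# Linear homomorphism: share-wise addition encodes the element-wise sum.
s_add = [(a + b) % P for a, b in zip(sx, sy)]
assert reconstruct(s_add) == [13, 25]

# Franklin–Yung: the share-wise product is a degree-2D sharing of x ∘ y.
s_mul = [(a * b) % P for a, b in zip(sx, sy)]
assert reconstruct(s_mul, deg=2 * D) == [30, 100]
```

Note that reconstructing the product requires $2D + 1$ shares, which is why protocols interleave degree-reduction (resharing) steps after multiplications.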
2. Filter Packing Concept and Parallel Convolution
Filter packing targets convolution operations, packing across the output-channel (filter) dimension. Specifically, consider $k$ output channels, with each convolution filter having shape $C \times R \times S$ (input channels × height × width):
- For each spatial location $(c, r, s)$ in the filter banks, form a $k$-vector $\mathbf{w}_{c,r,s}$ that collects the weights at that position across the $k$ filters.
- The vector $\mathbf{w}_{c,r,s}$ is shared as a single PSS $[\mathbf{w}_{c,r,s}]$.
- The corresponding input pixel $x_{c,r,s}$ is duplicated $k$ times to form $\mathbf{x}_{c,r,s} = (x_{c,r,s}, \dots, x_{c,r,s})$, which is similarly PSS-shared.
Parallel convolution proceeds as follows:
- The $k$ output channels' inner products are handled as one packed inner-product: $[\mathbf{z}] = \sum_{c,r,s} [\mathbf{w}_{c,r,s}] \cdot [\mathbf{x}_{c,r,s}]$.
- The degree-doubled outputs are reshared with a packed-summing trick, yielding PSS shares of all $k$ convolution outputs in one round.
- Zero padding is handled by packing zeros in the corresponding input vectors.
This packing methodology significantly amortizes the operation costs across multiple filters, directly benefiting deep CNN architectures.
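The packing layout can be illustrated in cleartext (no secret sharing), showing that a single packed accumulation reproduces $k$ independent per-filter inner products at one output location. All shapes and variable names below are illustrative:

```python
# Cleartext sketch of the filter-packing data layout: at one output
# location, the k filters' inner products are computed as one running sum
# of "packed" element-wise products. Shapes and values are illustrative.
import random

K = 4                      # output channels packed together
C, R, S = 3, 2, 2          # filter shape: channels x height x width

filters = [[[[random.randrange(10) for _ in range(S)]
             for _ in range(R)] for _ in range(C)] for _ in range(K)]
patch   =  [[[random.randrange(10) for _ in range(S)]
             for _ in range(R)] for _ in range(C)]

# Packed layout: one K-vector of weights per (c, r, s) position, with the
# matching input pixel duplicated K times.
packed = []
for c in range(C):
    for r in range(R):
        for s in range(S):
            w_vec = [filters[f][c][r][s] for f in range(K)]  # across filters
            x_vec = [patch[c][r][s]] * K                     # duplicated pixel
            packed.append((w_vec, x_vec))

# One "packed inner product": element-wise multiply, then sum over positions.
out = [0] * K
for w_vec, x_vec in packed:
    out = [o + w * x for o, (w, x) in zip(out, zip(w_vec, x_vec))]

# Matches K independent convolutions evaluated at this output location.
expected = [sum(filters[f][c][r][s] * patch[c][r][s]
                for c in range(C) for r in range(R) for s in range(S))
            for f in range(K)]
assert out == expected
```

In the secure protocol each `w_vec` and `x_vec` would be a single PSS sharing, so the whole loop costs one packed multiplication per filter position rather than $k$ separate ones.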
3. Integration with Secure Vector–Matrix Multiplication
The filter packing approach is tightly integrated with a communication-efficient protocol for vector–matrix multiplication over PSS, underpinned by vector–matrix multiplication-friendly random share tuples (VM-RandTuples). For block size $k$ and output dimension $m$:
- Each VM-RandTuple is a set of correlated random PSS sharings used to mask intermediate products and to reshare them at the target degree and packing.
- An oracle supplies such tuples offline.
- Online, with the vector and matrix held as PSS-shared $k$-blocks, parties perform local packed multiplications plus randomization with the tuple.
- Sharing the resultant masked value and reconstructing it permits resumming, repacking, and reduction to the desired $k$-packed output.
- This enables one-round online evaluation:
- This enables one-round online evaluation:
| Protocol | Offline Communication | Online Communication | Online Rounds |
|---|---|---|---|
| Packed vector–matrix multiplication | fields/party | fields/party | 1 |
Under this construction, convolution becomes a special case of vector–matrix multiplication.
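The block-wise access pattern that this protocol secures can be sketched in cleartext; dimensions and variable names below are illustrative assumptions:

```python
# Cleartext sketch of block-wise vector-matrix multiplication, the access
# pattern that VM-RandTuples make secure: the length-w vector is processed
# in k-sized packed blocks. Dimensions here are illustrative.
K = 3            # packing size
W, M = 6, 4      # vector length and output dimension (W divisible by K)

v = list(range(1, W + 1))                       # input vector
mat = [[(i * M + j) % 7 for j in range(M)] for i in range(W)]

out = [0] * M
for b in range(0, W, K):                        # one packed block at a time
    v_block = v[b:b + K]                        # would be one PSS sharing
    for j in range(M):
        col_block = [mat[b + i][j] for i in range(K)]
        # packed inner product of the block with the matching column slice
        out[j] += sum(x * y for x, y in zip(v_block, col_block))

expected = [sum(v[i] * mat[i][j] for i in range(W)) for j in range(M)]
assert out == expected
```

Each block contributes one packed multiplication per output column, which is what amortizes the per-element cost by the packing size.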
4. Extension to Non-Linear Neural Network Operations
Filter packing extends naturally to non-linearities by parallelizing their application on packed data. Standard Shamir-based elementwise protocols for ReLU, DReLU, maxpool, and bitwise less-than are adapted as follows:
- Bitwise Less-Than: Each packed value's $\ell$-bit representation is processed using parallel prefix-ORs in $O(\log \ell)$ rounds of packed multiplications.
- DReLU: Uses the 2's-complement MSB trick: pack randomized mask bits, open the masked values, bit-decompose, and invoke the bitwise less-than subprotocol.
- ReLU and MaxPool: Parallel evaluation with one packed multiplication per ReLU ($\mathrm{ReLU}(x) = \mathrm{DReLU}(x) \cdot x$); maxpool via repeated pairwise max operations within each pooling region.
- All key nonlinear subprotocols (except those involving prefix multiplication) remain one-round, following the same mask, open, reconstruct, and subtract pattern.
This design ensures that the communication and round complexity of non-linear layers enjoys the same amortization benefits as linear layers.
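The log-round prefix-OR pattern used by the bitwise less-than subprotocol can be sketched on plain 0/1 values standing in for secret-shared bits; in the MPC setting, each combining round corresponds to one batch of packed multiplications:

```python
# Sketch of log-round parallel prefix-OR: round t ORs each bit with the
# bit 2^t positions earlier, giving all prefixes in ceil(log2 l) rounds.
# Plain 0/1 integers stand in for secret-shared bits.
import math

def prefix_or(bits):
    """prefix[i] = OR of bits[0..i], computed in O(log l) combining rounds."""
    pre = list(bits)
    step = 1
    while step < len(pre):
        # In MPC each round is one batch of packed multiplications,
        # since a OR b = a + b - a*b over the field.
        pre = [pre[i] if i < step
               else pre[i] + pre[i - step] - pre[i] * pre[i - step]
               for i in range(len(pre))]
        step *= 2
    return pre

bits = [0, 0, 1, 0, 0, 1, 0, 0]
assert prefix_or(bits) == [0, 0, 1, 1, 1, 1, 1, 1]
assert math.ceil(math.log2(len(bits))) == 3   # combining rounds needed
```

Because all positions in a round are independent, the per-round multiplications batch naturally into packed PSS multiplications, which is what preserves the amortization.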
5. Communication, Computation, and Scalability Analysis
By exploiting filter packing, substantial reductions in both communication and computation are observed relative to protocols without packing:
- Let $w \times m$ be the vector–matrix multiplication dimensions and $k$ the packing size.
- Per-party communication for convolutions scales inversely in $k$: both offline and online phases are reduced by a factor of $k$ compared to un-packed (Shamir-only) protocols.
- Empirical reductions cited for deep networks (AlexNet) include up to 5.85× (offline), 11.17× (online), and 6.83× (total) communication (Zhang et al., 19 Jan 2026).
- Experiments over up to 63 Cloud VMs (WAN and LAN) demonstrated, for CIFAR-10/VGG16 with 31 parties:
- Offline communication reduction by 5–6×, online by 10–12×, total by 7×.
- Online phase runtime up to 2.61× faster; total runtime up to 1.75× faster.
- Scalability: successful execution with 63 parties (VGG16) using only 545 MB of communication, where un-packed protocols ran out of memory.
6. Significance and Advancements over Prior Work
The filter packing approach introduced in (Zhang et al., 19 Jan 2026) addresses severe scalability bottlenecks endemic to previous MPC protocols for neural network inference, particularly those relying on ordinary Shamir secret sharing (e.g., Liu et al., USENIX Security'24). Key advancements include:
- Scalability: Enables efficient secure inference among many parties (tested with up to 63 parties).
- Communication Efficiency: Amortizes all operations, linear and non-linear, by a factor of the packing size $k$, dramatically lowering bandwidth requirements.
- High Throughput: One-round parallel convolution and vector-matrix multiplication protocols unlock low WAN latency, essential for real-world deployments.
- Seamless Support for Deep, Wide Networks: Convolutions spanning large output-channel dimensions benefit strongly from packing, facilitating inference on architectures such as VGG16 and AlexNet that induce prohibitive overhead for prior methods.
The design establishes packed secret sharing and filter packing as foundational primitives for modern, large-scale secure multiparty neural network inference. The paradigm is broadly applicable wherever parallelism in output channels or similar dimensions can be exploited in secret-shared computations.