
Packed Shamir Secret Sharing (PSS)

Updated 26 January 2026
  • Packed Shamir Secret Sharing is a method that encodes a vector of secrets into a single polynomial, enabling parallel sharing while preserving threshold privacy.
  • It optimizes secure multi-party computation by reducing communication overhead and increasing throughput, notably for deep neural network inference.
  • The scheme employs VM-RandTuple structures and filter-packing techniques to support efficient vector–matrix multiplication and convolution while maintaining t-privacy.

Packed Shamir Secret Sharing (PSS) is a generalization of Shamir's (t, n)-threshold secret sharing scheme, enabling the encoding of a vector of k secrets into a single polynomial of degree d \geq k-1 over a finite field. This structure permits parallel, or "packed," computation over multiple secret values with communication and round complexity closely matching that of sharing a single value. PSS is particularly designed to enhance throughput and scalability in secure multi-party computation (MPC), especially for deep neural network inference in honest-majority settings, by reducing otherwise prohibitive communication overhead and enabling high degrees of parallelism (Zhang et al., 19 Jan 2026).

1. Formal Definition and Construction

PSS operates over the field \mathbb{F}_p, where p = 2^\ell - 1 is a Mersenne prime (with typical choices \ell \in \{31, 61\}), optimizing arithmetic efficiency. For n = 2d + 1 servers and packing factor k \le d, the privacy threshold is set to t = d - k + 1. A vector of k secrets s_0, \dots, s_{k-1} \in \mathbb{F}_p is packed as the coefficients of a degree-(k-1) polynomial:

f(X) = \sum_{j=0}^{k-1} s_j X^j \bmod p.

Each server i is assigned a publicly known, pairwise-distinct point x_i \in \mathbb{F}_p and receives a share f(x_i). Any d + 1 shares suffice to reconstruct the entire vector by Lagrange interpolation, whereas any t or fewer reveal nothing. Extraction of the j-th secret uses the Lagrange coefficients:

s_j = \sum_{i=1}^{n} f(x_i)\, \lambda_{i,j},

where \lambda_{i,j} are determined by the interpolation basis at the corresponding evaluation points. This construction provides parallel privacy and reconstruction for the k secrets at the cost of a single polynomial evaluation per server, achieving packing efficiency while maintaining (t, n)-threshold security (Zhang et al., 19 Jan 2026).
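The sharing and reconstruction above can be sketched in Python. This is a minimal illustration, not the paper's implementation: function names are illustrative, and it assumes (consistent with the degree-d polynomial and t = d - k + 1 threshold stated above) that the d - k + 1 high-order coefficients are filled with fresh randomness to provide privacy.

```python
import random

P = 2**31 - 1  # Mersenne prime, \ell = 31

def poly_eval(coeffs, x, p=P):
    """Horner evaluation of sum_j coeffs[j] * x**j mod p (coeffs low-to-high)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def share(secrets, d, n, p=P):
    """Pack k secrets as the low-order coefficients of a degree-d polynomial;
    the remaining d - k + 1 coefficients are random (assumed here to supply
    the t = d - k + 1 privacy slack). Returns public points and shares."""
    k = len(secrets)
    assert k <= d and n >= d + 1
    coeffs = list(secrets) + [random.randrange(p) for _ in range(d - k + 1)]
    xs = list(range(1, n + 1))  # publicly known, pairwise-distinct points
    return xs, [poly_eval(coeffs, x, p) for x in xs]

def reconstruct(xs, shares, k, p=P):
    """Recover the k packed secrets from any d + 1 shares by expanding the
    Lagrange interpolation polynomial into its coefficient vector."""
    coeffs = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, shares)):
        basis = [1]  # coefficients of ell_i(X) numerator, built incrementally
        denom = 1
        for j, xj in enumerate(xs):
            if j == i:
                continue
            # multiply basis polynomial by (X - xj)
            basis = [(-xj * basis[0]) % p] + [
                (basis[m - 1] - xj * basis[m]) % p for m in range(1, len(basis))
            ] + [basis[-1]]
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, p - 2, p) % p  # Fermat inverse mod prime p
        for m, b in enumerate(basis):
            coeffs[m] = (coeffs[m] + scale * b) % p
    return coeffs[:k]
```

Any d + 1 of the n shares recover all k secrets at once, which is the packing advantage: one interpolation yields the whole vector.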

2. VM-RandTuple Structures for Vector–Matrix Multiplication

Efficient secure vector–matrix multiplication in the MPC context leverages Vector-Matrix Multiplication–Friendly Random Share Tuples (VM-RandTuples). A VM-RandTuple is a pair (\llbracket r \rrbracket_{2d}, \llbracket r' \rrbracket_d), where r is a packed random vector in \mathbb{F}_p^{kv} and r'_i = \sum_{j=ik}^{(i+1)k-1} r_j is the packed sum for each output coordinate. The protocol generates these tuples offline in two rounds using a Vandermonde-matrix method, with each server secret-sharing k^2 random values and exchanging linear combinations via transposed Vandermonde matrices. Privacy against up to t colluding servers is preserved, as each PSS instance ensures information-theoretic secrecy for packs of k secrets.

This structure allows vector–matrix or matrix–matrix products to be performed in parallel across all k packed values per lane. Field-element complexity for VM-RandTuple generation is O(nv/(n + 2k - 1)) per server offline and v(1 + 1/k) per server online (Zhang et al., 19 Jan 2026).
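The defining relation between r and r' inside a VM-RandTuple can be checked in plaintext; the real protocol of course computes it on shares. A minimal sketch (the name block_sums is illustrative):

```python
def block_sums(r, k, p):
    """VM-RandTuple relation: given the packed random vector r of length
    k*v, compute r' with r'_i = sum_{j=ik}^{(i+1)k-1} r_j mod p,
    i.e. one field element per block of k consecutive entries."""
    assert len(r) % k == 0
    return [sum(r[i * k:(i + 1) * k]) % p for i in range(len(r) // k)]
```

Each entry of r' collapses one lane-block of the masked product, which is why a single degree reduction per output coordinate suffices in the online phase.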

3. Filter Packing for Parallel Secure Convolution

PSS enables efficient packing of filters for secure convolutional neural network evaluation by grouping k filters into a single PSS value for each spatial weight position. For a convolutional layer with c_o filters (each of shape c_i \times f_w \times f_h), packing is performed so that each position (j, u, v) has

\tilde{w}_{j,u,v} = (w^{(1)}_{j,u,v}, \ldots, w^{(k)}_{j,u,v}),

mapped into a degree-(k-1) polynomial W_{j,u,v}(X). Input tensors are packed similarly. Convolution then reduces to a packed inner product followed by an add-and-truncate operation, enabling simultaneous processing across all k channels with negligible overhead over a single-channel computation. Padding is handled efficiently by packing zeros, incurring no extra communication. The offline complexity per server for this operation is O(um\ell/k) field elements in 4 rounds, and the online complexity is O(um/k) field elements in one round, where u \times v \times m is the post-unfolding matrix shape (Zhang et al., 19 Jan 2026).
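The grouping step can be sketched as plain data rearrangement, before any secret sharing happens. This is an illustrative helper (names and the w[f][j][u][v] indexing are assumptions); each resulting k-tuple would then be encoded as the coefficient vector of one W_{j,u,v}(X) using the packing from Section 1.

```python
def pack_filters(w, k):
    """Group c_o filters k at a time: for each group g and each weight
    position (j, u, v), collect the k-tuple
    (w^{(gk+1)}_{j,u,v}, ..., w^{(gk+k)}_{j,u,v}) destined for one PSS value.
    w is indexed as w[f][j][u][v] for f in range(c_o)."""
    c_o = len(w)
    assert c_o % k == 0
    c_i, f_w, f_h = len(w[0]), len(w[0][0]), len(w[0][0][0])
    packed = []
    for g in range(c_o // k):
        packed.append([
            [
                [tuple(w[g * k + m][j][u][v] for m in range(k))
                 for v in range(f_h)]
                for u in range(f_w)
            ]
            for j in range(c_i)
        ])
    return packed
```

After this rearrangement the secure convolution touches c_o / k packed values per position instead of c_o, which is where the k-fold communication saving comes from.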

4. Efficient Parallel Non-Linear Operations

All Boolean and bitwise operations can be performed in parallel across all k packed values within a single PSS instance. For example, prefix-OR (critical to comparisons) is computed via a binary tree of DN-style multiplications in O(\log \ell) rounds. Bitwise less-than is implemented by bit decomposition, XOR, and prefix-OR, with all k comparisons done simultaneously inside one PSS in O(\log \ell + 2) rounds. For DReLU/ReLU, the protocol masks 2x with a random r \in \mathbb{F}_p^k, evaluates one bitwise less-than, and corrects the result; DReLU completes in O(\log \ell + 5) rounds, and ReLU requires one additional multiplication round. Maxpool combines ReLU and pairwise comparisons in O(\log m) \times O(\log \ell + 6) rounds, all parallelized. Every invocation of a DN-style multiplication or degree transformation maintains the t-privacy inherent to PSS (Zhang et al., 19 Jan 2026).
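The log-depth prefix-OR tree can be illustrated in plaintext. In the actual protocol each OR below is one DN-style multiplication on shared bits (a OR b = a + b - a*b), so the number of loop iterations corresponds to the MPC round count, O(\log \ell) for \ell-bit inputs; this sketch only checks the scan schedule, not the secure arithmetic.

```python
def prefix_or(bits):
    """Plaintext sketch of the binary-tree (Hillis-Steele style) prefix-OR:
    after round t, position i holds the OR of bits[max(0, i - 2^t + 1) .. i],
    so ceil(log2(len(bits))) rounds yield the full prefix-OR."""
    res = list(bits)
    step = 1
    while step < len(res):
        # all positions update simultaneously from the previous round's values
        res = [res[i] | (res[i - step] if i >= step else 0)
               for i in range(len(res))]
        step *= 2
    return res
```

In the comparison protocol this scan turns the XOR of two bit-decomposed values into a mask locating the most significant differing bit, for all k packed comparisons at once.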

5. Performance Metrics and Empirical Scalability

For packing factor k, PSS realizes significant reductions in both communication and computational overhead across secure inference protocols.

Operation | Offline rounds/comm. | Online rounds/comm.
Vector–matrix multiplication | 2 rounds, (1 + 1/k) nv / (n + 2k - 1) | 1 round, (1 + 1/k) uv
Convolution | 4 rounds, O(um\ell/k) | 1 round, O(um/k)
ReLU | — | O(\log \ell) rounds, O(k^{-1}) communication per lane
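To see how the packing factor enters the first row of the table, the offline formula can be evaluated directly (a small illustrative helper, not from the paper):

```python
def vm_offline_comm(k, n, v):
    """Offline field elements per server for vector-matrix multiplication,
    per the table: (1 + 1/k) * n * v / (n + 2k - 1)."""
    return (1 + 1 / k) * n * v / (n + 2 * k - 1)
```

For example, with n = 63 servers and a length-1000 vector, moving from k = 1 to k = 8 cuts the per-server offline cost by more than half, since both the (1 + 1/k) factor and the n + 2k - 1 denominator move favorably with k.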

Empirical evaluations (11–63 servers, \ell = 61, 13-bit fixed point) show communication reductions over the Shamir-only scheme of Liu et al. (USENIX'24) of up to 5.85× (offline), 11.17× (online), and 6.83× (total), and speedups of up to 1.59× (offline), 2.61× (online), and 1.75× (total) on wide-area networks. These improvements, especially for deeper architectures such as VGG16 run with up to 63 servers, stem from the parallelization enabled by PSS. On local-area networks, where computation dominates, offline speedups of up to 3.76× and total speedups of up to 2.61× are observed on deep networks (Zhang et al., 19 Jan 2026).

6. Cryptographic and Practical Implications

PSS maintains the (t, n)-threshold privacy and reconstruction guarantees per pack of k values, with each operation, linear or nonlinear, executed in parallel over all lanes. This property enables throughput and scalability increases by roughly a factor of k for both linear layers (matrix operations) and elementwise functions, with little connectivity or round overhead. The ability to parallelize across many secrets positions PSS as a practical primitive for secure, high-throughput computation in multi-party inference and cryptographic ML, overcoming the severe scalability and latency limitations of classical Shamir-based MPC protocols in network-constrained environments (Zhang et al., 19 Jan 2026).
