
Packed Shamir Secret Sharing (PSS)

Updated 26 January 2026
  • Packed Shamir Secret Sharing is a method that encodes a vector of secrets into a single polynomial, enabling parallel sharing while preserving threshold privacy.
  • It optimizes secure multi-party computation by reducing communication overhead and increasing throughput, notably for deep neural network inference.
  • The scheme employs VM-RandTuple structures and filter-packing techniques to support efficient vector–matrix multiplication and convolution while maintaining t-privacy.

Packed Shamir Secret Sharing (PSS) is a generalization of Shamir's (t, n)-threshold secret sharing scheme, enabling the encoding of a vector of k secrets into a single polynomial of degree d \geq k-1 over a finite field. This structure permits parallel, or "packed," computation over multiple secret values with communication and round complexity closely matching that of sharing a single value. PSS is particularly designed to enhance throughput and scalability in secure multi-party computation (MPC), especially for deep neural network inference in honest-majority settings, by reducing otherwise prohibitive communication overhead and enabling high degrees of parallelism (Zhang et al., 19 Jan 2026).

1. Formal Definition and Construction

PSS operates over the field \mathbb{F}_p, where p = 2^\ell - 1 is a Mersenne prime (with typical choices \ell \in \{31, 61\}), optimizing arithmetic efficiency. For n = 2d + 1 servers and packing factor k \le d, the privacy threshold is set to t = d - k + 1. A vector of k secrets s_0, \dots, s_{k-1} \in \mathbb{F}_p is packed as the coefficients of a degree-(k-1) polynomial:

f(X) = \sum_{j=0}^{k-1} s_j X^j \bmod p.

Each server i is assigned a publicly known, pairwise-distinct point x_i \in \mathbb{F}_p and receives a share f(x_i). Any d + 1 shares suffice to reconstruct the entire vector by Lagrange interpolation, whereas any t or fewer reveal nothing. Extraction of the j-th secret uses the Lagrange coefficients:

s_j = \sum_{i=1}^{n} f(x_i)\, \lambda_{i,j},

where \lambda_{i,j} are determined by the interpolation basis at the corresponding evaluation points. This construction provides parallel privacy and reconstruction for the k secrets at the cost of a single polynomial evaluation per server, achieving packing efficiency while maintaining (t, n)-threshold security (Zhang et al., 19 Jan 2026).
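The sharing and reconstruction above can be sketched in Python. This is a minimal illustration, not the paper's implementation: function names are illustrative, and it assumes (consistent with the degree-d polynomial and t = d - k + 1 threshold stated above) that the d - k + 1 high-order coefficients are filled with fresh randomness to provide privacy.

```python
import random

P = 2**31 - 1  # Mersenne prime, \ell = 31

def poly_eval(coeffs, x, p=P):
    """Horner evaluation of sum_j coeffs[j] * x**j mod p (coeffs low-to-high)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def share(secrets, d, n, p=P):
    """Pack k secrets as the low-order coefficients of a degree-d polynomial;
    the remaining d - k + 1 coefficients are random (assumed here to supply
    the t = d - k + 1 privacy slack). Returns public points and shares."""
    k = len(secrets)
    assert k <= d and n >= d + 1
    coeffs = list(secrets) + [random.randrange(p) for _ in range(d - k + 1)]
    xs = list(range(1, n + 1))  # publicly known, pairwise-distinct points
    return xs, [poly_eval(coeffs, x, p) for x in xs]

def reconstruct(xs, shares, k, p=P):
    """Recover the k packed secrets from any d + 1 shares by expanding the
    Lagrange interpolation polynomial into its coefficient vector."""
    coeffs = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, shares)):
        basis = [1]  # coefficients of ell_i(X) numerator, built incrementally
        denom = 1
        for j, xj in enumerate(xs):
            if j == i:
                continue
            # multiply basis polynomial by (X - xj)
            basis = [(-xj * basis[0]) % p] + [
                (basis[m - 1] - xj * basis[m]) % p for m in range(1, len(basis))
            ] + [basis[-1]]
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, p - 2, p) % p  # Fermat inverse mod prime p
        for m, b in enumerate(basis):
            coeffs[m] = (coeffs[m] + scale * b) % p
    return coeffs[:k]
```

Any d + 1 of the n shares recover all k secrets at once, which is the packing advantage: one interpolation yields the whole vector.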

2. VM-RandTuple Structures for Vector–Matrix Multiplication

Efficient secure vector–matrix multiplication in the MPC context leverages Vector-Matrix Multiplication–Friendly Random Share Tuples (VM-RandTuples). A VM-RandTuple is a pair (\llbracket r \rrbracket_{2d}, \llbracket r' \rrbracket_d), where r is a packed random vector in \mathbb{F}_p^{kv} and r'_i = \sum_{j=ik}^{(i+1)k-1} r_j is the packed sum for each output coordinate. The protocol generates these tuples offline in two rounds using a Vandermonde-matrix method, with each server secret-sharing k^2 random values and exchanging linear combinations via transposed Vandermonde matrices. Privacy against up to t colluding servers is preserved, as each PSS instance ensures information-theoretic secrecy for packs of k secrets.

This structure allows vector–matrix or matrix–matrix products to be performed in parallel across all k packed values per lane. Field-element complexity for VM-RandTuple generation is O(nv/(n + 2k - 1)) per server offline and v(1 + 1/k) per server online (Zhang et al., 19 Jan 2026).
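The defining relation between r and r' inside a VM-RandTuple can be checked in plaintext; the real protocol of course computes it on shares. A minimal sketch (the name block_sums is illustrative):

```python
def block_sums(r, k, p):
    """VM-RandTuple relation: given the packed random vector r of length
    k*v, compute r' with r'_i = sum_{j=ik}^{(i+1)k-1} r_j mod p,
    i.e. one field element per block of k consecutive entries."""
    assert len(r) % k == 0
    return [sum(r[i * k:(i + 1) * k]) % p for i in range(len(r) // k)]
```

Each entry of r' collapses one lane-block of the masked product, which is why a single degree reduction per output coordinate suffices in the online phase.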

3. Filter Packing for Parallel Secure Convolution

PSS enables efficient packing of filters for secure convolutional neural network evaluation by grouping k filters into a single PSS value for each spatial weight position. For a convolutional layer with c_o filters (each of shape c_i \times f_w \times f_h), packing is performed so that each position (j, u, v) has

\tilde{w}_{j,u,v} = (w^{(1)}_{j,u,v}, \ldots, w^{(k)}_{j,u,v}),

mapped into a degree-(k-1) polynomial W_{j,u,v}(X). Input tensors are packed similarly. Convolution then reduces to a packed inner product followed by an add-and-truncate operation, enabling simultaneous processing across all k channels with negligible overhead over a single-channel computation. Padding is handled efficiently by packing zeros, incurring no extra communication. The offline complexity per server for this operation is O(um\ell/k) field elements in 4 rounds, and the online complexity is O(um/k) field elements in one round, where u \times v \times m is the post-unfolding matrix shape (Zhang et al., 19 Jan 2026).
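The grouping step can be sketched as plain data rearrangement, before any secret sharing happens. This is an illustrative helper (names and the w[f][j][u][v] indexing are assumptions); each resulting k-tuple would then be encoded as the coefficient vector of one W_{j,u,v}(X) using the packing from Section 1.

```python
def pack_filters(w, k):
    """Group c_o filters k at a time: for each group g and each weight
    position (j, u, v), collect the k-tuple
    (w^{(gk+1)}_{j,u,v}, ..., w^{(gk+k)}_{j,u,v}) destined for one PSS value.
    w is indexed as w[f][j][u][v] for f in range(c_o)."""
    c_o = len(w)
    assert c_o % k == 0
    c_i, f_w, f_h = len(w[0]), len(w[0][0]), len(w[0][0][0])
    packed = []
    for g in range(c_o // k):
        packed.append([
            [
                [tuple(w[g * k + m][j][u][v] for m in range(k))
                 for v in range(f_h)]
                for u in range(f_w)
            ]
            for j in range(c_i)
        ])
    return packed
```

After this rearrangement the secure convolution touches c_o / k packed values per position instead of c_o, which is where the k-fold communication saving comes from.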

4. Efficient Parallel Non-Linear Operations

All Boolean and bitwise operations can be performed in parallel across all k packed values within a single PSS instance. For example, prefix-OR (critical to comparisons) is computed via a binary tree of DN-style multiplications in O(\log \ell) rounds. Bitwise less-than is implemented by bit decomposition, XOR, and prefix-OR, with all k comparisons done simultaneously inside one PSS in O(\log \ell + 2) rounds. For DReLU/ReLU, the protocol masks 2x with a random r \in \mathbb{F}_p^k, evaluates one bitwise less-than, and corrects the result; DReLU completes in O(\log \ell + 5) rounds, and ReLU requires one additional multiplication round. Maxpool combines ReLU and pairwise comparisons in O(\log m) \times O(\log \ell + 6) rounds, all parallelized. Every invocation of a DN-style multiplication or degree transformation maintains the t-privacy inherent to PSS (Zhang et al., 19 Jan 2026).
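The log-depth prefix-OR tree can be illustrated in plaintext. In the actual protocol each OR below is one DN-style multiplication on shared bits (a OR b = a + b - a*b), so the number of loop iterations corresponds to the MPC round count, O(\log \ell) for \ell-bit inputs; this sketch only checks the scan schedule, not the secure arithmetic.

```python
def prefix_or(bits):
    """Plaintext sketch of the binary-tree (Hillis-Steele style) prefix-OR:
    after round t, position i holds the OR of bits[max(0, i - 2^t + 1) .. i],
    so ceil(log2(len(bits))) rounds yield the full prefix-OR."""
    res = list(bits)
    step = 1
    while step < len(res):
        # all positions update simultaneously from the previous round's values
        res = [res[i] | (res[i - step] if i >= step else 0)
               for i in range(len(res))]
        step *= 2
    return res
```

In the comparison protocol this scan turns the XOR of two bit-decomposed values into a mask locating the most significant differing bit, for all k packed comparisons at once.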

5. Performance Metrics and Empirical Scalability

For packing factor k, PSS realizes significant reductions in both communication and computational overhead across secure inference protocols.

Operation | Offline rounds/comm. | Online rounds/comm.
Vector–matrix multiplication | 2 rounds, (1 + 1/k) nv / (n + 2k - 1) | 1 round, (1 + 1/k) uv
Convolution | 4 rounds, O(um\ell/k) | 1 round, O(um/k)
ReLU | — | O(\log \ell) rounds, O(k^{-1}) communication per lane
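To see how the packing factor enters the first row of the table, the offline formula can be evaluated directly (a small illustrative helper, not from the paper):

```python
def vm_offline_comm(k, n, v):
    """Offline field elements per server for vector-matrix multiplication,
    per the table: (1 + 1/k) * n * v / (n + 2k - 1)."""
    return (1 + 1 / k) * n * v / (n + 2 * k - 1)
```

For example, with n = 63 servers and a length-1000 vector, moving from k = 1 to k = 8 cuts the per-server offline cost by more than half, since both the (1 + 1/k) factor and the n + 2k - 1 denominator move favorably with k.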

Empirical evaluations (11–63 servers, \ell = 61, 13-bit fixed point) show communication reductions over the Shamir-only scheme of Liu et al. (USENIX'24) of up to 5.85× (offline), 11.17× (online), and 6.83× (total), and speedups of up to 1.59× (offline), 2.61× (online), and 1.75× (total) on wide-area networks. These improvements, especially for deeper architectures such as VGG16 run with up to 63 servers, stem from the parallelization enabled by PSS. On local-area networks, where computation dominates, offline speedups of up to 3.76× and total speedups of up to 2.61× are observed on deep networks (Zhang et al., 19 Jan 2026).

6. Cryptographic and Practical Implications

PSS maintains the (t, n)-threshold privacy and reconstruction guarantees per pack of k values, with each operation, linear or nonlinear, executed in parallel over all lanes. This property enables throughput and scalability increases by roughly a factor of k for both linear layers (matrix operations) and elementwise functions, with little connectivity or round overhead. The ability to parallelize across many secrets positions PSS as a practical primitive for secure, high-throughput computation in multi-party inference and cryptographic ML, overcoming the severe scalability and latency limitations of classical Shamir-based MPC protocols in network-constrained environments (Zhang et al., 19 Jan 2026).
