PDMM: Private Distributed Matrix Multiplication
- PDMM is the study of secure protocols and coding schemes for outsourcing matrix multiplication while protecting the privacy of input matrices.
- It leverages advanced partitioning and polynomial coding strategies to guarantee correctness, privacy, and efficient recovery even with colluding or straggling servers.
- PDMM impacts secure machine learning and privacy-preserving cloud analytics by managing trade-offs between computation cost, communication, and fault tolerance.
Private Distributed Matrix Multiplication (PDMM) is the study of protocols and coding-theoretic schemes for securely outsourcing matrix multiplication to distributed and potentially untrusted servers, while protecting the privacy of the input matrices and preserving computational efficiency and scalability. PDMM encompasses various adversary models (colluding honest-but-curious, Byzantine), partitioning and encoding methodologies (polynomial, bivariate, rateless, quantum), and performance trade-offs (download/upload cost, straggler tolerance, computation complexity). It bears directly on secure large-scale machine learning, federated trust systems, and privacy-preserving cloud analytics.
1. System Models and Security Definitions
The core PDMM model assumes an owner with two private input matrices, $A$ and $B$, who seeks to compute the product $C = AB$ by distributing encrypted or coded shares of $A$ and $B$ to remote servers or workers. Privacy, security, and correctness requirements depend on the threat model:
- $T$-Privacy: Up to $T$ colluding servers (honest-but-curious) must gain zero information about $A$ or $B$ from their full view: formally, $I(A,B; \text{all coded parts seen by any set of } T \text{ servers}) = 0$ (Hofmeister et al., 21 Jan 2025, Yu et al., 2020).
- Correctness: The user must reconstruct $C = AB$ with zero error from a sufficient subset of worker responses.
- Collusion and Index Privacy: For scenarios where the server holds a library of matrices, the index of the desired matrix must remain hidden from all servers (Chang et al., 2019, Li et al., 2021).
Partitioning strategies include outer product partitioning (OPP) (Hofmeister et al., 21 Jan 2025), block partitioning (e.g., into horizontal and vertical blocks), or more general bilinear/tensor decompositions (Yu et al., 2020). Matrix shares are encoded using tailored polynomial-based schemes.
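The masking idea behind these encodings can be illustrated with a minimal, hypothetical 1-private polynomial scheme (not any specific published code such as GASP or CAT): the owner hides $A$ and $B$ behind uniform random masks $R$, $S$, sends evaluations of $f(x) = A + Rx$ and $g(x) = B + Sx$ to three servers, and interpolates the degree-2 product polynomial $h(x) = f(x)g(x)$ to read off $h(0) = AB$. Each single server sees only uniformly masked shares.

```python
# Toy 1-private scheme over GF(p) -- an illustrative sketch, not a published code.
import random

P = 10_007  # small prime field for the demo

def rand_matrix(rows, cols):
    return [[random.randrange(P) for _ in range(cols)] for _ in range(rows)]

def mat_add(X, Y, c=1):
    # X + c*Y entrywise, mod P
    return [[(x + c * y) % P for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % P
             for j in range(len(Y[0]))] for i in range(len(X))]

A, B = rand_matrix(2, 3), rand_matrix(3, 2)
R, S = rand_matrix(2, 3), rand_matrix(3, 2)   # uniform masks => 1-privacy

points = [1, 2, 3]                            # nonzero evaluation points
shares = [(mat_add(A, R, a), mat_add(B, S, a)) for a in points]
answers = [mat_mul(fa, ga) for fa, ga in shares]  # each server multiplies its shares

def lagrange_at_zero(xs, ys):
    # interpolate the matrix polynomial h at x = 0 from (xs[i], ys[i])
    acc = [[0] * len(ys[0][0]) for _ in range(len(ys[0]))]
    for i, xi in enumerate(xs):
        num = den = 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        coef = num * pow(den, P - 2, P) % P
        acc = mat_add(acc, ys[i], coef)
    return acc

C = lagrange_at_zero(points, answers)
assert C == mat_mul(A, B)                      # h(0) = f(0) g(0) = A B
```

Since $h$ has degree 2, any three responses determine it; a single honest-but-curious server observes $A + a_iR$ with $R$ uniform and $a_i \neq 0$, which is itself uniform and thus leaks nothing.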
2. Polynomial Coding Constructions
PDMM primarily employs polynomial code frameworks to guarantee both privacy and decoding. Key code classes include:
- GASP and Generalized Polynomial Codes: Use specifically designed degree tables, augmenting the message polynomials for $A$ and $B$ with random “masking” terms, so that the evaluations seen by any $T$ workers yield no information about the plaintext inputs (Hofmeister et al., 21 Jan 2025, Nomeir et al., 28 Nov 2025, Yu et al., 2020, D'Oliveira et al., 2020).
- Degree Table Framework: For OPP, the encoding polynomials $f(x)$ and $g(x)$ are evaluated at a set of points, with exponents chosen (possibly modulo a cyclic group, as in CAT) so that all useful subproducts appear at unique polynomial degrees, while the remaining degrees are filled by mask terms that keep the view of any $T$ colluding servers independent of the inputs (Hofmeister et al., 21 Jan 2025).
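The degree-table condition can be checked mechanically. The sketch below is an illustrative sanity check with made-up exponents (not GASP's or CAT's actual tables): with $K=2$ blocks of $A$ at degrees $\alpha_i$, $L=2$ blocks of $B$ at degrees $\beta_j$, and one mask term on each side ($T=1$), decodability requires every useful degree $\alpha_i+\beta_j$ to be distinct and disjoint from all mask-involved degrees.

```python
# Hypothetical degree-table check (illustrative exponents, not a published table).
alphas, betas = [0, 1], [0, 2]      # degrees of the K=2 A-blocks, L=2 B-blocks
a_masks, b_masks = [4], [4]         # degrees of the T=1 random mask terms

useful = {a + b for a in alphas for b in betas}            # carry A_i B_j
garbage = ({a + m for a in alphas for m in b_masks}        # mask cross-terms
           | {m + b for m in a_masks for b in betas}
           | {ma + mb for ma in a_masks for mb in b_masks})

assert len(useful) == len(alphas) * len(betas)  # all subproducts separable
assert useful.isdisjoint(garbage)               # masks never overwrite them
print("workers needed (distinct degrees):", len(useful | garbage))  # -> 8
```

Here the product polynomial occupies degrees $\{0,1,2,3\} \cup \{4,5,6,8\}$, so eight distinct degrees must be interpolated; degree-table design (GASP's gap-additive tables, CAT's modular tables) is precisely the art of minimizing that count.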
Construction Table (examples):
| Code Scheme | Privacy Threshold | Workers Needed | Description |
|---|---|---|---|
| GASP / GASPrs / DOGrs | | | Gap-additive, decodable integer degree tables |
| CAT (Cyclic-Addition Table) | | | Uses roots of unity, modulo degree tables |
| Bivariate Polynomial Codes | | | Codes in both $x$ and $y$; supports streaming and straggler mitigation |
The minimum number of workers for privacy and decodability depends on the degree-table design, with modern codes (CAT, DOGrs) achieving nontrivial improvements over classic GASP in the low-privacy regime (Hofmeister et al., 21 Jan 2025).
Decoding is via polynomial interpolation over a finite field; the required recovery threshold is the degree of the product polynomial plus 1.
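The "degree plus one" recovery rule is also what yields straggler tolerance: a degree-$d$ polynomial is determined by any $d+1$ of its evaluations, so any $d+1$ of $N$ worker responses suffice. A scalar illustration (assumed toy polynomial, not a scheme-specific decoder):

```python
# Illustrative check: a degree-2 product polynomial is pinned down by
# any 3 of N = 5 responses, which is what tolerates 2 stragglers.
from itertools import combinations

P = 101
h = [7, 3, 9]                     # h(x) = 7 + 3x + 9x^2 over GF(101)
evals = {x: sum(c * pow(x, k, P) for k, c in enumerate(h)) % P
         for x in range(1, 6)}    # responses from 5 workers

def interp_at_zero(pts):
    # Lagrange interpolation of h(0) from (x, y) pairs
    total = 0
    for xi, yi in pts:
        num = den = 1
        for xj, _ in pts:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

for subset in combinations(evals.items(), 3):   # every 3-subset works
    assert interp_at_zero(subset) == h[0]
```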
3. Communication and Computational Complexity
PDMM schemes present intricate trade-offs between communication cost, computation cost, privacy, and resilience to stragglers or faulty workers. Asymptotic scaling depends on the partition parameters and code construction:
- Upload cost: Total transmitted coded symbols per worker, proportional to block sizes and code parameters (Hofmeister et al., 21 Jan 2025, Hasircioglu et al., 2021).
- Download cost: Dominated by the number of worker responses needed (the recovery threshold) and the size of the matrix subproduct returned per response.
- Computation cost: User-side encoding and decoding involve fast multipoint polynomial evaluation/interpolation; server cost is dominated by a single block-level matrix multiplication per response (D'Oliveira et al., 2020).
- Straggler Mitigation: Schemes such as bivariate polynomial codes (Hasircioglu et al., 2021, Hasircioglu et al., 2021) and rateless codes (Bitar et al., 2021, Bitar et al., 2020) enable efficient exploitation of partial worker contributions, dynamically accommodating heterogeneous and slow workers.
Summary Table (model-dependent):
| Code | Recovery Threshold | Upload/Download Cost | Straggler Tolerance/Adaptivity |
|---|---|---|---|
| GASP | | | Fixed threshold |
| CAT | Lower in the low-$T$ regime | | Fixed threshold |
| Bivariate / MM | | Lower upload; every result useful | Multi-message, streaming |
| Rateless (RPM3) | | Small, fixed-size tasks, rate-adaptive | High heterogeneity support |
Sources: (Hofmeister et al., 21 Jan 2025, Hasircioglu et al., 2021, Hasircioglu et al., 2021, Bitar et al., 2021, Bitar et al., 2020, D'Oliveira et al., 2020).
Optimized polynomial codes can bring the user's total computation time strictly below the local cost of classic fast matrix multiplication when secure offloading is available (D'Oliveira et al., 2020).
4. Extensions: Quantum, Sparse, and Private Retrieval Settings
Quantum PDMM
- Exploits shared entanglement among servers and quantum communication, potentially doubling the download rate (super-dense coding when feasible) (Nomeir et al., 28 Nov 2025).
- Feasibility is limited by the “longest consecutive interference block” in the code’s degree table; in high-privacy regimes, quantum codes achieve the full factor-of-two download advantage.
- Extensions to cases where GASP is infeasible yield code families with explicit exponents achieving near-optimal quantum/classical rate gaps.
- For low privacy thresholds $T$, quantum codes can still achieve a download-rate advantage over their classical counterparts (Nomeir et al., 28 Nov 2025).
Private/Index-Private Settings
- When the user’s query index must remain hidden (matrix chosen from a library), hybrid secret-sharing and private information retrieval (PIR) constructions achieve optimal trade-offs between upload and download costs under information-theoretic security (Chang et al., 2019, Li et al., 2021, Zhu et al., 2022).
- MDS-coded storage generalizations enable reduced per-server storage for index-private retrieval (Zhu et al., 2022).
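For intuition on index privacy, the textbook two-server XOR-based PIR primitive is sketched below; it is a generic building block, not the specific hybrid constructions cited above. Each non-colluding server sees a uniformly random subset of indices, so neither learns the desired index $\theta$.

```python
# Textbook 2-server XOR PIR sketch (generic primitive, not the cited schemes).
import secrets

library = [b"mat0", b"mat1", b"mat2", b"mat3"]   # replicated at both servers
theta = 2                                        # desired index, kept private

q1 = {i for i in range(len(library)) if secrets.randbits(1)}  # uniform subset
q2 = q1 ^ {theta}                                # differs only at theta

def answer(query):
    # each server returns the XOR of the requested records
    out = bytes(len(library[0]))
    for i in query:
        out = bytes(a ^ b for a, b in zip(out, library[i]))
    return out

a1, a2 = answer(q1), answer(q2)
record = bytes(x ^ y for x, y in zip(a1, a2))    # all records except theta cancel
assert record == library[theta]
```

Both queries are marginally uniform, so a single server's view is independent of $\theta$; privacy breaks only if the two servers collude, which is why the PDMM constructions above combine PIR with secret sharing to tolerate collusion.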
Sparse PDMM
- For sparse matrix workloads, recent secret-sharing codes enable adjustable trade-offs between share sparsity and privacy, maintaining privacy thresholds and tolerating stragglers with negligible privacy degradation in suitable parameter regimes (Egger et al., 2023).
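The tension can be seen in a toy experiment (an illustration of the trade-off only, not the coding scheme of Egger et al.): masking a sparse vector entrywise with a Bernoulli($s$) mask keeps the share sparse for small $s$, but every nonzero entry whose mask happens to be zero is leaked in the clear.

```python
# Toy sparsity-vs-privacy illustration; NOT the scheme of Egger et al. (2023).
import random

random.seed(0)
P, n = 97, 1000
A = [random.randrange(P) if random.random() < 0.05 else 0 for _ in range(n)]

def share_stats(s):
    # mask each entry with probability s, then publish the masked share
    R = [random.randrange(1, P) if random.random() < s else 0 for _ in range(n)]
    share = [(a + r) % P for a, r in zip(A, R)]
    density = sum(v != 0 for v in share) / n
    nnz = sum(a != 0 for a in A)
    leaked = sum(a != 0 and r == 0 for a, r in zip(A, R)) / max(1, nnz)
    return density, leaked

for s in (0.0, 0.1, 0.5, 1.0):
    d, l = share_stats(s)
    print(f"mask density {s:.1f}: share density {d:.2f}, leaked nonzeros {l:.2f}")
```

Denser masks drive leakage toward zero but destroy the sparsity that makes the workload cheap; the cited codes manage this trade-off with structured, threshold-preserving sharing rather than independent entrywise masking.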
5. Straggler, Malicious, and Adaptive Protocols
State-of-the-art PDMM protocols feature robust straggler and Byzantine tolerance:
- Bivariate polynomial codes efficiently stream partial products and allow “one-to-any” replaceability of sub-results, reducing average latency and upload requirements (Hasircioglu et al., 2021, Hasircioglu et al., 2021).
- Rateless and adaptive clustering schemes (e.g., RPM3, SRPM3) dynamically assign tasks and recluster workers by current speeds, tolerating arbitrary heterogeneity and providing theoretical guarantees on mean completion time and rate (Bitar et al., 2021, Bitar et al., 2020, Hofmeister et al., 2021).
- Byzantine/malicious tolerance: By layering probabilistic verification (e.g., Freivalds’ check) over rate-adaptive codes, PDMM can detect and isolate arbitrarily many faulty workers with high probability, surpassing classical deterministic MDS error-correction limits (Hofmeister et al., 2021).
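Freivalds' check itself is simple enough to state in a few lines: to test a worker's claimed product $C = AB$ in $O(n^2)$ time, multiply both sides by a random vector $r$ and compare $A(Br)$ with $Cr$; a wrong $C$ survives one trial with probability at most $1/2$ (over GF(2) choices of $r$), so a handful of trials drives the error probability down geometrically.

```python
# Freivalds' probabilistic verification of a claimed product C = A @ B.
import random

random.seed(1)  # fixed seed so the demo is deterministic

def freivalds(A, B, C, trials=20):
    n = len(C)
    for _ in range(trials):
        r = [random.randrange(2) for _ in range(n)]        # random 0/1 vector
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False          # response is definitely wrong
    return True                   # correct with prob >= 1 - 2**(-trials)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]       # the true product A @ B
bad  = [[19, 22], [43, 51]]       # one corrupted entry
assert freivalds(A, B, good)
assert not freivalds(A, B, bad)
```

Because each check costs only three matrix-vector products, the user can screen every worker response far cheaper than recomputing any block product, which is what makes layering it over rate-adaptive codes practical.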
6. Comparative Performance and Known Limits
Recent advances deliver significant performance enhancements over classic PDMM constructions:
- Newer polynomial-based codes (CAT, DOGrs) yield up to a $3T-5$ saving in worker count in the low-privacy regime, and an asymptotic improvement over GASP in the medium-privacy regime (Hofmeister et al., 21 Jan 2025).
- Entangled polynomial codes break the “cubic barrier” for PDMM, reducing the recovery threshold far below the number of block subproducts required by classical partitioning (Yu et al., 2020).
- The information-theoretic converse is known in some regimes: for instance, for secure-index retrieval, the lower convex hull of the relevant upload/download pairs is tight (Chang et al., 2019).
- The field-size requirement for code decodability and privacy grows nontrivially with the privacy and partitioning parameters, but remains practical for most parameter settings (Hofmeister et al., 21 Jan 2025, D'Oliveira et al., 2020).
Open questions include the existence of better partitioning than OPP, absolute lower bounds for the worker count in arbitrary regimes, and the development of hybrid classical–quantum PDMM protocols overcoming current feasibility constraints.
7. Applications and Ongoing Directions
PDMM enables privacy-preserving offloading of linear algebra in various latency- and privacy-critical settings, such as:
- Large-scale distributed machine learning with privacy constraints.
- Secure and privacy-preserving trust evaluation in decentralized networks, using established monoidal trust aggregation methods (Dumas et al., 2016).
- Fully private retrieval and computation over coded data libraries (as in secure and private learning over MDS-coded storage) (Zhu et al., 2022).
Emerging directions include quantum-enhanced PDMM, optimized code designs for sparse and adaptive computation, information-theoretic characterization of capacity under quantum and classical resources, and the synthesis of robust, efficient protocols for adversarially heterogeneous and malicious environments (Nomeir et al., 28 Nov 2025, Egger et al., 2023, Hofmeister et al., 2021).