Privacy-Preserving Partitioning

Updated 1 March 2026

Privacy-Preserving Partitioning is an approach that splits data and computations into smaller units to ensure rigorous privacy guarantees through methods like differential privacy and secure multiparty computation.
It finds applications in federated learning, cloud-edge inference, and data publishing by balancing task-specific utility, performance, and privacy under varied adversarial models.
Advanced algorithms optimize the trade-offs between latency, energy, and accuracy by dynamically selecting partitioning strategies, supporting scalable and adaptive real-world deployments.

Privacy-preserving partitioning refers to algorithmic and architectural strategies that structure data, computation, or both into smaller units—partitions—such that utility is preserved while adhering to rigorous privacy guarantees, typically differential privacy (DP) or cryptographic privacy. This concept underpins mechanisms in distributed and federated learning, privacy-preserving data publishing, collaborative inference, and large-scale coded computing. The approach balances performance, scalability, task-specific utility, and formal privacy under adversarial models ranging from honest-but-curious participants to explicit reconstruction attacks.

1. Fundamental Models and Mechanisms

Partitioning in privacy-preserving frameworks takes two principal forms: data partitioning (vertical, horizontal, or hybrid) and computational partitioning (task splitting across infrastructure, model-layer cuts, or encoding for coded computation). The privacy guarantees are attained using techniques such as DP noise injection, secure multiparty computation (MPC), or structural anonymization.

Data Partitioning: Data may be split by attributes (vertical), by records (horizontal), or in grid (hybrid) fashion. For example, in privacy-preserving decision tree induction, data can be distributed across multiple parties either by records or by disjoint attribute sets; protocols for ID3 with SMPC primitives are adapted accordingly (0803.1555). In vertically partitioned multiparty learning, the global model is expressed as a function of local and cross-party terms, and privacy is enforced via noise addition to polynomial coefficients at the party level with secure aggregation (Xu et al., 2019).
Computational Partitioning: Model splitting in collaborative inference (e.g., cloud-edge or split learning) enables intermediate representations to be selectively sanitized before offloading; strategic selection of split points optimizes between privacy leakage and system efficiency, as in CIS (Wang et al., 2022) and P3SL (Fan et al., 23 Jul 2025).
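The two basic data layouts above can be illustrated with a small sketch. The record fields, the `id` join key, and both function names are hypothetical, chosen only for illustration; no privacy mechanism is applied here, only the split itself.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def horizontal_partition(records: Sequence[Record], n_parties: int) -> List[List[Record]]:
    """Split by records: each party holds some rows with all attributes."""
    return [list(records[p::n_parties]) for p in range(n_parties)]


def vertical_partition(
    records: Sequence[Record],
    attribute_sets: Sequence[Sequence[str]],
    key: str = "id",
) -> List[List[Record]]:
    """Split by attributes: each party holds all rows, but only its own
    columns plus a shared join key (assumed here to be non-sensitive)."""
    return [
        [{key: r[key], **{a: r[a] for a in attrs}} for r in records]
        for attrs in attribute_sets
    ]


# Toy data, invented for this example.
records = [
    {"id": 1, "age": 34, "zip": "10001", "diagnosis": "A"},
    {"id": 2, "age": 51, "zip": "10002", "diagnosis": "B"},
    {"id": 3, "age": 29, "zip": "10003", "diagnosis": "A"},
]

h_parts = horizontal_partition(records, n_parties=2)
v_parts = vertical_partition(records, [["age", "zip"], ["diagnosis"]])
```

A hybrid (grid) layout would apply both splits in sequence, so that each party holds a subset of rows restricted to a subset of columns.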

Underlying mechanisms include:

Differential Privacy (DP): Laplace or Gaussian noise is injected into statistics, feature maps, or gradients at defined partition boundaries to bound information leakage (Wang et al., 2022, Hadian et al., 2018, Badidi et al., 30 Aug 2025).
Secure Aggregation and Homomorphic Encryption: Used for aggregating statistics or gradient information across parties without exposing individual records or local models (Xu et al., 2019, Deng et al., 2020).
Anonymization via Partitioned Publishing: Data slicing and event log segmentation reduce sensitivity and enable parallel DP mechanisms, improving utility and scalability (0909.2290, Lim et al., 8 Jul 2025).
Coding with Hierarchical Partitioning: Coded computing leverages privacy masks and hierarchical task splits to ensure computational privacy and straggler robustness (Zeng et al., 2023).
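As a minimal sketch of the DP noise-injection item in the list above, the following applies the standard Laplace mechanism to a vector of statistics released at a partition boundary. The function names are illustrative, and `sensitivity` is assumed to be the L1 sensitivity of the whole vector, which the caller must derive for the statistic in question; none of the cited systems is reproduced here.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(
    values: Sequence[float],
    sensitivity: float,
    epsilon: float,
    rng: random.Random,
) -> List[float]:
    """Add Laplace noise with scale sensitivity / epsilon to each value.

    Assumption: `sensitivity` bounds the L1 change of the full vector
    when one record is added or removed.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)  # seeded only so the example is reproducible
noisy = laplace_mechanism([10.0, 20.0, 30.0], sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at lower utility. A production deployment would use a cryptographically secure noise source rather than the `random` module.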

2. Partitioning Strategies Across Domains

Privacy-preserving partitioning has been systematically explored in several problem domains:

Application	Partitioning Granularity	Privacy Mechanism
Federated/Distributed ML	Data (vertical/horizontal), Model layers	DP noise, Adversarial training, SMPC
Cloud-Edge/SL Inference	DNN layers, computation splits	DP on activations, Adaptive splits
Data Publishing/Anonymization	Attributes, Records, Event traces	Slicing, DP Laplace/Exponential
MPC/Coded Computing	Task blocks, Codewords	Privacy masks, MPC, Secret sharing

For example, in privacy-preserving federated learning on partitioned attributes, vertical partitioning leverages adversarial min-max training to produce intermediate representations that are selectively robust to inference attacks, with a forward-backward splitting optimizer to separate privacy and utility objectives (Zhang et al., 2021).

Event log partitioning in process mining pipelines facilitates utility-preserving anonymization: abstraction functions segment logs, which are then anonymized individually using DP mechanisms; the parallel composition property maintains global ε-DP (Lim et al., 8 Jul 2025).
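The parallel composition argument can be sketched on a toy event log. This is an illustration under stated assumptions, not the cited pipeline: the privacy unit is taken to be a single event, every event falls into exactly one sub-log, each sub-log has a fixed public activity alphabet, and plain noisy activity counts stand in for the directly-follows-based anonymization discussed in the source. The `GROUPS` mapping and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def partition_log(events, abstraction):
    """Group (case_id, activity) events into disjoint sub-logs."""
    sublogs = defaultdict(list)
    for event in events:
        _case_id, activity = event
        sublogs[abstraction(activity)].append(event)
    return dict(sublogs)


def anonymize_sublog(sublog, activities, epsilon, rng):
    """Release noisy counts over a fixed, public activity alphabet.
    One event changes exactly one count by 1, so the sensitivity is 1."""
    counts = Counter(activity for _, activity in sublog)
    return {a: counts[a] + laplace_noise(1.0 / epsilon, rng) for a in activities}


def parallel_epsilon(epsilons):
    """Mechanisms on disjoint partitions compose to the maximum budget."""
    return max(epsilons)


# Hypothetical abstraction: activity -> higher-level group.
GROUPS = {"register": "front_office", "pay": "front_office", "ship": "logistics"}
events = [("c1", "register"), ("c1", "pay"), ("c2", "register"),
          ("c2", "ship"), ("c3", "pay")]

sublogs = partition_log(events, GROUPS.__getitem__)
rng = random.Random(7)
epsilon = 1.0
released = {}
for group, sublog in sublogs.items():
    alphabet = sorted(a for a, g in GROUPS.items() if g == group)
    released[group] = anonymize_sublog(sublog, alphabet, epsilon, rng)

total_epsilon = parallel_epsilon([epsilon] * len(sublogs))
```

Because the sub-logs are disjoint with respect to the privacy unit, the total budget stays at `epsilon` rather than growing with the number of partitions. If one case could contribute events to several sub-logs and the guarantee were required per case, sequential composition would apply instead.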

3. Practical Algorithms and Optimization Criteria

Partitioning decisions frequently arise as solutions to explicit optimization problems, subject to performance and privacy constraints:

Cloud-Edge Inference (CIS): Minimize overall inference latency by adaptively selecting a split layer $m$, subject to bandwidth, computation, and DP-induced noise trade-offs. The total latency at split $m$ is $T_{\mathrm{total}}(m)=T^t_{up}(m)+T^t_{down}(m)+\sum_{i=1}^m t^e_i + \sum_{j=m+1}^n t^c_j$ (Wang et al., 2022).
Vehicular LLM Offloading: For each vehicle $v_i$, determine the workload fraction $\beta_i$ processed locally by solving $\min_{0 \leq \beta_i \leq 1} w_1 T_\mathrm{loc}(\beta_i) + w_2 T_\mathrm{off}(\beta_i)$, subject to a cumulative $\varepsilon$-DP constraint (Badidi et al., 30 Aug 2025).
Layer Partitioning with TEEs: Identify the cut index $k^*$ minimizing a weighted objective $\alpha\,\mathrm{Leakage}(k) + (1-\alpha)\,\mathrm{Latency}(k)$; privacy is quantified via SSIM under reconstruction attacks, and only layers $1, \dots, k^*$ are executed within the enclave (Rajasekar et al., 2024).
Hierarchical Task Partitioning in APCC: Solve a mixed-integer nonlinear program to minimize task completion delay $z$ while guaranteeing privacy through decoding thresholds, each a function of the number of tasks in a set and the number of privacy masks (Zeng et al., 2023).
Personalized Privacy-Preserving Split Learning (P3SL): Each client performs a bi-level optimization, balancing individual privacy leakage (FSIM) and energy cost, under local power/accuracy constraints. Each client selects its own split point and noise parameter for privacy injection (Fan et al., 23 Jul 2025).
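The cloud-edge latency objective $T_{\mathrm{total}}(m)$ from the first item above can be evaluated by exhaustive search over split layers. The sketch below assumes per-layer edge times $t^e_i$, per-layer cloud times $t^c_j$, and transfer times indexed by $m = 0, \dots, n$ (with $m = 0$ meaning everything runs in the cloud and $m = n$ everything on the edge). All numbers are invented, and the DP-noise term of the trade-off is not modeled.

```python
from typing import Sequence


def total_latency(
    m: int,
    edge_times: Sequence[float],
    cloud_times: Sequence[float],
    upload_time: Sequence[float],
    download_time: Sequence[float],
) -> float:
    """T_total(m) = T_up(m) + T_down(m) + sum_{i<=m} t^e_i + sum_{j>m} t^c_j."""
    return (
        upload_time[m]
        + download_time[m]
        + sum(edge_times[:m])    # layers 1..m run on the edge device
        + sum(cloud_times[m:])   # layers m+1..n run in the cloud
    )


def best_split(edge_times, cloud_times, upload_time, download_time) -> int:
    """Return the split layer m in 0..n with the smallest total latency."""
    n = len(edge_times)
    return min(
        range(n + 1),
        key=lambda m: total_latency(m, edge_times, cloud_times,
                                    upload_time, download_time),
    )


# Hypothetical profile of a 4-layer network (milliseconds).
edge_times = [4.0, 4.0, 6.0, 8.0]
cloud_times = [1.0, 1.0, 1.5, 2.0]
upload_time = [20.0, 12.0, 3.0, 2.0, 0.0]   # cost of sending the activation at split m
download_time = [1.0, 1.0, 1.0, 1.0, 0.0]   # cost of returning the result

m_star = best_split(edge_times, cloud_times, upload_time, download_time)
```

With these figures the search picks `m_star = 2`, where the intermediate activation has become small enough to upload cheaply but the expensive late layers still run in the cloud. Re-running the search whenever measured bandwidth or compute times change gives a simple form of adaptive split selection.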

Empirically, adapting split points to time-varying resources and privacy budgets leads to significant utility gains: collaborative inference with adaptive partitioning achieves up to 13.6× latency speedup (CIS, with >82% task accuracy retained at the reported privacy setting), while personalized split learning produces up to 59% energy reduction and strong resilience to membership-inference attacks (Wang et al., 2022, Fan et al., 23 Jul 2025).
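The weighted leakage-latency objective described for TEE layer partitioning can be sketched in the same style. The leakage and latency profiles below are invented, and the min-max normalization of both terms is an assumption added here so that the two quantities are comparable; the cited work may scale them differently.

```python
from typing import List, Sequence, Tuple


def normalize(xs: Sequence[float]) -> List[float]:
    """Min-max scale to [0, 1]; a constant profile maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def select_cut(
    leakage: Sequence[float],
    latency: Sequence[float],
    alpha: float,
) -> Tuple[int, List[float]]:
    """Return (k_star, scores), where k_star is the 1-based cut index
    minimizing alpha * Leakage(k) + (1 - alpha) * Latency(k)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(leakage) != len(latency) or not leakage:
        raise ValueError("profiles must be non-empty and of equal length")
    scores = [
        alpha * l + (1.0 - alpha) * t
        for l, t in zip(normalize(leakage), normalize(latency))
    ]
    k_index = min(range(len(scores)), key=scores.__getitem__)
    return k_index + 1, scores


# Hypothetical profiles for cuts k = 1..5: leakage as a reconstruction
# similarity score (higher means more leakage), latency in milliseconds
# when layers 1..k run inside the enclave.
leakage = [0.9, 0.7, 0.4, 0.2, 0.1]
latency = [10.0, 20.0, 35.0, 60.0, 100.0]

k_star, scores = select_cut(leakage, latency, alpha=0.5)
```

Setting `alpha` close to 1 pushes the cut deeper into the network (less leakage, more enclave time), while `alpha` close to 0 favours the shallowest cut.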

4. Theoretical Guarantees and Privacy-Utility Trade-offs

Differential Privacy Guarantees: In the DP-based partitioning schemes surveyed here, privacy is enforced at the user or record level. For instance, the per-channel Laplace mechanism in CIS scales each channel's noise level according to its information rank; collaborative or per-feature budget allocation permits finer privacy-utility trade-offs (Wang et al., 2022).
Parallel Composition: When partitioning is along disjoint sub-logs, as in event abstraction for process discovery, the overall pipeline remains ε-DP by parallel composition of each partitioned DP mechanism (Lim et al., 8 Jul 2025).
Consistent Estimation Under Partitioning: In partition-and-censoring (PAC) frameworks for skewed data, the mean-squared error converges at a rate established in the cited analysis, with privacy perturbation contributing only an additional error term, ensuring practical estimation efficiency at moderate privacy budgets (Liu et al., 2023).
Partition Selection Utility: In private set union, adaptive rerouting of weight (MAD2R algorithm) admits stochastic dominance guarantees: for every item, the probability of selection under MAD is at least that under the basic algorithm, and it is strictly higher for marginally frequent items. Scalability to very large numbers of pairs has been demonstrated (Chen et al., 13 Feb 2025).
Optimization Under Constraints: Hierarchical partitioning in APCC is optimized with constraints balancing privacy, delay, and decoding load; increasing privacy budget (masks per set) necessitates smaller task sets, with an explicit encoding rate formula showing capacity-optimality (Zeng et al., 2023).

5. Security Models and Limitations

Privacy-preserving partitioning protocols are typically designed for the semi-honest adversarial model—participants follow protocol but may seek to infer additional information.

Security: In grid-partitioned ID3, secure sum, union, and intersection protocols, together with Yao-style circuits, constrain all intermediate information to aggregates or encrypted forms. Random masking and homomorphic encryption for transformation-based privacy-preserving linear programming ensure that no party learns others’ private shares or permutation matrices (0803.1555, Hong et al., 2016).
Limitations: Some residual structural leakage may persist: partitioning itself can discard inter-partition patterns, potentially impacting certain downstream analyses (e.g., event log abstraction) (Lim et al., 8 Jul 2025). For cryptography-based protocols, scalability can be limited by communication or polynomial growth in computation with the number of parties or tasks (Hong et al., 2016, Zeng et al., 2023).
Adaptive Partitioning Needs: Most current schemes adopt static partitioning; dynamic adaptation or instance-wise partitioning to optimize privacy and utility per input remains a prospective research direction (Rajasekar et al., 2024, Fan et al., 23 Jul 2025).
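A secure sum in the semi-honest model can be illustrated with additive secret sharing. This is a generic textbook construction, not the specific protocol of the cited works; it assumes at least two non-colluding parties, non-negative integer inputs, and a true total smaller than the modulus. The simulation runs all parties in one process purely for illustration.

```python
import secrets
from typing import List, Sequence

MODULUS = 2**61 - 1  # any modulus larger than the true sum works


def share(value: int, n_parties: int, modulus: int = MODULUS) -> List[int]:
    """Split `value` into n additive shares that sum to it mod `modulus`.
    Any n-1 of the shares are uniformly random and reveal nothing."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values: Sequence[int], modulus: int = MODULUS) -> int:
    """Simulate the protocol: party i sends its j-th share to party j,
    each party publishes only the sum of the shares it received, and
    the published partial sums are added to obtain the total."""
    n = len(private_values)
    if n < 2:
        raise ValueError("at least two parties are required")
    distributed = [share(v, n, modulus) for v in private_values]
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus
        for j in range(n)
    ]
    return sum(partial_sums) % modulus


total = secure_sum([12, 30, 7])  # three parties with private counts
```

Each party sees only random-looking shares and the final aggregate, which matches the semi-honest guarantee described above. The quadratic number of messages exchanged also hints at the communication overhead noted under Limitations.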

6. Empirical Performance and Design Guidelines

Algorithmic performance: Adaptive partitioning strategies such as CIS and P3SL have been empirically shown to maintain high accuracy (e.g., >82% on CIFAR-10 under moderate DP budgets or >90% global accuracy over heterogeneous devices), while robustly defending against both white-box and black-box reconstruction attacks (Wang et al., 2022, Fan et al., 23 Jul 2025).
Utility improvements: Partition-before-anonymization pipelines substantially improve model precision and utility in process discovery applications, especially when using directly-follows-based DP anonymization (Lim et al., 8 Jul 2025).
Parameter tuning: In practical deployments, partition size, privacy budget split, degree caps, and rerouting parameters must be tuned to the specific data/compute landscape to optimally balance privacy, utility, and scalability (Chen et al., 13 Feb 2025, Zeng et al., 2023).
Scalability: Parallel partitioning and MPC-based approaches enable scaling private computation to hundreds of billions of records or items, far exceeding sequential cryptographic baselines (Chen et al., 13 Feb 2025, Zeng et al., 2023).

7. Extensions and Future Research Directions

Emerging frontiers in privacy-preserving partitioning include:

Dynamic partitioning strategies and per-instance adaptation, improving privacy–utility–efficiency trade-offs in personalized and heterogeneous environments (Rajasekar et al., 2024, Fan et al., 23 Jul 2025).
Seamless integration with secure computation/MPC for vertical and grid partitions, especially in real-world networked data and health data scenarios (Deng et al., 2020, Xu et al., 2019).
Hybrid privacy models combining group-based anonymization (e.g., k-anonymity) with DP in partitioned pipelines to control for auxiliary-information attacks (0909.2290, Lim et al., 8 Jul 2025).
Optimization of partitioning criteria under full system constraints, including energy, communication, and exact user-specified privacy/utility budgets (Badidi et al., 30 Aug 2025, Zeng et al., 2023).

Advancements in these directions are expected to further bridge the gap between theoretically rigorous privacy guarantees and high-utility, scalable, and robust deployment in distributed data science and machine learning environments.