Privacy-Preserving Partitioning

Updated 1 March 2026
  • Privacy-Preserving Partitioning is an approach that splits data and computations into smaller units to ensure rigorous privacy guarantees through methods like differential privacy and secure multiparty computation.
  • It finds applications in federated learning, cloud-edge inference, and data publishing by balancing task-specific utility, performance, and privacy under varied adversarial models.
  • Advanced algorithms optimize the trade-offs between latency, energy, and accuracy by dynamically selecting partitioning strategies, supporting scalable and adaptive real-world deployments.

Privacy-preserving partitioning refers to algorithmic and architectural strategies that structure data, computation, or both into smaller units—partitions—such that utility is preserved while adhering to rigorous privacy guarantees, typically differential privacy (DP) or cryptographic privacy. This concept underpins mechanisms in distributed and federated learning, privacy-preserving data publishing, collaborative inference, and large-scale coded computing. The approach balances performance, scalability, task-specific utility, and formal privacy under adversarial models ranging from honest-but-curious participants to explicit reconstruction attacks.

1. Fundamental Models and Mechanisms

Partitioning in privacy-preserving frameworks takes two principal forms: data partitioning (vertical, horizontal, or hybrid) and computational partitioning (task splitting across infrastructure, model-layer cuts, or encoding for coded computation). The privacy guarantees are attained using techniques such as DP noise injection, secure multiparty computation (MPC), or structural anonymization.

  • Data Partitioning: Data may be split by attributes (vertical), by records (horizontal), or in grid (hybrid) fashion. For example, in privacy-preserving decision tree induction, data can be distributed across multiple parties either by records or by disjoint attribute sets; protocols for ID3 with SMPC primitives are adapted accordingly (0803.1555). In vertically partitioned multiparty learning, the global model is expressed as a function of local and cross-party terms, and privacy is enforced via noise addition to polynomial coefficients at the party level with secure aggregation (Xu et al., 2019).
  • Computational Partitioning: Model splitting in collaborative inference (e.g., cloud-edge or split learning) enables intermediate representations to be selectively sanitized before offloading; strategic selection of split points trades off privacy leakage against system efficiency, as in CIS (Wang et al., 2022) and P3SL (Fan et al., 23 Jul 2025).
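
The two data-partitioning forms described above can be illustrated with a minimal sketch. The record layout, the `id` join key, and both function names are hypothetical and chosen only for illustration; no privacy mechanism is applied here, the code only shows which party would hold which part of the data.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def horizontal_partition(records: List[Record], num_parties: int) -> List[List[Record]]:
    """Split by records: each party holds complete rows for a subset of individuals."""
    parts: List[List[Record]] = [[] for _ in range(num_parties)]
    for index, record in enumerate(records):
        parts[index % num_parties].append(record)
    return parts


def vertical_partition(
    records: List[Record],
    attribute_sets: Sequence[Sequence[str]],
    key: str = "id",
) -> List[List[Record]]:
    """Split by attributes: each party holds every row, but only its own columns
    plus the shared join key."""
    parts: List[List[Record]] = []
    for attributes in attribute_sets:
        parts.append([{key: r[key], **{a: r[a] for a in attributes}} for r in records])
    return parts


# Hypothetical toy table.
records = [
    {"id": 1, "age": 34, "zip": "10001", "diagnosis": "A"},
    {"id": 2, "age": 51, "zip": "10002", "diagnosis": "B"},
    {"id": 3, "age": 29, "zip": "10003", "diagnosis": "A"},
    {"id": 4, "age": 62, "zip": "10004", "diagnosis": "C"},
]

by_rows = horizontal_partition(records, num_parties=2)
by_cols = vertical_partition(records, [["age", "zip"], ["diagnosis"]])
```

A grid (hybrid) layout would follow from applying `horizontal_partition` to each element of `by_cols`, so that every party holds a block of rows and a block of columns.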

Underlying mechanisms include:

2. Partitioning Strategies Across Domains

Privacy-preserving partitioning has been systematically explored in several problem domains:

| Application | Partitioning Granularity | Privacy Mechanism |
|---|---|---|
| Federated/Distributed ML | Data (vertical/horizontal), Model layers | DP noise, Adversarial training, SMPC |
| Cloud-Edge/SL Inference | DNN layers, computation splits | DP on activations, Adaptive splits |
| Data Publishing/Anonymization | Attributes, Records, Event traces | Slicing, DP Laplace/Exponential |
| MPC/Coded Computing | Task blocks, Codewords | Privacy masks, MPC, Secret sharing |

For example, in privacy-preserving federated learning on partitioned attributes, vertical partitioning leverages adversarial min-max training to produce intermediate representations that are selectively robust to inference attacks, with a forward-backward splitting optimizer to separate privacy and utility objectives (Zhang et al., 2021).

Event log partitioning in process mining pipelines facilitates utility-preserving anonymization: abstraction functions segment logs, which are then anonymized individually using DP mechanisms; the parallel composition property maintains global ε-DP (Lim et al., 8 Jul 2025).
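
The parallel-composition argument can be made concrete with a simplified sketch. It is not the event-log anonymization pipeline of the cited work: it assumes each individual contributes exactly one record, that the set of partition keys is public and fixed in advance, and that the released statistic is a plain count per partition (sensitivity 1). All function names are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def partition_records(
    records: Sequence[object],
    key_fn: Callable[[object], Hashable],
    public_keys: Sequence[Hashable],
) -> Dict[Hashable, List[object]]:
    """Assign every record to exactly one partition, so the partitions are disjoint."""
    parts: Dict[Hashable, List[object]] = {k: [] for k in public_keys}
    for record in records:
        parts[key_fn(record)].append(record)  # KeyError if the key is not public
    return parts


def dp_count(records: Sequence[object], epsilon: float, rng: random.Random) -> float:
    """epsilon-DP count: one record changes the count by at most 1, so scale = 1/epsilon."""
    return len(records) + laplace_noise(1.0 / epsilon, rng)


def release_partitioned_counts(
    records: Sequence[object],
    key_fn: Callable[[object], Hashable],
    public_keys: Sequence[Hashable],
    epsilon: float,
    rng: random.Random,
) -> Dict[Hashable, float]:
    """Run the same epsilon-DP mechanism on each disjoint partition.

    Because each record affects only one partition, the combined release is
    epsilon-DP (parallel composition), not len(public_keys) * epsilon.
    """
    parts = partition_records(records, key_fn, public_keys)
    return {k: dp_count(part, epsilon, rng) for k, part in parts.items()}


# Hypothetical records: (case id, abstracted activity group).
log = [("c1", "intake"), ("c2", "intake"), ("c3", "review")]
noisy = release_partitioned_counts(
    log, lambda r: r[1], ["intake", "review", "billing"], epsilon=1.0, rng=random.Random(0)
)
```

Counts are released for every public key, including empty partitions, so that the presence or absence of a key in the output does not itself reveal information.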

3. Practical Algorithms and Optimization Criteria

Partitioning decisions frequently arise as solutions to explicit optimization problems, subject to performance and privacy constraints:

  • Cloud-Edge Inference (CIS): Minimize overall inference latency by adaptively selecting a split layer $m$, subject to bandwidth, computation, and DP-induced noise trade-offs. The optimal split respects $T_{\mathrm{total}}(m)=T^t_{up}(m)+T^t_{down}(m)+\sum_{i=1}^m t^e_i + \sum_{j=m+1}^n t^c_j$ (Wang et al., 2022).
  • Vehicular LLM Offloading: For each vehicle $v_i$, determine the workload fraction $\beta_i$ processed locally by solving $\min_{0 \leq \beta_i \leq 1} w_1 T_\mathrm{loc}(\beta_i) + w_2 T_\mathrm{off}(\beta_i)$, subject to a cumulative $\varepsilon$-DP constraint (Badidi et al., 30 Aug 2025).
  • Layer Partitioning with TEEs: Identify cut index $k^*$ minimizing a weighted objective $\alpha\,\mathrm{Leakage}(k) + (1-\alpha)\,\mathrm{Latency}(k)$; privacy is quantified via SSIM under reconstruction attacks, and only layers $1,\dots,k^*$ are executed within the enclave (Rajasekar et al., 2024).
  • Hierarchical Task Partitioning in APCC: Solve a mixed-integer nonlinear program to minimize task completion delay $z$ while guaranteeing privacy by ensuring decoding thresholds $H_i$ (function of number of tasks $K_i$ in set $i$ and number of privacy masks $L$) (Zeng et al., 2023).
  • Personalized Privacy-Preserving Split Learning (P3SL): Each client $i$ performs a bi-level optimization, balancing individual privacy leakage (FSIM) and energy cost, under local power/accuracy constraints. Clients select split point $s_i$ and noise parameter $\sigma_i$ for privacy injection (Fan et al., 23 Jul 2025).
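
Two of the objectives in this list reduce to a search over a small set of candidate cut points, which a brute-force sketch can show. The code below is illustrative only: it assumes per-layer edge and cloud times and per-split transfer times are already measured, the leakage and latency scores are hypothetical normalized values, and neither function reproduces the cited systems' actual solvers or their DP noise model.

```python
from typing import Sequence, Tuple


def total_latency(
    m: int,
    upload: Sequence[float],
    download: Sequence[float],
    edge_times: Sequence[float],
    cloud_times: Sequence[float],
) -> float:
    """T_total(m) = T_up(m) + T_down(m) + sum_{i=1..m} t^e_i + sum_{j=m+1..n} t^c_j.

    edge_times[i-1] is t^e_i and cloud_times[j-1] is t^c_j; upload and download
    are indexed by the split m in 0..n.
    """
    return upload[m] + download[m] + sum(edge_times[:m]) + sum(cloud_times[m:])


def best_split_by_latency(
    upload: Sequence[float],
    download: Sequence[float],
    edge_times: Sequence[float],
    cloud_times: Sequence[float],
) -> Tuple[int, float]:
    """Exhaustively evaluate every split m in 0..n and return (m, T_total(m))."""
    n = len(edge_times)
    costs = [total_latency(m, upload, download, edge_times, cloud_times) for m in range(n + 1)]
    best_m = min(range(n + 1), key=costs.__getitem__)
    return best_m, costs[best_m]


def best_cut_weighted(leakage: Sequence[float], latency: Sequence[float], alpha: float) -> int:
    """Return the 1-based cut k minimizing alpha*Leakage(k) + (1-alpha)*Latency(k).

    leakage[k-1] and latency[k-1] are hypothetical scores for cut k.
    """
    scores = [alpha * l + (1.0 - alpha) * t for l, t in zip(leakage, latency)]
    return min(range(len(scores)), key=scores.__getitem__) + 1


# Hypothetical four-layer network (times in milliseconds).
edge_times = [4.0, 4.0, 6.0, 8.0]
cloud_times = [1.0, 1.0, 1.5, 2.0]
upload = [20.0, 12.0, 3.0, 2.0, 1.0]   # indexed by split m = 0..4
download = [0.5, 0.5, 0.5, 0.5, 0.5]

split, latency_ms = best_split_by_latency(upload, download, edge_times, cloud_times)
cut = best_cut_weighted([0.9, 0.5, 0.3, 0.2], [0.1, 0.2, 0.6, 0.9], alpha=0.5)
```

With these toy numbers the latency-optimal split sits after the second layer, where the intermediate activation becomes cheap to upload. Raising `alpha` in `best_cut_weighted` pushes the cut deeper, since leakage is then weighted more heavily than latency.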

Empirically, adaptation of split points under time-varying resources and privacy budget leads to significant utility gains: collaborative inference with adaptive partitioning achieves up to 13.6× latency speedup (CIS, $\varepsilon = 10$ to $30$ yields >82% task accuracy), while personalized split learning produces up to 59% energy reduction and strong resilience to membership-inference attacks (Wang et al., 2022, Fan et al., 23 Jul 2025).

4. Theoretical Guarantees and Privacy-Utility Trade-offs

  • Differential Privacy Guarantees: In the DP-based partitioning schemes surveyed here, privacy is enforced at the user or record level. For instance, the per-channel Laplace mechanism in CIS scales the noise level of each channel according to its information rank; collaborative or per-feature budget allocation permits finer privacy-utility trade-offs (Wang et al., 2022).
  • Parallel Composition: When partitioning is along disjoint sub-logs, as in event abstraction for process discovery, the overall pipeline remains ε-DP by parallel composition of each partitioned DP mechanism (Lim et al., 8 Jul 2025).
  • Consistent Estimation Under Partitioning: In partition-and-censoring (PAC) frameworks for skewed data, mean-squared error converges at $O(P^{-1})$ with privacy perturbation contributing only $O(P^{-3/2} \epsilon^{-1})$, ensuring practical estimation efficiency at moderate privacy budgets (Liu et al., 2023).
  • Partition Selection Utility: In private set union, adaptive rerouting of weight (MAD2R algorithm) admits stochastic dominance guarantees: for every item $i$, the probability of selection under MAD is at least that under the basic algorithm and grows strictly for marginally frequent items. Scalability to $10^{11}$ pairs demonstrated (Chen et al., 13 Feb 2025).
  • Optimization Under Constraints: Hierarchical partitioning in APCC is optimized with constraints balancing privacy, delay, and decoding load; increasing privacy budget (masks per set) necessitates smaller task sets, with an explicit encoding rate formula showing capacity-optimality (Zeng et al., 2023).
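
For orientation on the partition-selection bullet, the following sketch shows a generic weight-and-threshold baseline with uniform weighting. It is neither MAD nor MAD2R, and whether it matches the cited paper's "basic algorithm" in every detail is an assumption. The function name is hypothetical, and `noise_std` and `threshold` are caller-supplied placeholders: calibrating them to a target privacy guarantee is deliberately not shown.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Iterable, Set


def select_partitions(
    user_items: Dict[str, Iterable[str]],
    noise_std: float,
    threshold: float,
    rng: random.Random,
) -> Set[str]:
    """Uniform-weighting baseline for releasing items from a private set union.

    Each user spreads a contribution of unit L2 norm evenly over their distinct
    items; Gaussian noise is added to every item's total weight; only items whose
    noisy weight exceeds the threshold are released.
    """
    weights: Dict[str, float] = defaultdict(float)
    for items in user_items.values():
        distinct = set(items)
        if not distinct:
            continue
        share = 1.0 / math.sqrt(len(distinct))  # bounds each user's L2 contribution by 1
        for item in distinct:
            weights[item] += share

    released: Set[str] = set()
    for item in sorted(weights):  # fixed order keeps seeded runs reproducible
        if weights[item] + rng.gauss(0.0, noise_std) > threshold:
            released.add(item)
    return released


# Hypothetical toy input: "a" is frequent, "b" and "c" are held by one user each.
users = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a"]}
released = select_partitions(users, noise_std=1e-9, threshold=1.5, rng=random.Random(0))
```

Under uniform weighting, a user's weight on an already-frequent item is spent even though that item would be released anyway. Adaptive schemes of the kind described above reroute such excess weight toward items nearer the threshold.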

5. Security Models and Limitations

Privacy-preserving partitioning protocols are typically designed for the semi-honest adversarial model—participants follow protocol but may seek to infer additional information.

  • Security: In grid-partitioned ID3, secure sum, union, and intersection protocols, together with Yao-style circuits, constrain all intermediate information to aggregates or encrypted forms. Random masking and homomorphic encryption for transformation-based privacy-preserving linear programming ensure that no party learns others’ private shares or permutation matrices (0803.1555, Hong et al., 2016).
  • Limitations: Some residual structural leakage may persist: partitioning itself can discard inter-partition patterns, potentially impacting certain downstream analyses (e.g., event log abstraction) (Lim et al., 8 Jul 2025). For cryptography-based protocols, scalability can be limited by communication or polynomial growth in computation with the number of parties or tasks (Hong et al., 2016, Zeng et al., 2023).
  • Adaptive Partitioning Needs: Most current schemes adopt static partitioning; dynamic adaptation or instance-wise partitioning to optimize privacy and utility per input remains a prospective research direction (Rajasekar et al., 2024, Fan et al., 23 Jul 2025).
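
The secure-sum primitive mentioned in the security bullet can be sketched with additive secret sharing. This is a single-process simulation under the semi-honest model, with no networking and no claim to reproduce the specific protocols of the cited works; the modulus and function names are assumptions made for illustration.

```python
import secrets
from typing import List, Sequence

MODULUS = 2**61 - 1  # assumed public modulus; the true sum must stay below it


def make_shares(value: int, num_parties: int, modulus: int = MODULUS) -> List[int]:
    """Split `value` into additive shares that sum to it modulo `modulus`.

    Any proper subset of the shares is uniformly random and reveals nothing
    about the value on its own.
    """
    shares = [secrets.randbelow(modulus) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values: Sequence[int], modulus: int = MODULUS) -> int:
    """Simulate a secure sum among len(private_values) semi-honest parties."""
    n = len(private_values)
    # share_matrix[i][j] is the share that party i sends to party j.
    share_matrix = [make_shares(v, n, modulus) for v in private_values]
    # Each party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(share_matrix[i][j] for i in range(n)) % modulus for j in range(n)]
    # The published partial sums reveal the total but not any individual input.
    return sum(partial_sums) % modulus


# Hypothetical local counts held by three parties.
total = secure_sum([12, 7, 30])
```

Each party sends one share to every other party, so communication grows quadratically with the number of parties, which illustrates the scalability limitation noted for cryptography-based protocols.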

6. Empirical Performance and Design Guidelines

  • Algorithmic performance: Adaptive partitioning strategies such as CIS and P3SL have been empirically shown to maintain high accuracy (e.g., >82% on CIFAR-10 under moderate DP budgets or >90% global accuracy over heterogeneous devices), while robustly defending against both white-box and black-box reconstruction attacks (Wang et al., 2022, Fan et al., 23 Jul 2025).
  • Utility improvements: Partition-before-anonymization pipelines substantially improve model precision and utility in process discovery applications, especially when using directly-follows-based DP anonymization (Lim et al., 8 Jul 2025).
  • Parameter tuning: In practical deployments, partition size, privacy budget split, degree caps, and rerouting parameters must be tuned to the specific data/compute landscape to optimally balance privacy, utility, and scalability (Chen et al., 13 Feb 2025, Zeng et al., 2023).
  • Scalability: Parallel partitioning and MPC-based approaches enable scaling private computation to hundreds of billions of records or items, far exceeding sequential cryptographic baselines (Chen et al., 13 Feb 2025, Zeng et al., 2023).

7. Extensions and Future Research Directions

Emerging frontiers in privacy-preserving partitioning include:

  • Dynamic partitioning strategies and per-instance adaptation, improving privacy–utility–efficiency trade-offs in personalized and heterogeneous environments (Rajasekar et al., 2024, Fan et al., 23 Jul 2025).
  • Seamless integration with secure computation/MPC for vertical and grid partitions, especially in real-world networked data and health data scenarios (Deng et al., 2020, Xu et al., 2019).
  • Hybrid privacy models combining group-based anonymization (e.g., k-anonymity) with DP in partitioned pipelines to control for auxiliary-information attacks (0909.2290, Lim et al., 8 Jul 2025).
  • Optimization of partitioning criteria under full system constraints, including energy, communication, and exact user-specified privacy/utility budgets (Badidi et al., 30 Aug 2025, Zeng et al., 2023).

Advancements in these directions are expected to further bridge the gap between theoretically rigorous privacy guarantees and high-utility, scalable, and robust deployment in distributed data science and machine learning environments.
