Sensitive-Information Repartitioning
- Sensitive-information repartitioning is defined as dividing confidential data into secure partitions or shares using techniques like secret-sharing and hypergraph models to prevent unauthorized disclosure.
- It employs methods such as Shamir’s threshold scheme, query binning, and privacy-aware clustering to balance security and utility in distributed and outsourced environments.
- Applications include cloud computing, secure data outsourcing, and privacy-preserving analytics, providing enhanced defense against adversarial attacks.
Sensitive-Information Repartitioning refers to the set of methodologies, protocols, and theoretical frameworks by which sensitive data is divided, clustered, or distributed into distinct partitions or shares, so as to achieve particular privacy, security, and utility objectives under explicit adversarial and operational models. These approaches are motivated by the need to minimize risk exposure, enforce confidentiality, and enable robust, secure operations and analytics in distributed, multi-agent, and outsourced environments where traditional monolithic protection strategies are inadequate.
1. Foundations and Formal Models
Sensitive-information repartitioning is formally rooted in secret-sharing theory, partitioned security frameworks, privacy-aware clustering, and privacy-preserving data outsourcing. The canonical formulation is the $(t, n)$-threshold secret sharing construct, where a secret $s$ is divided into $n$ shares such that any coalition of at least $t$ parties can reconstruct $s$, but coalitions of size less than $t$ possess no information about $s$ (Bu et al., 2019, Rauh, 2017). In privacy-aware partitioning, sensitive properties are modeled as hypergraph dependencies: the disclosure of a property $p$ requires all elements in a minimal subset $S_p$ of atomic data. Denying any adversary a complete $S_p$ achieves nondisclosure (Rekatsinas et al., 2013).
Partitioned security can also be described via probabilistic criteria: partitioned-data security demands that joint processing of encrypted sensitive data and clear non-sensitive data does not increase the adversary’s posterior probabilities of sensitive association or frequency orderings beyond their priors (Mehrotra et al., 2018, Mehrotra et al., 2020).
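In symbols, and with notation that is assumed here rather than taken from the cited papers, let $X$ denote everything the adversary observes during query processing, $e_i$ an encrypted sensitive value, $ns_j$ a cleartext non-sensitive value, and $e_i \leftrightarrow ns_j$ the event that the two are associated. The association half of the criterion can then be sketched as:

$$
\Pr[\, e_i \leftrightarrow ns_j \mid X \,] \;\le\; \Pr[\, e_i \leftrightarrow ns_j \,] \quad \text{for all } i, j,
$$

with an analogous inequality for every statement about the relative frequency ordering of two sensitive values.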
2. Cryptographic Partitioning: Secret Sharing and Robustness
Threshold Secret Sharing (TSS), often implemented as Shamir’s scheme over a finite field $\mathbb{F}_q$, is central to repartitioning in cryptographic contexts. Given $n$ devices, a $(t, n)$-threshold polynomial $f$ of degree $t - 1$ is constructed such that each device $i$ receives a share $f(x_i)$ tied to its public ID $x_i$. The secret $s$ is embedded as the highest-order polynomial coefficient; Lagrange interpolation over any $t$ shares recovers $s$. Shamir-TSS provides perfect secrecy: any subset of fewer than $t$ shares provides zero mutual information about the secret (Bu et al., 2019).
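The construction can be illustrated with a minimal Python sketch. It is not the implementation of Bu et al.; the prime modulus `P`, the function names `make_shares` and `recover_secret`, and the use of small integers as public IDs are all assumptions made for illustration. Following the description above, the secret is placed in the highest-order coefficient, so recovery computes the leading coefficient of the Lagrange interpolant, which is the sum of `y_i / prod_{j != i} (x_i - x_j)` over the supplied shares.

```python
import secrets

# Assumed field: integers modulo the Mersenne prime 2**127 - 1.
P = 2**127 - 1


def make_shares(secret, t, ids):
    """Return one (id, f(id)) share per public ID, with threshold t."""
    if not 0 <= secret < P:
        raise ValueError("secret must lie in the field")
    if len(set(ids)) != len(ids) or len(ids) < t:
        raise ValueError("need at least t distinct public IDs")
    # f(x) = a_0 + a_1 x + ... + a_{t-2} x^(t-2) + secret * x^(t-1)
    coeffs = [secrets.randbelow(P) for _ in range(t - 1)] + [secret]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation, highest degree first
            acc = (acc * x + c) % P
        return acc

    return [(i, f(i)) for i in ids]


def recover_secret(shares):
    """Recover the leading coefficient from exactly t shares."""
    secret = 0
    for xi, yi in shares:
        denom = 1
        for xj, _ in shares:
            if xj != xi:
                denom = denom * (xi - xj) % P
        # pow(denom, -1, P) is the modular inverse (Python 3.8+)
        secret = (secret + yi * pow(denom, -1, P)) % P
    return secret


shares = make_shares(secret=123456789, t=3, ids=[1, 2, 3, 4, 5])
print(recover_secret(shares[:3]) == 123456789)
```

Note that `recover_secret` expects exactly $t$ shares: with more points, the interpolant's formal leading coefficient is no longer the secret, so a caller would first select any $t$ of the available shares.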
For robust deployment, additional integrity and authenticity mechanisms are layered atop TSS:
- Encrypt-then-MAC: the secret is encrypted and authenticated before sharing, preventing plaintext exposure even when $t$ (the reconstruction threshold) or more shares are compromised.
- Tamper detection: MAC verification detects tampered reconstructions; Reed-Solomon decoding and adaptive group testing regimes then find and correct dishonest shares up to the scheme's correction bound, ensuring resilience against collusion and tampering.
- Authenticity is guaranteed via MAC types such as HMAC-SHA2, whose collision probability is negligible; physical unclonable function (PUF) frameworks can provision keys per-client for authentication isolation.
Empirical runtime and storage evaluations show the scheme remains lightweight and resource-adaptive: under benign conditions, only MAC verification is needed; more costly decoding/group-testing modules are invoked only on detected tampering (Bu et al., 2019).
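The "cheap check first, expensive repair only on failure" control flow can be sketched as follows. This is an illustrative outline using Python's standard `hmac` module, not the cited scheme: `make_tag`, `recover`, and the `slow_recovery` callback (standing in for Reed-Solomon decoding or group testing) are hypothetical names, and the ciphertext is treated as an opaque byte string.

```python
import hashlib
import hmac


def make_tag(key: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 tag over the (already encrypted) secret."""
    return hmac.new(key, ciphertext, hashlib.sha256).digest()


def recover(ciphertext: bytes, tag: bytes, key: bytes, slow_recovery):
    """Return the reconstructed ciphertext, repairing it only if the MAC fails.

    `slow_recovery` is a hypothetical callable that would locate and correct
    dishonest shares; it is never invoked on the benign path.
    """
    if hmac.compare_digest(make_tag(key, ciphertext), tag):
        return ciphertext          # benign case: one MAC verification
    return slow_recovery()         # tampering detected: run the costly module
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many tag bytes matched through timing.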
3. Partitioning in Data Outsourcing and Cloud Environments
Sensitive-information repartitioning underpins efficient and secure outsourced computing in hybrid and public-cloud settings. Data is pre-classified into sensitive ($D_s$) and non-sensitive ($D_{ns}$) partitions by owner policy, e.g., via field-level or record-level tags (Mehrotra et al., 2018, Mehrotra et al., 2018, Mehrotra et al., 2020). The architectural split is:
- Hybrid-cloud: the private cloud stores $D_{priv}$, comprising all of $D_s$ and only the pseudo-sensitive part of $D_{ns}$ needed for joins; the public cloud stores $D_{pub}$, the remainder of $D_{ns}$. Split queries are executed independently and merged only at the private side, minimizing the risk of linkages.
- Public-cloud: $D_s$ is encrypted (e.g., via FHE, SGX, or secret-sharing), while $D_{ns}$ is stored and indexed in plaintext. Join and selection queries are repartitioned, and an additional layer (such as Query Binning) ensures that access patterns, sizes, and frequencies of returned data do not leak associations between $D_s$ and $D_{ns}$.
Partitioned-data security is formally required: the adversary’s ability to link encrypted sensitive values to cleartext non-sensitive values or deduce frequency orderings does not increase post-query.
Empirical results indicate that hybrid and binning-based partitioned computation can cut query runtimes and private storage requirements by up to 30–40% compared to “encrypt-everything” baselines, with security strengthened against size, frequency, and workload-skew attacks (Mehrotra et al., 2018).
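As a rough illustration of the hybrid-cloud split, the sketch below partitions tagged records and evaluates a selection on each side separately, merging only on the private side. The record layout, the `sensitive` tag, and the function names are assumptions for this example; it also omits the pseudo-sensitive data that a real deployment would keep privately to support joins.

```python
def partition(records, is_sensitive):
    """Split records into (sensitive, non_sensitive) according to owner policy."""
    sensitive = [r for r in records if is_sensitive(r)]
    non_sensitive = [r for r in records if not is_sensitive(r)]
    return sensitive, non_sensitive


def split_query(predicate, private_store, public_store):
    """Evaluate one selection as two independent subqueries."""
    private_hits = [r for r in private_store if predicate(r)]  # private cloud
    public_hits = [r for r in public_store if predicate(r)]    # public cloud
    return private_hits + public_hits  # merge happens on the private side only


records = [
    {"name": "a", "dept": "defense", "sensitive": True},
    {"name": "b", "dept": "design", "sensitive": False},
    {"name": "c", "dept": "defense", "sensitive": False},
]
private_store, public_store = partition(records, lambda r: r["sensitive"])
hits = split_query(lambda r: r["dept"] == "defense", private_store, public_store)
print(sorted(r["name"] for r in hits))
```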
4. Query Binning Protocols and Leakage Defense
Query Binning (QB) is a protocol that achieves partitioned-data security in outsourced query settings. The owner precomputes sensitive bins $SB$ and non-sensitive bins $NSB$, arranging each as balanced groups of values. For any selection query on a value $w$, the algorithm:
- Identifies the bins containing $w$.
- Issues subqueries for the entire bins (not just $w$) over both the sensitive and the non-sensitive partition.
- Ensures that every returned result set is of fixed size, and each $w$ is always hidden among a group.
This strategy neutralizes size and frequency-count attacks: adversaries learn only the bin, not the queried value, and repeated queries never refine associations because every sensitive bin is eventually paired with every non-sensitive bin (a complete bipartite matching structure) (Mehrotra et al., 2018, Mehrotra et al., 2020). For joins and range queries, multi-dimensional or hierarchical binning extends the protection.
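The bin structure can be sketched as a secretly permuted grid over the value domain: each row acts as a sensitive bin and each column as a non-sensitive bin, so a query for one value always fetches one whole row and one whole column. This is a simplified illustration under stated assumptions (a single shared value domain, a seeded shuffle standing in for the owner's secret permutation, hypothetical function names), not the published QB algorithm.

```python
import math
import random


def build_bins(domain, seed):
    """Place every value on a rows-by-cols grid known only to the owner."""
    values = list(domain)
    random.Random(seed).shuffle(values)       # stand-in for a secret permutation
    cols = math.isqrt(len(values) - 1) + 1    # ceil(sqrt(n)) for n >= 1
    rows = math.ceil(len(values) / cols)
    position = {v: divmod(k, cols) for k, v in enumerate(values)}
    sensitive_bins = [[] for _ in range(rows)]
    nonsensitive_bins = [[] for _ in range(cols)]
    for v, (row, col) in position.items():
        sensitive_bins[row].append(v)         # row = sensitive bin
        nonsensitive_bins[col].append(v)      # column = non-sensitive bin
    return position, sensitive_bins, nonsensitive_bins


def rewrite_query(w, position, sensitive_bins, nonsensitive_bins):
    """Replace a query for w by a query for its two whole bins."""
    row, col = position[w]
    return sensitive_bins[row], nonsensitive_bins[col]


position, sb, nsb = build_bins(range(16), seed=7)
s_bin, ns_bin = rewrite_query(5, position, sb, nsb)
print(len(s_bin), len(ns_bin))  # both bins have 4 values for a 16-value domain
```

Because each (row, column) pair corresponds to exactly one value, querying the whole domain over time pairs every sensitive bin with every non-sensitive bin, which is the property the paragraph above relies on. The owner filters the returned bins for $w$ locally.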
Practically, QB delivers performance gains over pure encrypted search, especially when the sensitive fraction is moderate (50%), and imposes minimal additional cryptographic or bandwidth overhead (Mehrotra et al., 2020).
5. Non-Collusion and Privacy-Aware Partitioning Across Adversaries
Sensitive-information repartitioning is foundational in settings involving multiple non-colluding adversaries, such as advertising, crowdsourcing, and federated computations. The SPARSI framework models these scenarios using sensitive properties (minimal data subsets required for property disclosure) depicted as hyperedges in a dependency hypergraph (Rekatsinas et al., 2013).
The principal objective is to maximize utility (e.g., advertiser value) while constraining information disclosure:
- Formally, partitioning is cast as a (bi)criterion optimization: maximize utility subject to a disclosure budget, with NP-hardness arising from the analogy to hypergraph coloring. A Lagrangian relaxation balances utility against disclosure.
- Step-function, linear, and quadratic disclosure measures are considered, with approximate or local-search algorithms (GRASP) available for tractable solution finding. Randomized rounding is employed for submodular utilities.
- On real data (e.g., social network check-ins), the partitioning approach achieves zero disclosure (no adversary sees enough data to infer any property), with empirical evaluations showing near-optimal utility and robust leakage prevention.
This approach is strictly predicated on the absence of collusion among adversaries; if collusion is possible, secret sharing and threshold schemes must be employed.
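The zero-disclosure constraint can be made concrete with a small greedy heuristic: each data item goes to the adversary that values it most, unless that would hand the adversary a complete hyperedge. This is a deliberately simple stand-in, not the GRASP or randomized-rounding algorithms of the SPARSI paper, and the item names, utilities, and `greedy_partition` function are hypothetical.

```python
def greedy_partition(items, hyperedges, utility, adversaries):
    """Assign each item to at most one adversary without completing a hyperedge.

    hyperedges: iterable of sets; each set is a minimal group of items whose
                joint disclosure reveals a sensitive property.
    utility:    utility[item][adversary] -> numeric value of that assignment.
    Returns a dict item -> adversary, or None if the item must be withheld.
    """
    held = {a: set() for a in adversaries}
    assignment = {}
    for item in items:
        ranked = sorted(adversaries, key=lambda a: utility[item][a], reverse=True)
        assignment[item] = None
        for a in ranked:
            candidate = held[a] | {item}
            # Reject the move if any sensitive property becomes fully visible.
            if not any(edge <= candidate for edge in hyperedges):
                held[a] = candidate
                assignment[item] = a
                break
    return assignment


items = ["location", "time", "identity"]
edges = [{"location", "time"}]                 # location + time is sensitive
util = {i: {"A": 2, "B": 1} for i in items}    # both items are worth more to A
print(greedy_partition(items, edges, util, ["A", "B"]))
```

In the example, `time` is diverted to adversary `B` because giving it to `A` would complete the `{location, time}` hyperedge. The guarantee holds only while `A` and `B` do not pool their data.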
6. Partitioned Storage and Recovery in Secure Systems
Sensitive-information repartitioning directly benefits resilience in encrypted storage systems, such as end-to-end encrypted online social networks (Schillinger et al., 2020). Partitioned storages are constructed by:
- Splitting user storage into multiple compartments, each encrypted under independent symmetric keys.
- Layering threshold sharing (Shamir-based) schemes hierarchically: within compartments, across per-room keys, and globally among all peers.
- Distributing shares to active peers using public-key encryption and signing, and requiring out-of-band confirmation before shares are released for recovery.
This approach enables successful reconstruction even under high rates of peer inactivity or maliciousness, incurs marginal storage and communication overheads, and provides integrity and tamper detection via chained hashing and structured signature schemes.
7. Advanced Topics: Output-Sensitive Analysis and Differential Privacy
Sensitive-information repartitioning further extends to program analysis and privacy-preserving analytics:
- Output-sensitive information flow partitions variables into input, output, and leakage sets, then applies type-based analysis to statically ensure that leakage channels do not reveal more than the public output (Ene et al., 2019). A program typed with (input, output, leakage) OSNI is provably secure with respect to designated leakage bounds.
- Differentially private clustering (e.g., DPM (Liebenow et al., 2023)) recursively separates sensitive data in a Mondrian-style manner using geometric splitting and exponential mechanism-based selection. This approach satisfies differential privacy (DP), supports privacy-preserving hyperparameter estimation, preserves clustering fidelity close to that of non-private baselines, and obviates the need to specify the number of clusters.
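The exponential-mechanism selection step mentioned in the second item can be sketched generically: a candidate split is drawn with probability proportional to $\exp(\varepsilon \cdot \text{score} / (2\Delta))$, where $\Delta$ is the sensitivity of the score. The scoring function below (preferring splits with few nearby points) is a hypothetical stand-in for one-dimensional data, not DPM's actual score; the function names are likewise assumptions.

```python
import math
import random


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to exp(eps*score/(2*sens))."""
    scores = [score(c) for c in candidates]
    best = max(scores)
    # Subtracting the maximum keeps exp() from overflowing; the shift cancels
    # out when the weights are normalized.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def sparsity_score(points, width):
    """Hypothetical score: minus the number of points within `width` of the split.

    Adding or removing one point changes the count by at most 1, so the
    sensitivity of this score is 1.
    """
    def score(split):
        return -sum(1 for p in points if abs(p - split) <= width)
    return score


points = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
candidates = [0.2, 0.55, 0.9]
rng = random.Random(0)
split = exponential_mechanism(candidates, sparsity_score(points, 0.15), 1.0, 1.0, rng)
print(split in candidates)
```

Smaller values of `epsilon` flatten the distribution over candidates (more privacy, noisier splits); larger values concentrate it on the sparsest gap.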
These methodologies provide design patterns for repurposing sensitive information in complex privacy-preserving analytic pipelines and programmatic environments.
Sensitive-information repartitioning encompasses a comprehensive set of cryptographic, algorithmic, and architectural approaches by which confidential information is split, distributed, and processed in a manner rigorously designed to control leakage, maintain robustness, and optimize utility under explicit threat models and practical operating constraints. Its theoretical provenance extends from threshold secret-sharing to privacy-aware hypergraph partitioning and differential privacy, while its applied relevance spans IoT, cloud analytics, social networks, multi-agent systems, and secure data outsourcing.