DP-RuL: Privacy-preserving Rule Learning

Updated 2 June 2026
  • Privacy-preserving Rule Learning (DP-RuL) is a set of techniques for creating interpretable, rule-based predictive models with rigorous differential privacy guarantees.
  • It employs both distributed (LDP) and centralized approaches—using methods like Monte-Carlo Tree Search and smooth sensitivity—to extract and aggregate valid rules from sensitive data.
  • The framework minimizes individual data leakage while maintaining high utility, demonstrating robust performance in applications such as clinical decision support and credit scoring.

Privacy-preserving Rule Learning (DP-RuL) denotes a class of techniques aimed at constructing interpretable, rule-based predictive models while providing rigorous differential privacy guarantees. These approaches are distinguished by protocols that allow the extraction of statistically robust population-level rulesets from private, potentially sensitive datasets or from distributed local rulesets, with a focus on minimizing the information leaked about any individual data contributor. Two principal instantiations have emerged: one leveraging local differential privacy (LDP) for distributed rule learning from private clients, and another based on differentially-private rule induction using smooth sensitivity of Gini impurity in centralized settings (Lamp et al., 2024, Ly et al., 2024).

1. Formalization and Differential Privacy Foundations

DP-RuL formalizes privacy-preserving rule learning for scenarios where individual or distributed data sources require strong privacy protection. For the LDP setting, the protocol involves $n$ clients, each deriving a personalized ruleset $R_i \subset \mathcal{G}$ from private data, where $\mathcal{G}$ is a logic-based rule grammar. The objective is to aggregate these into a single server-side population ruleset $R_S$ representing valid, generalizable rules, while ensuring that the protocol $\Pi$ satisfies $\varepsilon$-local differential privacy:

$$\forall x, x',\ \forall o: \qquad \frac{\Pr[\mathcal{A}(x)=o]}{\Pr[\mathcal{A}(x')=o]} \le e^{\varepsilon}$$

For classic centralized settings (e.g., a single database), $(\varepsilon, \delta)$-differential privacy is enforced for the rule-learning algorithm $M$:

$$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta$$

where $D$ and $D'$ are neighboring datasets differing in one record (Lamp et al., 2024, Ly et al., 2024).
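The local-privacy inequality above can be checked exhaustively for any mechanism with finitely many inputs and outputs. The following is a minimal sketch, not code from the cited papers: the function name `satisfies_ldp` and the table representation (a dict mapping each input to its output distribution) are assumptions made for illustration.

```python
import math

def satisfies_ldp(mechanism, epsilon, tol=1e-12):
    """Check Pr[A(x)=o] <= e^eps * Pr[A(x')=o] for all inputs x, x' and outputs o.

    `mechanism` is a hypothetical table: {input: {output: probability}}.
    """
    outputs = set()
    for dist in mechanism.values():
        outputs.update(dist)
    bound = math.exp(epsilon)
    for x in mechanism:
        for x_prime in mechanism:
            for o in outputs:
                p = mechanism[x].get(o, 0.0)
                q = mechanism[x_prime].get(o, 0.0)
                if p > bound * q + tol:
                    return False
    return True

# A one-bit mechanism that reports the true bit with probability 0.75.
# Its worst-case probability ratio is 0.75 / 0.25 = 3.
table = {
    0: {0: 0.75, 1: 0.25},
    1: {0: 0.25, 1: 0.75},
}
print(satisfies_ldp(table, math.log(3)))  # budget ln(3) covers the ratio
print(satisfies_ldp(table, 1.0))          # e^1 < 3, so the check fails
```

The check is quadratic in the number of inputs, which is fine for small illustrative tables such as the one-bit case shown here.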

DP-RuL protocols employ expressive grammars to represent candidate rules. In distributed clinical decision support, the grammar $\mathcal{G}$ is typically formulated as Signal Temporal Logic (STL):

[equation missing in source: STL grammar production]

The grammar's atoms are propositional predicates over continuous-valued signals, and temporal operators (always, eventually, until) parameterize rule expressiveness. Candidate rules are explored using "partial-rule templates", which are STL formulas with "holes" (placeholders) that support systematic expansion during search. A rule is labeled valid if at least a specified fraction of clients locally support it.
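The validity criterion can be made concrete with a small non-private reference computation. This is a sketch under stated assumptions: rules are represented as plain strings, the names `support_fraction`, `valid_rules`, and `threshold` are hypothetical, and the server is given direct access to client rulesets, which the actual protocol avoids by querying clients through privacy-preserving mechanisms.

```python
def support_fraction(rule, client_rulesets):
    """Fraction of clients whose local ruleset contains `rule`."""
    if not client_rulesets:
        raise ValueError("at least one client ruleset is required")
    hits = sum(1 for ruleset in client_rulesets if rule in ruleset)
    return hits / len(client_rulesets)

def valid_rules(candidates, client_rulesets, threshold):
    """Keep candidates supported by at least `threshold` of the clients."""
    return [
        rule for rule in candidates
        if support_fraction(rule, client_rulesets) >= threshold
    ]

# Hypothetical STL-like rule strings held by four clients.
clients = [
    {"G[0,5](hr > 100)", "F[0,3](glucose < 70)"},
    {"G[0,5](hr > 100)"},
    {"G[0,5](hr > 100)"},
    {"F[0,3](glucose < 70)"},
]
candidates = ["G[0,5](hr > 100)", "F[0,3](glucose < 70)", "G[0,2](spo2 < 90)"]
print(valid_rules(candidates, clients, threshold=0.75))
```

This exact computation is useful mainly as a ground truth against which noisy, privacy-preserving support estimates can be compared.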

3. Algorithms for Privacy-Preserving Rule Induction

3.1 Monte-Carlo Tree Search with Local Differential Privacy (Distributed, LDP Setting)

Structured search is conducted via Monte-Carlo Tree Search (MCTS) over the grammar space, constructing an exploration tree whose nodes are partial-rule templates. At each iteration, the protocol executes:

  • Selection: Traverse the tree, selecting child nodes according to an Upper Confidence Bound (UCT)-style score that accounts for the estimated support of each candidate.
  • Expansion: Grow the tree by expanding "holes" in the selected template, yielding more specific candidate rules.
  • Querying (Simulation): For each candidate, randomized-response queries are issued to clients to estimate support, with each client applying the randomized mechanism:

[equation missing in source: client-side randomized-response mechanism]

Clients answer structural-match queries truthfully with a probability set by the per-query privacy budget, and answer parameter-estimation queries with Laplace noise added.

  • Backpropagation: Update exploration tree statistics based on observed (noisy) responses.
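The querying and selection steps above can be sketched with standard binary randomized response. This is an illustration under assumptions, not the protocol of Lamp et al.: the truth probability $e^{\varepsilon}/(1+e^{\varepsilon})$ is the textbook choice for one-bit randomized response, the exploration constant `c` is arbitrary, and all function names are hypothetical.

```python
import math
import random

def truth_probability(epsilon):
    """Textbook one-bit randomized response: tell the truth with this probability."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomized_response(true_bit, epsilon, rng):
    """Client side: report the true bit with probability p, else flip it."""
    p = truth_probability(epsilon)
    return true_bit if rng.random() < p else (not true_bit)

def estimate_support(reports, epsilon):
    """Server side: debias the observed 'yes' rate.

    E[observed] = (1 - p) + support * (2p - 1), solved here for support.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    estimate = (observed - (1.0 - p)) / (2.0 * p - 1.0)
    return min(1.0, max(0.0, estimate))

def uct_score(estimated_support, visits, parent_visits, c=1.4):
    """UCT-style score: estimated support plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    return estimated_support + c * math.sqrt(math.log(parent_visits) / visits)

rng = random.Random(0)
true_bits = [i < 6000 for i in range(20000)]  # true support is 0.30
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
print(round(estimate_support(reports, 1.0), 3))
```

The debiasing step inflates variance by a factor of $1/(2p-1)^2$, which is why small per-query budgets need many clients to give a usable support estimate.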

Adaptive privacy budget allocation dynamically determines the budget for each query, targeting the minimal value for which the probability of incorrectly pruning a valid subtree stays within a target bound. This leverages a binomial model of the response distribution and a numerical search for the per-query budget (Lamp et al., 2024).
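One way to picture this search is with a simplified binomial model. The sketch below assumes one-bit randomized response with truth probability $e^{\varepsilon}/(1+e^{\varepsilon})$, a rule whose true support equals the validity threshold `theta` (the hardest valid case), and pruning whenever the debiased estimate falls below a lower cutoff `tau`. The names `theta`, `tau`, `beta`, and the grid scan are illustrative assumptions, not the exact procedure of the cited paper.

```python
import math

def truth_probability(epsilon):
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def log_binom_pmf(n, k, q):
    """Log of the Binomial(n, q) probability mass at k, computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1.0 - q))

def prune_probability(n, theta, tau, epsilon):
    """Probability that a rule with true support `theta` is pruned.

    Each of n clients says 'yes' with probability q; the rule is pruned when
    the 'yes' count falls below the count corresponding to support `tau`.
    """
    p = truth_probability(epsilon)
    q = (1.0 - p) + theta * (2.0 * p - 1.0)
    cutoff = (1.0 - p) + tau * (2.0 * p - 1.0)
    k_max = math.ceil(n * cutoff) - 1  # largest count that is still pruned
    return sum(math.exp(log_binom_pmf(n, k, q)) for k in range(0, k_max + 1))

def min_epsilon(n, theta, tau, beta, eps_max=5.0, step=0.01):
    """Scan a grid and return the first epsilon with prune probability <= beta."""
    for i in range(1, int(round(eps_max / step)) + 1):
        eps = i * step
        if prune_probability(n, theta, tau, eps) <= beta:
            return eps
    return None  # no budget on the grid meets the target

eps = min_epsilon(n=500, theta=0.5, tau=0.3, beta=0.05)
print(eps, prune_probability(500, 0.5, 0.3, eps))
```

A grid scan is used instead of bisection because the integer cutoff makes the pruning probability only approximately monotone in the budget.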

3.2 Greedy Rule List Induction with Smooth Sensitivity (Centralized, Standard DP Setting)

An alternative DP-RuL approach constructs rule lists greedily using a Gini impurity–based information gain criterion. The global sensitivity of the Gini gain admits a worst-case bound, but a tighter privacy-utility trade-off is achieved by analyzing its smooth sensitivity:

[equation missing in source: smooth sensitivity of the Gini gain]

At each step, the algorithm applies the Laplace mechanism with scale calibrated to the smooth sensitivity, selects the candidate rule with maximal noisy utility, and applies pure-DP Laplace mechanisms to class counts for predictive labeling (Ly et al., 2024).
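The greedy selection step can be sketched as follows. This is an illustration only: the noise `scale` is passed in by the caller, because deriving it from the smooth sensitivity of the Gini gain is the contribution of the cited work and is not reproduced here, so the sketch makes no privacy claim on its own. The function names and the per-class count representation are assumptions.

```python
import random

def gini(counts):
    """Gini impurity of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(covered, uncovered):
    """Impurity reduction when a rule splits records into covered/uncovered."""
    n_cov, n_unc = sum(covered), sum(uncovered)
    n = n_cov + n_unc
    if n == 0:
        return 0.0
    parent = [a + b for a, b in zip(covered, uncovered)]
    return (gini(parent)
            - (n_cov / n) * gini(covered)
            - (n_unc / n) * gini(uncovered))

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_best_rule(candidates, scale, rng):
    """Pick the rule whose Gini gain plus Laplace noise is largest.

    `candidates` maps a rule name to (covered_counts, uncovered_counts).
    """
    noisy = {
        name: gini_gain(cov, unc) + laplace_noise(scale, rng)
        for name, (cov, unc) in candidates.items()
    }
    return max(noisy, key=noisy.get)

# Hypothetical two-class counts [negatives, positives] for three rules.
candidates = {
    "age > 40": ([30, 20], [20, 30]),
    "income < 20k": ([45, 5], [5, 45]),
    "always": ([50, 50], [0, 0]),
}
print(noisy_best_rule(candidates, scale=0.01, rng=random.Random(0)))
```

With a larger `scale` the selection becomes increasingly random, which is the utility cost that a tighter sensitivity analysis aims to reduce.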

4. Privacy Guarantees and Budget Composition

Both LDP and standard-DP variants of DP-RuL achieve rigorous privacy guarantees via composition:

  • Local Differential Privacy: The total privacy loss is the sum of the per-query privacy parameters used in randomized-response and parameter queries. By sequential composition, total consumption does not exceed the overall budget $\varepsilon$.
  • Centralized DP (Smooth Sensitivity): The total privacy budget is apportioned per node in the rule list. The algorithm ensures that the sequential rule choices, capped at a maximum rule-list length, and their associated queries do not cumulatively exceed the user-specified privacy bounds. All data-independent post-processing preserves these guarantees.
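Basic sequential composition amounts to bookkeeping: each query spends part of a fixed budget, and the sum must stay within the total. The accountant below is a minimal sketch of that idea. The class name `PrivacyBudget` and the even split across rule-list nodes are assumptions for illustration; the source does not specify how the budget is divided.

```python
class PrivacyBudget:
    """Tracks epsilon spent under basic sequential composition (sum of epsilons)."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon, tol=1e-12):
        """Record one query; refuse it if the total budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + tol:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

def split_evenly(total_epsilon, num_steps):
    """Hypothetical allocation: the same budget for each of `num_steps` queries."""
    return [total_epsilon / num_steps] * num_steps

budget = PrivacyBudget(1.0)
for eps in split_evenly(1.0, 4):  # e.g. four nodes in a rule list
    budget.spend(eps)
print(budget.spent, budget.remaining)
```

Refusing a query once the budget is exhausted, rather than silently answering it, keeps the stated guarantee intact regardless of how the search proceeds.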

5. Experimental Evaluation and Results

DP-RuL protocols demonstrate favorable privacy-utility trade-offs across diverse datasets:

Setting | Dataset Examples | Mechanisms | Notable Results
LDP, Distributed | ICU, Sepsis, T1D (Lamp et al., 2024) | MCTS + Randomized Response | 70–85% rule coverage at [value missing in source], >90% precision, with clinical utility (F₁, balanced accuracy) within 5–10% of the non-private baseline for [value missing in source]
Centralized, DP | German Credit, COMPAS, Adult [240