DP-RuL: Privacy-preserving Rule Learning

Updated 2 June 2026

Privacy-preserving Rule Learning (DP-RuL) is a set of techniques for creating interpretable, rule-based predictive models with rigorous differential privacy guarantees.
It employs both distributed (LDP) and centralized approaches—using methods like Monte-Carlo Tree Search and smooth sensitivity—to extract and aggregate valid rules from sensitive data.
The framework minimizes individual data leakage while maintaining high utility, demonstrating robust performance in applications such as clinical decision support and credit scoring.

Privacy-preserving Rule Learning (DP-RuL) denotes a class of techniques for constructing interpretable, rule-based predictive models with rigorous differential privacy guarantees. These approaches are distinguished by protocols that extract statistically robust population-level rulesets from private, potentially sensitive datasets or from distributed local rulesets, while minimizing the information leaked about any individual data contributor. Two principal instantiations have emerged: one uses local differential privacy (LDP) for distributed rule learning from private clients, and the other performs differentially-private rule induction in centralized settings using the smooth sensitivity of Gini impurity (Lamp et al., 2024; Ly et al., 2024).

1. Formalization and Differential Privacy Foundations

DP-RuL formalizes privacy-preserving rule learning for scenarios where individual or distributed data sources require strong privacy protection. In the LDP setting, the protocol involves $n$ clients, each deriving a personalized ruleset $R_i \subset \mathcal{G}$ from private data, where $\mathcal{G}$ is a logic-based rule grammar. The objective is to aggregate these into a single server-side population ruleset $R_S$ of valid, generalizable rules, while ensuring that the protocol $\Pi$ satisfies $\varepsilon$-local differential privacy. That is, the randomized mechanism $\mathcal{A}$ that each client applies to its private input must satisfy, for any two inputs $x, x'$ and any output $o$:

$\forall x, x',\,\forall o: \qquad \frac{\Pr[\mathcal{A}(x)=o]}{\Pr[\mathcal{A}(x')=o]} \le e^{\varepsilon}$

For classic centralized settings (e.g., a single database), $(\varepsilon, \delta)$-differential privacy is enforced for the rule-learning algorithm $M$:

$\Pr[M(D) \in S] \le e^\varepsilon \Pr[M(D') \in S] + \delta$

where $D$ and $D'$ are neighboring datasets differing in one record and $S$ is any set of outputs (Lamp et al., 2024; Ly et al., 2024).
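As an illustration of the centralized guarantee, the following minimal Python sketch releases a count under pure $\varepsilon$-differential privacy ($\delta = 0$) using the Laplace mechanism. The function name `laplace_count` and its parameters are hypothetical and do not come from the cited papers. The sketch assumes that neighboring datasets differ in one record, so a count changes by at most 1.

```python
import random

def laplace_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    Assumption: changing one record alters the count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon is used.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = (rng.expovariate(1.0) - rng.expovariate(1.0)) / epsilon
    return true_count + noise
```

Smaller values of `epsilon` give stronger privacy but larger noise; the released value is a float and may fall below zero or above the dataset size.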

2. Rule Grammar Representation and Search

DP-RuL protocols employ expressive grammars to represent candidate rules. In distributed clinical decision support, the grammar $\mathcal{G}$ is typically formulated as Signal Temporal Logic (STL):

$\varphi ::= \mu \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \square_{[a,b]}\,\varphi \mid \lozenge_{[a,b]}\,\varphi \mid \varphi_1\,\mathcal{U}_{[a,b]}\,\varphi_2$

where $\mu$ is a propositional predicate over the continuous-valued signals $x$, and the temporal operators $\square$ (always), $\lozenge$ (eventually), and $\mathcal{U}$ (until) parameterize rule expressiveness. Candidate rules are explored using "partial-rule templates": STL formulas with "holes" (placeholders) that support systematic expansion during search. A rule $r$ is labeled valid if at least a fraction $\theta$ of the $n$ clients locally support it, i.e., $|\{i : r \in R_i\}| \ge \theta n$.
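The validity criterion can be made concrete with a non-private reference computation, i.e. the ground truth that a private protocol tries to approximate without ever seeing the client rulesets in the clear. This is only a sketch: `valid_rules` and `threshold` are hypothetical names, rules are represented as plain strings, and "locally supports" is assumed to mean that the rule appears in the client's ruleset.

```python
from collections import Counter

def valid_rules(client_rulesets, threshold):
    """Return the rules supported by at least `threshold` (a fraction) of clients.

    Non-private baseline: it reads every client's ruleset directly.
    """
    n = len(client_rulesets)
    if n == 0:
        return set()
    # Count each rule at most once per client.
    counts = Counter(rule for ruleset in client_rulesets for rule in set(ruleset))
    return {rule for rule, c in counts.items() if c / n >= threshold}
```

For example, with four clients holding `{"a", "b"}`, `{"a"}`, `{"b", "c"}`, and `{"a"}` and a threshold of 0.5, rules `"a"` and `"b"` qualify while `"c"` does not.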

3. Algorithms for Privacy-Preserving Rule Induction

3.1 Monte-Carlo Tree Search with Local Differential Privacy (Distributed, LDP Setting)

Structured search is conducted via Monte-Carlo Tree Search (MCTS) over the grammar space, constructing an exploration tree $T$ whose nodes are partial-rule templates. At each iteration, the protocol executes:

Selection: Traverse the tree, selecting child nodes according to an Upper Confidence bounds applied to Trees (UCT)-style score that takes into account the estimated support of each candidate.
Expansion: Grow the tree by expanding "holes" in the selected template, yielding more specific candidate rules.
Querying (Simulation): For each candidate, randomized-response queries are issued to clients to estimate support, with each client applying the randomized mechanism:

$\Pr[\mathcal{A}(b) = b] = \frac{e^{\varepsilon}}{1 + e^{\varepsilon}}, \qquad \Pr[\mathcal{A}(b) = 1 - b] = \frac{1}{1 + e^{\varepsilon}}$

where $b \in \{0, 1\}$ is the client's true answer. Clients thus respond truthfully with probability $\frac{e^{\varepsilon}}{1 + e^{\varepsilon}}$ to structural-match queries, and answer parameter-estimation queries with Laplace noise.

Backpropagation: Update exploration tree statistics based on the observed (noisy) responses.
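The querying and selection steps can be sketched in a few lines of Python. The sketch assumes the standard binary randomized-response mechanism, in which a client reports its true bit with probability $e^{\varepsilon}/(1+e^{\varepsilon})$, and a generic UCT score; all function names and the exploration constant `c` are hypothetical and are not taken from Lamp et al. (2024).

```python
import math
import random

def randomized_response(truth, epsilon, rng):
    """Client side: report the true bit with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return truth if rng.random() < p_truth else not truth

def estimate_support(reports, epsilon):
    """Server side: debias the observed 'yes' frequency.

    With p the truth probability and s the true support,
    E[observed] = (1 - p) + s * (2p - 1), which is solved for s.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

def uct_score(est_support, visits, parent_visits, c=1.4):
    """Generic UCT score: exploitation term plus exploration bonus."""
    return est_support + c * math.sqrt(math.log(parent_visits) / visits)
```

The debiased estimate is unbiased but can fall outside $[0, 1]$ for small client populations or small $\varepsilon$; a server would typically clip it before using it in the selection score.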

Adaptive privacy budget allocation dynamically determines the budget $\varepsilon_q$ for each query $q$, targeting the minimal value for which the probability of incorrectly pruning a valid subtree stays below a prescribed bound. This leverages a binomial model of the response distribution and a numerical search for $\varepsilon_q$ (Lamp et al., 2024).
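One way to picture this budget search is the sketch below, which is an assumption-laden illustration and not the procedure from Lamp et al. (2024). It assumes binary randomized response, a pruning rule that discards a candidate when its debiased support estimate falls below a hypothetical cutoff `prune_below`, and a simple grid scan over $\varepsilon$; the names `prune_error`, `minimal_epsilon`, and `beta` (the tolerated failure probability) are likewise hypothetical.

```python
import math

def binom_cdf_below(n, q, k_max):
    """P[Y < k_max] for Y ~ Binomial(n, q), summed in log space for stability."""
    total = 0.0
    for k in range(0, min(k_max, n + 1)):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(q) + (n - k) * math.log1p(-q))
        total += math.exp(log_pmf)
    return total

def prune_error(n, support, prune_below, epsilon):
    """Probability that a rule with true `support` is pruned by mistake."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    q = (1.0 - p) + support * (2.0 * p - 1.0)           # chance a client reports "yes"
    cutoff = (1.0 - p) + prune_below * (2.0 * p - 1.0)  # same cutoff on the raw frequency
    return binom_cdf_below(n, q, math.ceil(n * cutoff))

def minimal_epsilon(n, support, prune_below, beta, step=0.05, eps_max=10.0):
    """Smallest grid value of epsilon whose pruning error is at most `beta`."""
    eps = step
    while eps <= eps_max:
        if prune_error(n, support, prune_below, eps) <= beta:
            return eps
        eps += step
    return None  # no grid value up to eps_max meets the target
```

Under this model, a larger client population lets the search settle on a smaller per-query budget, because the binomial response distribution concentrates more tightly around its mean.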

3.2 Greedy Rule List Induction with Smooth Sensitivity (Centralized, Standard DP Setting)

An alternative DP-RuL approach constructs rule lists greedily using a Gini impurity–based information gain criterion. The global sensitivity of the Gini gain admits a worst-case bound, but tighter privacy-utility trade-offs are achieved by analyzing its smooth sensitivity:

$S^{*}_{f,\beta}(D) = \max_{D'} \left( LS_f(D') \cdot e^{-\beta\, d(D, D')} \right)$

with $LS_f(D')$ the local sensitivity of the Gini gain $f$ at $D'$, $d(D, D')$ the number of records in which $D$ and $D'$ differ, and $\beta > 0$ a smoothing parameter. At each step, the algorithm applies the Laplace mechanism with scale calibrated to $S^{*}_{f,\beta}(D)$, selecting the candidate rule with maximal noisy utility, and applies pure-DP Laplace mechanisms to class counts for predictive labeling (Ly et al., 2024).
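The greedy selection step can be illustrated with the following sketch, which scores candidate rules by Gini gain and picks the one with the highest Laplace-perturbed score. It is a simplified illustration under stated assumptions: the gain is the standard impurity reduction of a covered/uncovered split, and `noise_scale` is a placeholder supplied by the caller. In the method described above the scale would be derived from the smooth sensitivity, which this sketch does not compute, so the code by itself carries no privacy guarantee. All names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(rows, labels, rule):
    """Impurity reduction from splitting rows into covered / not covered by `rule`."""
    covered = [y for x, y in zip(rows, labels) if rule(x)]
    rest = [y for x, y in zip(rows, labels) if not rule(x)]
    weighted = (len(covered) * gini(covered) + len(rest) * gini(rest)) / len(labels)
    return gini(labels) - weighted

def laplace(scale, rng):
    """Laplace(0, scale) sample as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_best_rule(rows, labels, rules, noise_scale, rng):
    """Name of the candidate rule with the largest noise-perturbed Gini gain."""
    scored = [(gini_gain(rows, labels, rule) + laplace(noise_scale, rng), name)
              for name, rule in rules.items()]
    return max(scored)[1]
```

A rule list would be grown by calling `noisy_best_rule` repeatedly on the records not yet covered by earlier rules, with each call consuming part of the privacy budget.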

4. Privacy Guarantees and Budget Composition

Both LDP and standard-DP variants of DP-RuL achieve rigorous privacy guarantees via composition:

Local Differential Privacy: The total privacy loss is the sum of the per-query privacy parameters used in randomized-response and parameter queries; by sequential composition, total consumption does not exceed the overall budget $\varepsilon$.
Centralized DP (Smooth Sensitivity): The total privacy budget $(\varepsilon, \delta)$ is apportioned per node in the rule list. The algorithm ensures that the bounded number of sequential rule choices and their associated queries do not cumulatively exceed the user-specified privacy bounds. All data-independent post-processing preserves these guarantees.
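Sequential composition can be tracked with a simple accountant that adds up per-query costs and refuses any query that would exceed the total. The class below is a minimal sketch for the pure-$\varepsilon$ case only; `BudgetAccountant` and `charge` are hypothetical names, and the sketch does not model the $\delta$ term or any tighter composition theorem.

```python
class BudgetAccountant:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record a query costing `epsilon`; return the remaining budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point round-off.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent
```

Each randomized-response query or noisy rule selection would call `charge` before touching the data, so that an exhausted budget stops the protocol before the guarantee is violated.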

5. Experimental Evaluation and Results

DP-RuL protocols demonstrate favorable privacy-utility trade-offs across diverse datasets:

Setting	Dataset Examples	Mechanisms	Notable Results
LDP, Distributed	ICU, Sepsis, T1D (Lamp et al., 2024)	MCTS + Randomized Response	70–85% rule coverage at the reported privacy budget, >90% precision, with clinical utility (F₁, balanced accuracy) within 5–10% of the non-private baseline for sufficiently large budgets
Centralized, DP	German Credit, COMPAS, Adult [240

References

Lamp et al. (2024). DP-RuL: Differentially-Private Rule Learning for Clinical Decision Support Systems.

Ly et al. (2024). Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists.
