A Submodularity-based Agglomerative Clustering Algorithm for the Privacy Funnel (1901.06629v2)

Published 20 Jan 2019 in cs.IT and math.IT

Abstract: For the privacy funnel (PF) problem, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). For a data curator that wants to share data $X$ correlated with sensitive information $S$, the PF problem is to generate sanitized data $\hat{X}$ that maintains a specified utility/fidelity threshold on $I(X; \hat{X})$ while minimizing the privacy leakage $I(S; \hat{X})$. Our IAC-MDSF algorithm starts with the original alphabet $\hat{\mathcal{X}} := \mathcal{X}$ and iteratively merges the elements of the current alphabet $\hat{\mathcal{X}}$ that minimize the Lagrangian function $I(S;\hat{X}) - \lambda I(X;\hat{X})$. We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of $\hat{\mathcal{X}}$ by existing MDSF algorithms. We show that the IAC-MDSF algorithm also applies to the information bottleneck (IB), a dual problem to PF. By varying the value of the Lagrangian multiplier $\lambda$, we obtain experimental results on a heart disease data set in terms of the Pareto frontier: $I(S;\hat{X})$ vs. $-I(X;\hat{X})$. We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
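
To make the objective in the abstract concrete, the following is a minimal sketch of how the Lagrangian $I(S;\hat{X}) - \lambda I(X;\hat{X})$ could be evaluated for a deterministic clustering of $\mathcal{X}$, together with a greedy pairwise merge loop. The pairwise loop corresponds to the baseline style of approach the abstract compares against; it is not the IAC-MDSF algorithm, which searches over all subsets using MDSF algorithms. All function names and the toy joint distribution in the usage note are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def mutual_information(joint):
    """I(A;B) in bits, where joint[a][b] = P(A=a, B=b)."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi


def lagrangian(joint_sx, clusters, lam):
    """I(S;Xhat) - lam * I(X;Xhat) for a deterministic clustering.

    joint_sx[s][x] = P(S=s, X=x); clusters is a partition of the
    indices of X, each cluster being one symbol of Xhat.
    """
    joint_sxhat = [[sum(row[x] for x in c) for c in clusters]
                   for row in joint_sx]
    p_xhat = [sum(col) for col in zip(*joint_sxhat)]
    # Xhat is a function of X, so I(X;Xhat) = H(Xhat).
    return mutual_information(joint_sxhat) - lam * entropy(p_xhat)


def greedy_pairwise_merge(joint_sx, lam):
    """Illustrative pairwise baseline (assumption: not IAC-MDSF).

    Repeatedly merges the pair of clusters that most decreases the
    Lagrangian and stops when no pair gives a strict decrease.
    """
    n = len(joint_sx[0])
    clusters = [[x] for x in range(n)]
    best = lagrangian(joint_sx, clusters, lam)
    while len(clusters) > 1:
        candidates = []
        for i, j in combinations(range(len(clusters)), 2):
            merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
            merged.append(clusters[i] + clusters[j])
            candidates.append((lagrangian(joint_sx, merged, lam), merged))
        value, merged = min(candidates, key=lambda t: t[0])
        if value >= best:
            break
        best, clusters = value, merged
    return clusters, best
```

As a usage example, with a toy joint distribution such as `[[0.2, 0.2, 0.05, 0.05], [0.05, 0.05, 0.2, 0.2]]`, calling `greedy_pairwise_merge(joint, lam)` for several values of `lam` and recording $I(S;\hat{X})$ and $I(X;\hat{X})$ at each result would trace out points of the trade-off that the abstract describes as a Pareto frontier.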

Citations (6)
