Distribution-Agnostic Database De-Anonymization Under Obfuscation And Synchronization Errors
Abstract: Database de-anonymization typically involves matching an anonymized database with correlated publicly available data. Existing research focuses either on practical aspects, requiring no knowledge of the data distribution but providing limited guarantees, or on theoretical aspects, assuming known distributions. This paper aims to bridge these two approaches, offering theoretical guarantees for database de-anonymization under synchronization errors and obfuscation without prior knowledge of the data distribution. Using a modified replica detection algorithm and a new seeded deletion detection algorithm, we establish sufficient conditions on the database growth rate for successful matching, and show that a seed size double-logarithmic in the row size is sufficient for detecting deletions in the database. Importantly, our findings indicate that these sufficient de-anonymization conditions are tight and coincide with those in the distribution-aware setting, so unknown distributions cause no asymptotic performance loss. Finally, we evaluate the performance of our proposed algorithms through simulations, confirming their effectiveness in more practical, non-asymptotic scenarios.