Six Levels of Privacy: A Framework for Financial Synthetic Data (2403.14724v1)

Published 20 Mar 2024 in cs.CR, cs.LG, and q-fin.ST

Abstract: Synthetic Data is increasingly important in financial applications. Beyond the benefits it provides, such as improved financial modeling and better testing procedures, it also poses privacy risks. Such data may be derived from client information, business information, or other proprietary sources that must be protected. Although the process by which Synthetic Data is generated obscures the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of "levels" of privacy that are useful for categorizing Synthetic Data generation methods and the progressively stronger protections they offer. While the six levels were devised in the context of financial applications, they may be appropriate for other industries as well. Our paper includes a brief overview of Financial Synthetic Data: how it can be used, how its value can be assessed, its privacy risks, and privacy attacks against it. We close with details of the "Six Levels", including defenses against those attacks.

Authors (4)
  1. Tucker Balch (61 papers)
  2. Vamsi K. Potluru (28 papers)
  3. Deepak Paramanand (2 papers)
  4. Manuela Veloso (105 papers)