FFPDG: Fast, Fair and Private Data Generation (2307.00161v1)
Published 30 Jun 2023 in cs.LG and cs.AI
Abstract: Generative modeling is frequently used for synthetic data generation, where fairness and privacy are two major concerns. Although recent GAN-based methods (Goodfellow et al., 2014) show good results in preserving privacy, the generated data may be more biased. These methods also require substantial computational resources. In this work, we design a fast, fair, flexible, and private data generation method. We show the effectiveness of our method theoretically and empirically, and show that models trained on data generated by the proposed method can perform well at inference time in real application scenarios.
- Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Oct 2016. doi: 10.1145/2976749.2978318. URL http://dx.doi.org/10.1145/2976749.2978318.
- Moment varieties of gaussian mixtures. Journal of Algebraic Statistics, 7(1), Jul 2016. ISSN 1309-3452. doi: 10.18409/jas.v7i1.42. URL http://dx.doi.org/10.18409/jas.v7i1.42.
- Generating synthetic data in finance: opportunities, challenges and pitfalls. Neural Information Processing Systems, 2019.
- Matias Barenstein. Propublica’s compas data revisited, 2019.
- S. Barocas, M. Hardt, and A. Narayanan. Fairness and Machine Learning. NIPS tutorial, 2018.
- Ai fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias, 2018.
- Man is to computer programmer as woman is to homemaker? debiasing word embeddings, 2016.
- L. Breiman. Random forests. Machine Learning, 45, 2001.
- Optimized data pre-processing for discrimination prevention, 2017.
- Data preprocessing to mitigate bias: A maximum entropy based approach. In Hal Daumé III and Aarti Singh (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 1349–1359. PMLR, 13–18 Jul 2020a. URL http://proceedings.mlr.press/v119/celis20a.html.
- Fair-max-entropy-distributions. https://github.com/vijaykeswani/Fair-Max-Entropy-Distributions, 2020b.
- Label-only membership inference attacks, 2021.
- On the compatibility of privacy and fairness. UMAP, pp. 309–315, 2019. URL https://doi.org/10.1145/3314183.3323847.
- Differential privacy for bayesian inference through posterior sampling. Journal of Machine Learning Research, 18(11):1–39, 2017. URL http://jmlr.org/papers/v18/15-257.html.
- Empirical risk minimization under fairness constraints, 2020.
- UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.
- Continual learning from synthetic data for a humanoid exercise robot, 2021.
- Fairness under composition. CoRR, abs/1806.06122, 2018. URL http://arxiv.org/abs/1806.06122.
- It’s not privacy, and it’s not fair. Stan. L. Rev. Online, 2013.
- The Algorithmic Foundations of Differential Privacy. Now Publishers Inc., 2014.
- Our data, ourselves: Privacy via distributed noise generation. Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503, 2006.
- Fairness through awareness, 2011a.
- Fairness through awareness. CoRR, abs/1104.3913, 2011b. URL http://arxiv.org/abs/1104.3913.
- Privacy for all: Ensuring fair and equitable privacy protections. In Conference on Fairness, Accountability and Transparency, pp. 35–47, 2018.
- Certifying and removing disparate impact. Association for Computing Machinery, 2015.
- Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Ann. Statist., 2001.
- Generative adversarial networks, 2014.
- Equality of opportunity in supervised learning, 2016.
- A moral framework for understanding of fair ml through economic models of equality of opportunity, 2018.
- Eldersim: A synthetic data generation platform for human action recognition in eldercare applications, 2020.
- Differentially private fair learning. CoRR, abs/1812.02696, 2018. URL http://arxiv.org/abs/1812.02696.
- Verifying individual fairness in machine learning models, 2020.
- Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. The 1st Conference on Fairness, Accountability and Transparency, 81:77–91, 2018.
- F. Kamiran and T. Calders. Data preprocessing techniques for classification without discrimination, 2012.
- Copula flows for synthetic data generation, 2021.
- Average individual fairness: Algorithms, generalization and experiments, 2019.
- Auto-encoding variational bayes, 2014.
- Inherent trade-offs in the fair determination of risk scores, 2016.
- Fair decision making using privacy-protected data. CoRR, 2019.
- Differential privacy: A survey of results, 2008.
- Elizabeth Meckes. Projections of probability distributions: A measure-theoretic dvoretzky theorem. Proceedings on Privacy Enhancing Technologies, pp. 317–326, 2012.
- Human resource management system, 2013.
- Semi-supervised knowledge transfer for deep learning from private training data, 2017.
- Scalable private learning with pate, 2018.
- The synthetic data vault. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 399–410, Oct 2016. doi: 10.1109/DSAA.2016.49.
- Fair bayesian optimization, 2021.
- J. Ross Quinlan. Induction of decision trees. Mach Learn, 1986.
- Dr. Rich, 2020. URL https://www.kaggle.com/rhuebner/human-resources-data-set.
- Irina Rish et al. An empirical study of the naive bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence, number 22, pp. 41–46, 2001.
- Donald Rubin. Discussion: Statistical disclosure limitation. Journal of Official Statistics, 1993.
- Jonathon Shlens. Notes on kullback-leibler divergence and likelihood, 2014.
- Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 3–18, 2017.
- Ron-gauss: Enhancing utility in non-interactive private data release. Proceedings on Privacy Enhancing Technologies, pp. 26–46, 2019.
- Numerical Linear Algebra. SIAM, 1997. ISBN 0898713617.
- Learning non-discriminatory predictors, 2017.
- Differentially private generative adversarial network, 2018.
- Synthesizing tabular data using generative adversarial networks, 2018.
- Modeling tabular data using conditional GAN. CoRR, abs/1907.00503, 2019. URL http://arxiv.org/abs/1907.00503.
- PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=S1zk9iRqF7.
- Fairness constraints: Mechanisms for fair classification, 2017.
- Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1–42, 2019. URL http://jmlr.org/papers/v20/18-262.html.