
Classification Utility, Fairness, and Compactness via Tunable Information Bottleneck and Rényi Measures (2206.10043v3)

Published 20 Jun 2022 in cs.LG, cs.IT, and math.IT

Abstract: Designing machine learning algorithms that are accurate yet fair, not discriminating based on any sensitive attribute, is of paramount importance for society to accept AI for critical applications. In this article, we propose a novel fair representation learning method termed the Rényi Fair Information Bottleneck Method (RFIB) which incorporates constraints for utility, fairness, and compactness (compression) of representation, and apply it to image and tabular data classification. A key attribute of our approach is that we consider, in contrast to most prior work, both demographic parity and equalized odds as fairness constraints, allowing for a more nuanced satisfaction of both criteria. Leveraging a variational approach, we show that our objectives yield a loss function involving classical Information Bottleneck (IB) measures and establish an upper bound in terms of two Rényi measures of order $\alpha$ on the mutual information IB term measuring compactness between the input and its encoded embedding. We study the influence of the $\alpha$ parameter as well as two other tunable IB parameters on achieving utility/fairness trade-off goals, and show that the $\alpha$ parameter gives an additional degree of freedom that can be used to control the compactness of the representation. Experimenting on three different image datasets (EyePACS, CelebA, and FairFace) and two tabular datasets (Adult and COMPAS), using both binary and categorical sensitive attributes, we show that on various utility, fairness, and compound utility/fairness metrics RFIB outperforms current state-of-the-art approaches.
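When the variational encoder posterior and the prior on the embedding are both Gaussian (the usual VAE-style setup), the order-$\alpha$ Rényi divergence that bounds the compactness term has a closed form. The sketch below is illustrative only, not the paper's implementation; the function names and the toy posterior/prior values are assumptions. It shows how $\alpha$ acts as a knob: the divergence is nondecreasing in $\alpha$, so larger $\alpha$ penalizes the embedding-to-prior mismatch more aggressively.

```python
import math

def renyi_gaussian_divergence(alpha, mu1, var1, mu2, var2):
    """Renyi divergence D_alpha(N(mu1,var1) || N(mu2,var2)) for alpha != 1.

    Closed form for univariate Gaussians; defined only when the mixed
    variance var_alpha = alpha*var2 + (1-alpha)*var1 is positive.
    """
    var_a = alpha * var2 + (1 - alpha) * var1
    if var_a <= 0:
        raise ValueError("D_alpha undefined: alpha*var2 + (1-alpha)*var1 <= 0")
    return (0.5 * math.log(var2 / var1)
            + math.log(var2 / var_a) / (2 * (alpha - 1))
            + alpha * (mu1 - mu2) ** 2 / (2 * var_a))

def kl_gaussian_divergence(mu1, var1, mu2, var2):
    """Kullback-Leibler divergence, i.e. the alpha -> 1 limit of D_alpha."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)

# Toy "encoder posterior" N(0.5, 1.44) against a standard-normal prior N(0, 1).
# Sweeping alpha shows the extra degree of freedom on the compactness penalty.
for a in (0.5, 0.999, 2.0):
    d = renyi_gaussian_divergence(a, 0.5, 1.44, 0.0, 1.0)
    print(f"alpha={a}: D_alpha = {d:.4f}")
```

As a sanity check, at $\alpha$ close to 1 the value approaches the familiar KL term used in the classical variational IB loss.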

Authors (5)
  1. Adam Gronowski (2 papers)
  2. William Paul (27 papers)
  3. Fady Alajaji (52 papers)
  4. Bahman Gharesifard (46 papers)
  5. Philippe Burlina (17 papers)
Citations (3)