
Fair Active Learning: Solving the Labeling Problem in Insurance (2112.09466v4)

Published 17 Dec 2021 in stat.ML and cs.LG

Abstract: This paper addresses key obstacles arising from the widespread use of machine learning models in the insurance industry, with a specific focus on promoting fairness. The first challenge is to leverage unlabeled insurance data effectively while reducing the labeling effort and emphasizing data relevance through active learning techniques. The paper explores several active learning sampling methodologies and evaluates their impact on both synthetic and real insurance datasets. The analysis also highlights the difficulty of achieving fair model inferences, since machine learning models may replicate biases and discrimination present in the underlying data. To tackle these interconnected challenges, the paper introduces a new fair active learning method. The proposed approach samples instances that are both informative and fair, striking a balance between model predictive performance and fairness, as confirmed by numerical experiments on insurance datasets.
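The abstract does not spell out the sampling rule, so the following is only a rough illustration of the general idea of picking instances that are "informative and fair". It is a minimal sketch under stated assumptions, not the authors' method: informativeness is measured by predictive entropy (standard uncertainty sampling), and fairness is approximated by a hypothetical bonus for candidates whose sensitive group is underrepresented in the labeled set relative to a uniform share. The function names `entropy` and `select_batch` and the trade-off weight `lam` are illustrative assumptions.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(
    pool_probs: Sequence[Sequence[float]],
    pool_groups: Sequence[str],
    labeled_groups: Sequence[str],
    batch_size: int,
    lam: float = 0.5,
) -> list[int]:
    """Greedily choose pool indices to send for labeling.

    Hypothetical heuristic (not the paper's algorithm): score each candidate by
    its predictive entropy plus `lam` times the shortfall of its sensitive
    group's share in the labeled set below a uniform target share.
    """
    if not pool_probs:
        return []
    counts = Counter(labeled_groups)
    n_groups = len(set(pool_groups) | set(labeled_groups))
    target = 1.0 / n_groups
    remaining = set(range(len(pool_probs)))
    chosen: list[int] = []
    for _ in range(min(batch_size, len(pool_probs))):
        total = sum(counts.values())

        def score(i: int) -> float:
            share = counts[pool_groups[i]] / total if total else 0.0
            return entropy(pool_probs[i]) + lam * max(0.0, target - share)

        # Iterate in index order so ties go to the lowest index.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
        # Assume the chosen instance will be labeled, updating group counts.
        counts[pool_groups[best]] += 1
    return chosen


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.55, 0.45]]
    groups = ["A", "A", "B", "B"]
    labeled = ["A", "A", "A"]  # group B is absent from the labeled set
    print(select_batch(probs, groups, labeled, batch_size=2, lam=0.0))
    print(select_batch(probs, groups, labeled, batch_size=2, lam=0.5))
```

With `lam=0.0` the rule reduces to pure uncertainty sampling and picks indices `[0, 3]`, both of which are the most uncertain predictions. With `lam=0.5` the bonus for the missing group B shifts the batch to `[3, 2]`, trading a little uncertainty for a more balanced labeled set. A real fair active learning method would instead tie the fairness term to a formal criterion such as demographic parity on model outputs.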


