Falcon: Fair Active Learning using Multi-armed Bandits (2401.12722v2)
Abstract: Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. A challenge arises, however, because these target groups are defined using ground-truth labels that are not available during sample selection. To handle this, we propose a novel trial-and-error method: we postpone using a sample if its label differs from the expected one, in which case the sample falls outside the target group. We also observe a trade-off: selecting more informative samples leads to a higher likelihood of postponement due to undesired labels, and the optimal balance varies per dataset. We capture the trade-off between informativeness and postponement rate as a set of policies and automatically select the best policy using adversarial multi-armed bandit methods, chosen for their computational efficiency and theoretical guarantees. Experiments show that Falcon significantly outperforms existing fair active learning approaches in both fairness and accuracy while being more efficient. In particular, only Falcon supports a proper trade-off between accuracy and fairness; its maximum fairness score is 1.8-4.5x higher than the second-best result.
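The abstract's two key mechanisms, trial-and-error postponement and adversarial bandit selection over policies, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: it assumes EXP3 as the adversarial bandit, models a "policy" as an uncertainty cap on candidate samples, and uses label agreement as a stand-in reward (the paper does not specify its reward in the abstract). The names `Policy`, `exp3_step`, and `exp3_update` are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class Policy:
    """A policy = an uncertainty cap. A larger cap admits more informative
    (uncertain) samples at the risk of more postponements; this is the
    informativeness-vs-postponement trade-off from the abstract."""
    def __init__(self, uncertainty_cap):
        self.uncertainty_cap = uncertainty_cap

    def pick(self, probs, candidates):
        # Among candidates predicted to be in the target group, choose the
        # most uncertain sample whose uncertainty stays under the cap.
        uncertainty = 1.0 - np.abs(probs[candidates] - 0.5) * 2.0
        ok = uncertainty <= self.uncertainty_cap
        if not ok.any():
            return None
        return candidates[int(np.argmax(np.where(ok, uncertainty, -np.inf)))]

def exp3_step(weights, gamma=0.1):
    """One EXP3 draw: mix the weight distribution with uniform exploration."""
    k = len(weights)
    p = (1 - gamma) * weights / weights.sum() + gamma / k
    return rng.choice(k, p=p), p

def exp3_update(weights, p, arm, reward, gamma=0.1):
    """Importance-weighted exponential update for the chosen arm."""
    k = len(weights)
    weights[arm] *= np.exp(gamma * (reward / p[arm]) / k)

# Toy labeling loop: model probabilities for the positive class, a target
# group (attribute == 1, label == 1), and hidden ground-truth labels that
# are only revealed once a sample is sent to the labeler.
n = 200
probs = rng.uniform(size=n)
attr = rng.integers(0, 2, size=n)
true_y = (probs + rng.normal(0, 0.2, size=n) > 0.5).astype(int)

policies = [Policy(c) for c in (0.3, 0.6, 0.9)]  # arms of the bandit
weights = np.ones(len(policies))
labeled = set()

for _ in range(50):
    arm, p = exp3_step(weights)
    candidates = np.array([i for i in range(n)
                           if attr[i] == 1 and probs[i] > 0.5
                           and i not in labeled])
    if candidates.size == 0:
        break
    i = policies[arm].pick(probs, candidates)
    if i is None:
        continue
    labeled.add(i)
    # Trial and error: reward the policy only if the acquired label
    # confirms membership in the target group; otherwise "postpone"
    # (here: simply withhold the reward).
    reward = 1.0 if true_y[i] == 1 else 0.0
    exp3_update(weights, p, arm, reward)
```

Under this toy reward, EXP3 shifts weight toward the cap that best balances picking uncertain samples against having them postponed, mirroring the automatic policy selection the abstract describes.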
Authors: Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang