Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions (2405.08027v4)

Published 12 May 2024 in cs.LG and cs.AI

Abstract: As ML models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts of retraining ML models with model-annotated samples when those samples incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find that agents are increasingly likely to receive positive decisions as the model gets retrained, whereas the proportion of agents with positive labels may decrease over time. We thus propose a refined retraining process to stabilize the dynamics. Finally, we examine how algorithmic fairness can be affected by these retraining processes and find that enforcing common fairness constraints at every round may not benefit the disadvantaged group in the long run. Experiments on (semi-)synthetic and real data validate the theoretical findings.
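
The retraining loop described in the abstract can be pictured with a toy simulation. The sketch below is illustrative only and is not the paper's formal model: the one-dimensional feature, the threshold classifier, the fixed manipulation budget, and the names `best_response`, `fit_threshold`, and `simulate` are all assumptions made here. It shows only the structure of the loop (agents respond to the deployed model, the model annotates their responses, and the model is retrained on the enlarged set); it makes no attempt to reproduce the paper's results.

```python
import random

def best_response(x, threshold, budget):
    """Hypothetical agent behavior: move exactly to the threshold
    if the required change is within the manipulation budget."""
    if x < threshold <= x + budget:
        return threshold
    return x

def fit_threshold(samples):
    """Pick the threshold (among observed features) that minimizes
    0-1 error on (feature, label) pairs; predict 1 when x >= threshold."""
    candidates = sorted({x for x, _ in samples})
    best_t, best_err = candidates[0], len(samples) + 1
    for t in candidates:
        err = sum((x >= t) != bool(y) for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def simulate(rounds=5, n_new=200, budget=0.5, seed=0):
    """Run the toy loop and return one (threshold, acceptance_rate)
    pair per round."""
    rng = random.Random(seed)
    # Initial human-annotated data: noisy labels around x = 0 (assumed).
    train = []
    for _ in range(n_new):
        x = rng.gauss(0, 1)
        y = int(x + rng.gauss(0, 0.5) >= 0)
        train.append((x, y))
    threshold = fit_threshold(train)

    history = []
    for _ in range(rounds):
        # New agents arrive and respond strategically to the current model.
        raw = [rng.gauss(0, 1) for _ in range(n_new)]
        shifted = [best_response(x, threshold, budget) for x in raw]
        # The model annotates the responses itself (no human labels).
        labels = [int(x >= threshold) for x in shifted]
        history.append((threshold, sum(labels) / n_new))
        # Retrain on the original data plus all model-annotated samples.
        train.extend(zip(shifted, labels))
        threshold = fit_threshold(train)
    return history

if __name__ == "__main__":
    for t, rate in simulate():
        print(f"threshold={t:.3f} acceptance_rate={rate:.3f}")
```

In this sketch an agent who moves to the threshold receives a positive model annotation even though nothing about the underlying qualification has been modeled as changing, which is the kind of gap between decisions and true labels the abstract refers to.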
