Risk Preferences of Learning Algorithms (2205.04619v3)

Published 10 May 2022 in cs.LG, cs.AI, and econ.TH

Abstract: Agents' learning from feedback shapes economic outcomes, and many economic decision-makers today employ learning algorithms to make consequential choices. This note shows that a widely used learning algorithm, $\varepsilon$-Greedy, exhibits emergent risk aversion: it prefers actions with lower variance. When presented with actions of the same expectation, under a wide range of conditions, $\varepsilon$-Greedy chooses the lower-variance action with probability approaching one. This emergent preference can have wide-ranging consequences, from fairness concerns to homogenization, and it holds transiently even when the riskier action has a strictly higher expected payoff. We discuss two methods to correct this bias. The first requires the algorithm to reweight data as a function of how likely the actions were to be chosen. The second requires the algorithm to hold optimistic estimates of actions for which it has collected little data. We show that risk-neutrality is restored with these corrections.
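The following is a minimal simulation sketch of the effect described in the abstract and of the two corrections; it is not the paper's code. The horizon, $\varepsilon$, the arm variances, and the optimism constant are illustrative assumptions. The `ipw` branch stands in for the paper's first correction (reweighting rewards by how likely each action was to be chosen), and the `optimism` branch for the second (optimistic estimates of under-sampled actions).

```python
import numpy as np

def run(T=20_000, eps=0.1, sigmas=(0.1, 1.0), correction=None, seed=0):
    """Two-armed epsilon-Greedy; both arms have mean 0 but different variances."""
    rng = np.random.default_rng(seed)
    n = np.zeros(2)       # pull counts per arm
    mean = np.zeros(2)    # running sample-mean estimates
    ipw = np.zeros(2)     # inverse-propensity-weighted reward sums
    pulls = np.zeros(2)   # how often each arm was actually played
    for t in range(T):
        if correction == "ipw":
            est = ipw / (t + 1)          # propensity-reweighted (unbiased) estimates
        elif correction == "optimism":
            # bonus for rarely sampled arms; the constant 2.0 is an assumption
            est = mean + 2.0 / np.sqrt(np.maximum(n, 1.0))
        else:
            est = mean                   # plain sample means (the biased baseline)
        g = int(np.argmax(est))          # current greedy arm
        p = np.full(2, eps / 2)          # choice probabilities under eps-Greedy:
        p[g] += 1 - eps                  # the greedy arm gets 1 - eps + eps/2
        a = g if rng.random() > eps else int(rng.integers(2))
        r = rng.normal(0.0, sigmas[a])   # equal means, unequal variances
        n[a] += 1
        mean[a] += (r - mean[a]) / n[a]  # incremental sample-mean update
        ipw[a] += r / p[a]               # reweight reward by its choice probability
        pulls[a] += 1
    return pulls / T

# Average pull frequencies over independent runs: without a correction the
# low-variance arm typically dominates; with either correction the split
# moves back toward an even 1/2.
for c in (None, "ipw", "optimism"):
    freq = np.mean([run(correction=c, seed=s) for s in range(20)], axis=0)
    print(f"{str(c):>8}: low-variance arm share = {freq[0]:.2f}")
```

The baseline illustrates the mechanism behind the bias: a bad draw on the risky arm pushes its sample mean down, the arm then stops being pulled except at rate $\varepsilon/2$, and the low estimate persists. Reweighting by choice probabilities keeps the estimates unbiased regardless of how often an arm is pulled, while the optimism bonus forces re-exploration of arms with few samples.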

