
Disentangling Exploration from Exploitation (2404.19116v1)

Published 29 Apr 2024 in econ.TH and cs.GT

Abstract: Starting from Robbins (1952), the literature on experimentation via multi-armed bandits has wed exploration and exploitation. Nonetheless, in many applications, agents' exploration and exploitation need not be intertwined: a policymaker may assess new policies different than the status quo; an investor may evaluate projects outside her portfolio. We characterize the optimal experimentation policy when exploration and exploitation are disentangled in the case of Poisson bandits, allowing for general news structures. The optimal policy features complete learning asymptotically, exhibits lots of persistence, but cannot be identified by an index a la Gittins. Disentanglement is particularly valuable for intermediate parameter values.

References (32)
  1. Best arm identification in multi-armed bandits. In COLT, pp. 41–53.
  2. Early-career discrimination: Spiraling or self-correcting? mimeo.
  3. Venture capital financing, moral hazard, and learning. Journal of Banking & Finance 22(6-8), 703–735.
  4. Bandit problems.
  5. Strategic experimentation. Econometrica 67(2), 349–374.
  6. Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science 412(19), 1832–1852.
  7. Recommender systems as mechanisms for social learning. The Quarterly Journal of Economics 133(2), 871–925.
  8. Optimal dynamic allocation of attention. American Economic Review 109(8), 2993–3029.
  9. Uncertainty and learning in pharmaceutical demand. Econometrica 73(4), 1137–1173.
  10. Currie, J. M. and W. B. MacLeod (2020). Understanding doctor decision making: The case of depression treatment. Econometrica 88(3), 847–878.
  11. Learning while experimenting. The Economic Journal 130(625), 65–92.
  12. Dickstein, M. J. et al. (2021). Efficient provision of experience goods: Evidence from antidepressant choice.
  13. On optimal scheduling. American Economic Journal: Microeconomics.
  14. Multi-armed bandit allocation indices. John Wiley & Sons.
  15. Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society Series B: Statistical Methodology 41(2), 148–164.
  16. Gittins, J. C. and D. M. Jones (1979). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3), 561–565.
  17. Guo, Y. (2016). Dynamic delegation of experimentation. American Economic Review 106(8), 1969–2008.
  18. Incentives for experimenting agents. The RAND Journal of Economics 44(4), 632–663.
  19. Jovanovic, B. (1979). Job matching and the theory of turnover. Journal of Political Economy 87(5, Part 1), 972–990.
  20. Strategic experimentation with Poisson bandits. Theoretical Economics 5(2), 275–311.
  21. Strategic experimentation with exponential bandits. Econometrica 73(1), 39–68.
  22. Complementary information and learning traps. The Quarterly Journal of Economics 135(1), 389–448.
  23. Dynamically aggregating diverse information. Econometrica 90(1), 47–80.
  24. Rational inattention: A review. Journal of Economic Literature 61(1), 226–273.
  25. Miller, R. A. (1984). Job matching and occupational choice. Journal of Political Economy 92(6), 1086–1120.
  26. Robbins, H. (1952). Some aspects of the sequential design of experiments.
  27. Rothschild, M. (1974). A two-armed bandit theory of market pricing. Journal of Economic Theory 9(2), 185–202.
  28. Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics 50(3), 665–690.
  29. Strulovici, B. (2010). Learning while voting: Determinants of collective experimentation. Econometrica 78(3), 933–971.
  30. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3-4), 285–294.
  31. Wald, A. (1947). Foundations of a general theory of sequential decision functions. Econometrica, 279–313.
  32. Zhuo, R. (2023). Exploit or explore? An empirical study of resource allocation in research labs. mimeo.
