A Behavioral Model for Exploration vs. Exploitation: Theoretical Framework and Experimental Evidence (2207.01028v3)

Published 3 Jul 2022 in math.OC

Abstract: How do people navigate the exploration-exploitation (EE) trade-off when making repeated choices with unknown rewards? We study this question through the lens of multi-armed bandit problems and introduce a novel behavioral model, Quantal Choice with Adaptive Reduction of Exploration (QCARE). It generalizes Thompson Sampling, allowing for a principled way to quantify the EE trade-off and reflect human decision-making patterns. The model adaptively reduces exploration as information accumulates, with the reduction rate serving as a parameter to quantify the EE trade-off dynamics. We theoretically analyze how varying reduction rates influence decision quality, shedding light on the effects of "over-exploration" and "under-exploration." Empirically, we validate QCARE through experiments collecting behavioral data from human participants. QCARE not only captures critical behavioral patterns in the EE trade-off but also outperforms alternative models in predictive power. Our analysis reveals a behavioral tendency toward over-exploration.
