
How to Evaluate Behavioral Models (2306.04778v2)

Published 7 Jun 2023 in cs.LG and cs.GT

Abstract: Researchers building behavioral models, such as behavioral game theorists, use experimental data to evaluate predictive models of human behavior. However, there is little agreement about which loss function should be used in evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and squared L2 error all being common choices. We attempt to offer a principled answer to the question of which loss functions should be used for this task, formalizing axioms that we argue loss functions should satisfy. We construct a family of loss functions, which we dub "diagonal bounded Bregman divergences", that satisfy all of these axioms. These rule out many loss functions used in practice, but notably include squared L2 error; we thus recommend its use for evaluating behavioral models.
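To make the comparison concrete, the following minimal Python sketch computes three of the losses named in the abstract for a predicted action distribution against an empirical one. The function names and the example distributions are illustrative assumptions, not definitions or data from the paper, and the sketch does not implement the paper's "diagonal bounded Bregman divergences". It only shows one well-known difference: cross-entropy becomes infinite when a model assigns zero probability to an action that was observed, whereas squared L2 error stays finite.

```python
import math

def squared_l2(pred, emp):
    """Squared L2 distance between predicted and empirical distributions."""
    return sum((p - q) ** 2 for p, q in zip(pred, emp))

def cross_entropy(pred, emp):
    """Cross-entropy of the prediction under the empirical distribution.

    Returns math.inf if the prediction puts zero mass on an observed action.
    """
    total = 0.0
    for p, q in zip(pred, emp):
        if q == 0:
            continue  # unobserved actions contribute nothing
        if p == 0:
            return math.inf
        total -= q * math.log(p)
    return total

def error_rate(pred, emp):
    """Fraction of observations that differ from the predicted modal action."""
    mode = max(range(len(pred)), key=lambda i: pred[i])
    return 1.0 - emp[mode]

# Hypothetical data: empirical frequencies over three actions, two models.
emp = [0.5, 0.3, 0.2]
pred_a = [0.6, 0.4, 0.0]  # assigns zero probability to an observed action
pred_b = [0.4, 0.3, 0.3]

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, squared_l2(pred, emp), cross_entropy(pred, emp),
          error_rate(pred, emp))
```

In this example both models have the same error rate because they share a modal action, squared L2 error separates them by a finite amount, and cross-entropy rates model A as infinitely bad.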
