A State-Space Perspective on Modelling and Inference for Online Skill Rating (2308.02414v3)

Published 4 Aug 2023 in stat.AP and stat.ML

Abstract: We summarise popular methods used for skill rating in competitive sports, along with their inferential paradigms and introduce new approaches based on sequential Monte Carlo and discrete hidden Markov models. We advocate for a state-space model perspective, wherein players' skills are represented as time-varying, and match results serve as observed quantities. We explore the steps to construct the model and the three stages of inference: filtering, smoothing and parameter estimation. We examine the challenges of scaling up to numerous players and matches, highlighting the main approximations and reductions which facilitate statistical and computational efficiency. We additionally compare approaches in a realistic experimental pipeline that can be easily reproduced and extended with our open-source Python package, https://github.com/SamDuffield/abile.

Summary

  • The paper proposes a state-space framework that treats player skills as time-varying latent variables, enabling dynamic and flexible online rating.
  • It performs inference with Sequential Monte Carlo and factorial hidden Markov models, with the aim of improving prediction accuracy.
  • The research reports strong predictive performance across several competitive domains, with parameters estimated efficiently via the EM algorithm.

A State-Space Perspective on Modelling and Inference for Online Skill Rating

The paper addresses online skill rating in competitive environments through a state-space model (SSM) framework. Traditional systems such as Elo and Glicko are useful but limited in flexibility and in how fully they can model the evolution of skill over time. The paper proposes a methodological framework in which state-space models capture the temporal variability of skills, paired with inference methods for filtering, smoothing, and parameter estimation.
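
For context, the classical Elo update that the paper takes as a baseline can be sketched in a few lines. This is the textbook form of the rule, not code from the paper or its package; the K-factor of 20 and the scale of 400 are conventional values assumed here for illustration.

```python
def elo_expected(r_a, r_b, scale=400.0):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a, r_b, score_a, k=20.0):
    """Return updated ratings after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor (an assumed value, tuned in practice).
    """
    delta = k * (score_a - elo_expected(r_a, r_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1510.0 1490.0
```

Each player is summarised by a single number with no attached uncertainty, which is the limitation the state-space view is meant to address.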

Contribution and Methodological Advances

The paper introduces a framework in which player skills are treated as latent variables in a dynamic system, so that both changes in skill over time and the uncertainty about them are modelled explicitly. The key methodological innovations include:

  1. State-Space Model Framework: By conceptualizing skill evolution as a state-space process, the framework offers a versatile structure that accommodates complex dynamics and non-Gaussian likelihoods. This framework extends beyond the static latent traits of earlier models.
  2. Inference Techniques:
    • Sequential Monte Carlo (SMC): The application of SMC methods allows for the approximate inference of full distributional properties of skill levels, effectively managing the non-linear, non-Gaussian nature of real-world scenarios.
    • Factorial Hidden Markov Models (fHMMs): Skills take values in a finite state space, which gives a computationally efficient discrete counterpart to the continuous case.
  3. Parameter Estimation via EM Algorithm: The expectation-maximization (EM) algorithm is used to maximize the likelihood over model parameters, letting the model adapt to different competitive contexts without heavy computational cost.
  4. Extensions for Complex Scenarios: The paper explores several extensions of the basic SSM, such as multiplayer competitions using the Plackett-Luce model, and matches with detailed scoring, handled via bivariate Poisson models.
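
The filtering step behind the SMC approach can be illustrated with a minimal bootstrap particle filter. This is a sketch under stated assumptions, not the paper's implementation or the API of its package: it tracks only two players jointly, assumes Gaussian random-walk skill dynamics with a hypothetical rate `tau`, and uses a logistic (Bradley-Terry-style) win probability. All function and parameter names are illustrative.

```python
import math
import random


def win_prob(skill_a, skill_b):
    """Logistic (Bradley-Terry-style) probability that A beats B."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b)))


def particle_filter_step(particles, a_won, dt, tau, rng):
    """One bootstrap-filter step for a pair of players.

    particles: list of (skill_a, skill_b) samples approximating the
    joint filtering distribution before the match.
    """
    # Propagate: each skill follows a Gaussian random walk with
    # variance tau^2 * dt (assumed dynamics).
    sd = tau * math.sqrt(dt)
    moved = [(xa + rng.gauss(0.0, sd), xb + rng.gauss(0.0, sd))
             for xa, xb in particles]
    # Weight: likelihood of the observed result for each particle.
    weights = []
    for xa, xb in moved:
        p = win_prob(xa, xb)
        weights.append(p if a_won else 1.0 - p)
    # Resample (multinomial) to return to equally weighted particles.
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)
# Assumed standard normal prior on both skills.
particles = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
for a_won in [True, True, True, False, True]:
    particles = particle_filter_step(particles, a_won, dt=1.0, tau=0.1, rng=rng)

mean_a = sum(p[0] for p in particles) / len(particles)
mean_b = sum(p[1] for p in particles) / len(particles)
print(mean_a, mean_b)
```

The particle cloud carries a full distribution over both skills rather than a point estimate, so spread and correlation are available alongside the means. Scaling this to many players requires the kind of approximations and reductions the paper discusses, which this two-player sketch does not attempt.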

Numerical Results and Model Evaluation

The experimental section compares state-space models with traditional techniques across several domains, including tennis, football, and chess. The models are benchmarked against baseline methods and are reported to handle skill dynamics well in situations with high variability (e.g., unexpected match results or the impact of single events).

  1. Predictive Performance: Using scoring rules such as the logarithmic score, the paper evaluates match-outcome forecasts and reports better performance than conventional models such as Elo and Glicko.
  2. Smoothing versus Filtering: Smoothing conditions on later matches as well as earlier ones, giving more stable estimates of players' historical skills, which matters for retrospective performance appraisals and strategic planning.
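
The difference between filtering and smoothing, and the use of the log score, can be made concrete on a small discrete hidden Markov model. The sketch below is illustrative only: it follows a single player against a fixed reference opponent rather than the factorial model over many players, and the seven skill levels, the lazy random-walk transition matrix, and the logistic win probability are all assumptions made for the example.

```python
import math

N_LEVELS = 7  # assumed number of discrete skill levels


def transition_matrix(n, stay=0.8):
    """Lazy random walk over skill levels (hypothetical dynamics)."""
    move = (1.0 - stay) / 2.0
    rows = []
    for i in range(n):
        row = [0.0] * n
        row[i] += stay
        row[max(i - 1, 0)] += move      # mass reflected at the lower edge
        row[min(i + 1, n - 1)] += move  # mass reflected at the upper edge
        rows.append(row)
    return rows


def win_prob(level, n):
    """Win probability against a reference opponent at the middle level."""
    return 1.0 / (1.0 + math.exp(-(level - (n - 1) / 2.0)))


def forward_filter(results, n, trans):
    """Return filtering distributions and one-step-ahead win probabilities."""
    belief = [1.0 / n] * n  # uniform prior over levels
    filtered, predicted = [], []
    for won in results:
        # Predict: push the current belief through the transition matrix.
        prior = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        predicted.append(sum(prior[j] * win_prob(j, n) for j in range(n)))
        # Update: reweight by the likelihood of the observed result.
        unnorm = [prior[j] * (win_prob(j, n) if won else 1.0 - win_prob(j, n))
                  for j in range(n)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        filtered.append(belief)
    return filtered, predicted


def backward_smooth(filtered, trans):
    """Backward recursion turning filtering into smoothing distributions."""
    n = len(filtered[0])
    smoothed = [filtered[-1]]  # at the final time the two coincide
    for t in range(len(filtered) - 2, -1, -1):
        f, nxt = filtered[t], smoothed[0]
        pred = [sum(f[i] * trans[i][j] for i in range(n)) for j in range(n)]
        s = [f[i] * sum(trans[i][j] * nxt[j] / pred[j] for j in range(n))
             for i in range(n)]
        smoothed.insert(0, s)
    return smoothed


def mean_log_score(predicted, results):
    """Average log probability assigned to the observed results."""
    return sum(math.log(p if won else 1.0 - p)
               for p, won in zip(predicted, results)) / len(results)


trans = transition_matrix(N_LEVELS)
results = [True, True, False, True, True, True]
filtered, predicted = forward_filter(results, N_LEVELS, trans)
smoothed = backward_smooth(filtered, trans)
score = mean_log_score(predicted, results)
print(score)
```

The log score uses only the one-step-ahead predictions from the filter, so it measures genuine out-of-sample forecasts. The smoothed distribution at an early time step additionally reflects the later results, which is why it suits retrospective analysis rather than live prediction.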

Implications and Future Directions

The framework set out in this paper has implications for both theory and practice in domains that require dynamic skill assessment. Notably, it facilitates:

  • More nuanced talent scouting and player progression analysis in sports.
  • Refined matchmaking algorithms in competitive gaming, where skill levels can change rapidly.
  • Flexible adaptation to diverse domains, including educational settings where skill acquisition over time is of interest.

The research also calls for work on computational efficiency, particularly in high-dimensional settings or where real-time analysis is essential. Future work could link the framework more closely with online learning algorithms for real-time parameter updates, widening the practical applicability of these models.

In conclusion, the paper advances skill rating in methodology, computational strategy, and potential for cross-domain application, opening avenues for future research and development.
