
A State-Space Perspective on Modelling and Inference for Online Skill Rating

Published 4 Aug 2023 in stat.AP and stat.ML (arXiv:2308.02414v3)

Abstract: We summarise popular methods used for skill rating in competitive sports, along with their inferential paradigms and introduce new approaches based on sequential Monte Carlo and discrete hidden Markov models. We advocate for a state-space model perspective, wherein players' skills are represented as time-varying, and match results serve as observed quantities. We explore the steps to construct the model and the three stages of inference: filtering, smoothing and parameter estimation. We examine the challenges of scaling up to numerous players and matches, highlighting the main approximations and reductions which facilitate statistical and computational efficiency. We additionally compare approaches in a realistic experimental pipeline that can be easily reproduced and extended with our open-source Python package, https://github.com/SamDuffield/abile.
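As a rough illustration of the sequential Monte Carlo idea in the abstract, the sketch below updates particle approximations of two players' skills after a single match. It assumes a factorial approximation (each player keeps an independent particle set), a logistic win probability with a hypothetical scale `s`, and plain multinomial resampling. The function name and these modelling choices are illustrative only. They are not the abile API or the paper's exact algorithm.

```python
import math
import random
from statistics import fmean


def smc_match_update(parts_i, parts_j, y, s=1.0, rng=None):
    """Reweight and resample particles for players i and j after one match.

    parts_i, parts_j: equally sized lists of skill particles, treated as
    independent (factorial approximation), so pairing them index by index
    gives samples from the joint prior. y = 1 if i won, 0 if j won.
    Returns new particle lists for i and j after multinomial resampling.
    """
    rng = rng or random.Random(0)
    weights = []
    for x_i, x_j in zip(parts_i, parts_j):
        p_i_wins = 1.0 / (1.0 + math.exp(-(x_i - x_j) / s))  # logistic likelihood
        weights.append(p_i_wins if y == 1 else 1.0 - p_i_wins)
    # Resample pairs in proportion to their weights, then split back per player.
    idx = rng.choices(range(len(parts_i)), weights=weights, k=len(parts_i))
    return [parts_i[k] for k in idx], [parts_j[k] for k in idx]


rng = random.Random(1)
prior_i = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prior_j = [rng.gauss(0.0, 1.0) for _ in range(2000)]
post_i, post_j = smc_match_update(prior_i, prior_j, y=1, rng=rng)
print(fmean(post_i), fmean(post_j))  # winner's mean shifts up, loser's down
```

Between matches, each particle would also be moved through the skill dynamics, for example by adding Gaussian noise. That step is left out here.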


Summary

  • The paper introduces a state-space model framework that treats player skills as latent variables evolving over time using Markovian dynamics.
  • It leverages advanced inference techniques like extended Kalman filtering and pairwise updating to efficiently manage high-dimensional sports data.
  • Experiments show the approach outperforms traditional models like Elo in predicting outcomes, especially in scenarios with draws and variable skill volatility.


Introduction to Skill Rating Models

The paper surveys methods for skill rating in competitive sports, focusing on models in which player abilities vary over time and are inferred from match outcomes. Established systems such as Elo and TrueSkill serve as reference points, and the paper recasts and extends them within a state-space model. This framework yields modular, scalable and interpretable models that can be tailored to the needs of different sports and competitive settings.
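For reference, the classical Elo update can be written in a few lines. This is a minimal sketch. It uses the conventional chess scale (a base-10 logistic curve with a 400-point scale) and an assumed K-factor of 20. Neither constant is taken from the paper.

```python
def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 20.0) -> tuple[float, float]:
    """Return updated ratings after one match.

    score_a is 1 for an A win, 0.5 for a draw and 0 for a B win.
    The update is zero-sum: whatever A gains, B loses.
    """
    e_a = elo_expected(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta


# Two equally rated players: the winner gains k / 2 = 10 points.
print(elo_update(1500.0, 1500.0, 1.0))  # (1510.0, 1490.0)
```

Elo keeps a single number per player and applies a fixed-size correction, so it carries no notion of rating uncertainty. Making that uncertainty explicit is one of the things the state-space view adds.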

State-Space Model Framework

The key contribution is a state-space model (SSM) perspective on skill rating. In an SSM, player skills are latent variables that evolve over time according to Markovian dynamics, and match results are the observed data. The structure is flexible: it accommodates different match-outcome distributions and skill dynamics, with skills represented on either a continuous or a discrete scale. Unlike traditional approaches, it decouples model design from inference, so users can experiment with different configurations.
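To make the generative structure concrete, here is a toy simulator. It is a sketch under assumed choices. Skills follow independent Gaussian random walks whose variance grows by `tau**2` per unit time. Match results follow a probit link, P(i beats j) = Phi((x_i - x_j) / s). The names `simulate`, `tau` and `s` are hypothetical and are not part of abile.

```python
import math
import random


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n_players, matches, tau=0.1, s=1.0, seed=0):
    """Simulate latent skills and match results from a toy state-space model.

    matches: list of (time, i, j) tuples sorted by time.
    Skills follow independent Gaussian random walks (variance tau**2 per unit
    time); each match is an observation with P(i beats j) = Phi((x_i - x_j) / s).
    Returns a list of (time, i, j, y) with y = 1 if i won, 0 otherwise.
    """
    rng = random.Random(seed)
    skills = [0.0] * n_players
    last_time = [0.0] * n_players
    results = []
    for t, i, j in matches:
        for p in (i, j):
            # Propagate only the two players involved, up to the match time.
            dt = t - last_time[p]
            skills[p] += rng.gauss(0.0, tau * math.sqrt(dt))
            last_time[p] = t
        p_i_wins = norm_cdf((skills[i] - skills[j]) / s)
        y = 1 if rng.random() < p_i_wins else 0
        results.append((t, i, j, y))
    return results


print(simulate(3, [(1.0, 0, 1), (2.0, 1, 2), (2.5, 0, 2)]))
```

The players' random walks are independent, so propagating a skill only when that player next appears is exact here. Pairwise updating exploits the same property.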

Model Specification

The model represents skills as ordinal quantities in a totally ordered set $\mathcal{X}$ that evolve in continuous time, with observations arriving at match times. Match outcomes are linked probabilistically to these skills through likelihood functions that may be complex. Several dynamics are explored, including a continuous-time Markov jump process on a discrete skill set specified by a generator matrix, and the appropriate representation depends on the sport (Figure 1).
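For the discrete representation, the transition probabilities over an elapsed time dt follow from the matrix exponential P(dt) = exp(dt Q). The sketch below assumes an illustrative generator in which skill moves up or down one level at a hypothetical rate `psi`. It computes the exponential by scaling and squaring with a truncated Taylor series, so it runs without third-party libraries. In practice `scipy.linalg.expm` would be the usual choice.

```python
def generator(m: int, psi: float) -> list[list[float]]:
    """Tridiagonal generator for a random walk on skill levels 0..m-1.

    Jumps of +/-1 level happen at rate psi; rows sum to zero, and the
    boundary levels can only jump inwards.
    """
    q = [[0.0] * m for _ in range(m)]
    for k in range(m):
        if k > 0:
            q[k][k - 1] = psi
        if k < m - 1:
            q[k][k + 1] = psi
        q[k][k] = -sum(q[k])  # diagonal is still 0.0 here, so this sums off-diagonals
    return q


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]


def expm(a, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scale = 2.0 ** squarings
    a_small = [[x / scale for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a_small)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
    return result


def transition_matrix(m, psi, dt):
    """P(dt) = exp(dt * Q): probability of moving between skill levels in time dt."""
    q = generator(m, psi)
    return expm([[dt * x for x in row] for row in q])


print([round(x, 4) for x in transition_matrix(5, 0.5, 2.0)[0]])
```

Each row of the result is a probability distribution over the next skill level, and longer gaps between matches spread that distribution further.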


Figure 1: Visualization of the different skill representations for Argentina's 2022 FIFA World Cup triumph. Each y-axis shows the skill-rating scale of the corresponding approach.

Inference in State-Space Models

Three inference tasks are addressed: filtering, smoothing and parameter estimation. Filtering provides real-time skill estimates, smoothing refines past estimates using all observed data, and parameter estimation tunes the static model parameters to fit the data. The paper explores computational strategies that keep these operations efficient in the high-dimensional settings typical of sports datasets, relying on factorial approximations, which treat players' skills as independent, to maintain tractability.
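A minimal sketch of the smoothing step for a single player follows. It assumes Gaussian random-walk dynamics and that the Gaussian filtering moments after each of that player's matches have already been computed. The recursion is the standard Rauch-Tung-Striebel backward pass. Treating each player separately like this is a simplification in the spirit of the factorial approximation, and the function name is hypothetical.

```python
def rts_smooth(times, filt_means, filt_vars, tau):
    """Backward (Rauch-Tung-Striebel) pass for one player's skill.

    times: match times for this player, increasing.
    filt_means, filt_vars: Gaussian filtering moments after each match.
    tau: random-walk standard deviation per unit time, so the predicted
         variance from match t to t+1 is filt_vars[t] + tau**2 * (times[t+1] - times[t]).
    Returns smoothed means and variances conditioned on all matches.
    """
    n = len(times)
    sm = list(filt_means)
    sv = list(filt_vars)
    for t in range(n - 2, -1, -1):
        pred_var = filt_vars[t] + tau ** 2 * (times[t + 1] - times[t])
        gain = filt_vars[t] / pred_var
        # Random-walk dynamics: the predicted mean equals the filtered mean.
        sm[t] = filt_means[t] + gain * (sm[t + 1] - filt_means[t])
        sv[t] = filt_vars[t] + gain ** 2 * (sv[t + 1] - pred_var)
    return sm, sv


means, variances = rts_smooth([0.0, 1.0, 2.0], [0.0, 0.5, 1.0], [1.0, 0.8, 0.6], tau=0.5)
print(means, variances)
```

The final estimate is unchanged because it already conditions on every match. Earlier estimates are pulled toward later evidence, and their variances shrink.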

Algorithmic Techniques

For filtering, the paper discusses methods such as extended Kalman filtering, which applies Gaussian approximations to non-Gaussian match likelihoods. Pairwise updating schemes exploit the sparsity of match data: each match updates only the players involved, which keeps the computational cost low. Smoothing runs a backward recursion that combines the filtering estimates with the skill dynamics to refine historical skill estimates.
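One common way to build an extended-Kalman-style pairwise update is sketched below. It assumes independent Gaussian skills N(m, v) for the two players and a logistic win probability with a hypothetical scale `s`. It linearizes at the prior means and uses the Bernoulli variance p(1 - p) as observation noise. The paper's exact construction may differ in details.

```python
import math


def predict_variance(v, tau, dt):
    """Random-walk prediction: variance grows by tau**2 per unit of elapsed time."""
    return v + tau ** 2 * dt


def ekf_match_update(m_i, v_i, m_j, v_j, y, s=1.0):
    """Extended-Kalman-style update for one match between players i and j.

    y = 1 if i won, 0 otherwise. Only players i and j are touched; every
    other player's estimate is left unchanged (pairwise updating).
    """
    p = 1.0 / (1.0 + math.exp(-(m_i - m_j) / s))   # predicted P(i wins)
    g = p * (1.0 - p) / s                          # dp/dx_i (and -g for x_j)
    innov_var = g * g * (v_i + v_j) + p * (1.0 - p)
    gain_i = v_i * g / innov_var
    gain_j = -v_j * g / innov_var
    m_i_new = m_i + gain_i * (y - p)
    m_j_new = m_j + gain_j * (y - p)
    # Diagonal of the posterior covariance; the i-j covariance is dropped
    # so the skills stay independent (factorial approximation).
    v_i_new = v_i - (v_i * g) ** 2 / innov_var
    v_j_new = v_j - (v_j * g) ** 2 / innov_var
    return m_i_new, v_i_new, m_j_new, v_j_new


v = predict_variance(0.9, tau=0.1, dt=10.0)  # inflate uncertainty before the match
print(ekf_match_update(0.0, v, 0.0, v, y=1))  # winner's mean rises, loser's falls
```

The size of each update scales with the players' current variances. A player who has been inactive for a long time therefore moves more after a result than one who plays regularly.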

Figure 2: Log-likelihood grid and parameter estimation for WTA tennis data.
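The idea behind a log-likelihood grid can be sketched with a single parameter. For each candidate value, run a filter over the matches and score each match by the log-probability the filter assigned to the observed result before seeing it. Then keep the best value. Here Elo's K-factor stands in for the model parameters, and the match list is made-up toy data. The paper's models have more parameters than this.

```python
import math


def elo_predictive_loglik(matches, k, n_players, scale=400.0):
    """Sum of one-step-ahead log predictive probabilities for an Elo filter.

    matches: list of (i, j, y) with y = 1 if i won, 0 if j won (no draws).
    Each match is predicted from the ratings before it, then ratings are updated.
    """
    ratings = [1500.0] * n_players
    total = 0.0
    for i, j, y in matches:
        p = 1.0 / (1.0 + 10.0 ** ((ratings[j] - ratings[i]) / scale))
        total += math.log(p if y == 1 else 1.0 - p)
        delta = k * (y - p)
        ratings[i] += delta
        ratings[j] -= delta
    return total


def grid_search(matches, n_players, grid):
    """Return the grid value with the highest predictive log-likelihood, plus all scores."""
    scores = {k: elo_predictive_loglik(matches, k, n_players) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up results among 3 players: (i, j, y) with y = 1 if i won.
toy_matches = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (1, 0, 0), (2, 1, 0), (0, 2, 1)] * 5
best_k, scores = grid_search(toy_matches, 3, [0.0, 8.0, 16.0, 32.0, 64.0])
print(best_k, round(scores[best_k], 3))
```

Scoring each match before updating on it keeps the objective an honest measure of predictive performance. An exhaustive grid is only practical for one or two parameters.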

Practical Implementation and Extensions

Several practical extensions of the state-space framework are suggested, including non-Gaussian dynamics, multi-player comparisons, and sport-specific effects such as home advantage in football. In addition, the open-source Python package abile makes it straightforward to reproduce the results and to adapt the models to other applications.
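Home advantage and draws can both be added at the observation level. A minimal sketch, assuming an ordered-probit link, is given below. The home side's skill is shifted by a hypothetical `home_adv`, and a match is drawn when the noisy skill gap falls within a hypothetical `draw_margin` of zero. This is one possible parameterization, not necessarily the one used in the paper or in abile.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def match_probs(x_home, x_away, home_adv=0.3, draw_margin=0.25, s=1.0):
    """Home win / draw / away win probabilities from an ordered-probit link.

    d = x_home + home_adv - x_away is the effective skill gap. The home side
    wins if d + s * Z exceeds draw_margin (Z standard normal), the away side
    wins if it falls below -draw_margin, and the match is drawn otherwise.
    """
    d = x_home + home_adv - x_away
    p_home = norm_cdf((d - draw_margin) / s)
    p_away = norm_cdf((-d - draw_margin) / s)
    return p_home, 1.0 - p_home - p_away, p_away


print(match_probs(0.2, 0.5))
```

With `draw_margin` set to zero this reduces to a two-outcome probit model. Larger margins make draws more likely.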

Experiments and Results

Experiments on sports datasets show stronger predictive performance than traditional models such as Elo, particularly in settings with draws and varying skill volatility. Analyses of real-world datasets also illustrate the value of the approach for retrospective evaluation and strategic decision-making, for example by smoothing a team's skill trajectory over past seasons (Figure 3).


Figure 3: Extended Kalman filtering and smoothing for Tottenham's EPL matches from 2011 to 2023.

Conclusion

The paper presents a compelling case for adopting a state-space approach to skill rating, highlighting its robustness, flexibility, and interpretability compared to traditional methods. Future work may explore broader applications across competitive domains and refine inference techniques to further enhance model performance and scalability.
