- The paper explores using data science and regression-based machine learning to predict player performance and select optimal fantasy cricket teams for platforms like Dream11.
- It identifies the Extra Trees Regressor model as highly effective for predicting player Dream11 scores, achieving R² scores of 0.99 for batsmen and 0.97 for bowlers.
- The research proposes a team selection strategy combining Knapsack and Greedy Algorithms to maximize predicted scores while adhering to fantasy platform constraints like credit limits.
The paper explores how data science and predictive analytics can be used to select optimal fantasy cricket teams on the Dream11 platform. It applies machine learning (ML), specifically regression, to predict player performance across the three formats in its dataset: One-Day Internationals (ODI), Indian Premier League (IPL), and T20 matches. The analytical model draws on historical player performance, game conditions, and other relevant metrics to maximize the statistical probability of winning in fantasy sports.
Key Aspects of the Methodology:
- Data Collection and Preparation:
- The authors utilize cricket match data from Cricsheet.org, consisting of 3100 YAML files across three formats (ODI, IPL, T20). They convert these files to CSV format for analysis.
- Features such as runs, strike rates, wickets, and economy rates are extracted and transformed into a format suitable for ML models.
- Feature engineering adds derived features such as cumulative strike rate, moving averages, and per-player Dream11 scores.
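The feature-engineering step above can be sketched with pandas. This is a minimal illustration, not the authors' code: the column names (`player`, `match_date`, `runs`, `balls_faced`), the function name `add_batting_features`, and the window size are all assumptions made for the example.

```python
import pandas as pd

def add_batting_features(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add a cumulative strike rate and a moving average of runs per player.

    Assumes hypothetical columns: player, match_date, runs, balls_faced.
    """
    # Order each player's matches chronologically before accumulating.
    out = df.sort_values(["player", "match_date"]).copy()
    grouped = out.groupby("player")

    cum_runs = grouped["runs"].cumsum()
    cum_balls = grouped["balls_faced"].cumsum()
    # Strike rate = runs per 100 balls; left as NaN until a ball has been faced.
    out["cum_strike_rate"] = 100.0 * cum_runs / cum_balls.where(cum_balls > 0)

    # Moving average of runs over the last `window` matches of each player.
    out["runs_moving_avg"] = grouped["runs"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out

# Toy data, invented for illustration only.
matches = pd.DataFrame({
    "player": ["A", "A", "A", "B"],
    "match_date": ["2020-01-03", "2020-01-01", "2020-01-02", "2020-01-01"],
    "runs": [20, 10, 30, 0],
    "balls_faced": [10, 10, 20, 0],
})
features = add_batting_features(matches, window=2)
```

Both features here include the current match. When they are used to predict that same match, shifting each player's series by one row would be needed to avoid leaking the outcome into the inputs.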
- Machine Learning Models:
- A central part of the paper compares the classification models used in prior work with regression models for predicting player performance.
- The authors use PyCaret, an automated ML library, to compare candidate regressors and select the Extra Trees Regressor (ETR), which predicts Dream11 scores with an R² of 0.99 for batsmen and 0.97 for bowlers.
- The authors check the selected model for overfitting and report that it remains robust despite the high R² values.
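One common way to check a regressor for overfitting is to compare R² on the training data with R² on held-out data. The sketch below does this with scikit-learn's `ExtraTreesRegressor` called directly, rather than through PyCaret as in the paper, and on synthetic data; the function name `fit_and_check`, the split ratio, and the hyperparameters are assumptions, and this is not necessarily the procedure the authors followed.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def fit_and_check(X, y, seed: int = 0) -> dict:
    """Fit an Extra Trees regressor and report train vs. held-out R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = ExtraTreesRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }

# Synthetic stand-in for engineered player features (not cricket data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=400)

scores = fit_and_check(X, y)
print(scores)
```

A large gap between `train_r2` and `test_r2` would suggest overfitting. Extra Trees usually fit their training data almost perfectly, so the held-out score is the more informative of the two.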
- Predictive Analytics:
- Inputs to the predictive model include player name, match format, team details, and venue.
- For a given set of user inputs, the system fetches similar historical data and transforms it into matrices that are fed to the ETR model to generate performance predictions.
- Team Selection Strategy:
- To select a fantasy team, the authors combine Knapsack and Greedy algorithms so that the selection complies with Dream11 constraints (e.g., credit limits, selection limits per team).
- The goal is to maximize expected Dream-11 scores while adhering to the credit budget cap.
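The summary does not spell out how the Knapsack and Greedy steps are combined, so the following is only a sketch of a greedy heuristic for a knapsack-style selection: players are ranked by predicted score per credit and added while the budget and per-team cap allow. The `Player` fields, the function name `pick_team`, and the default limits (`budget`, `team_size`, `max_per_team`) are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    name: str
    team: str
    credits: float    # cost of the player on the platform (assumed > 0)
    predicted: float  # predicted fantasy score from the regression model

def pick_team(players, budget=100.0, team_size=11, max_per_team=7):
    """Greedy pick by predicted score per credit under budget and team caps."""
    ranked = sorted(players, key=lambda p: p.predicted / p.credits, reverse=True)
    chosen, spent, per_team = [], 0.0, {}
    for p in ranked:
        if len(chosen) == team_size:
            break
        if spent + p.credits > budget:
            continue  # too expensive for the remaining budget
        if per_team.get(p.team, 0) >= max_per_team:
            continue  # cap on players from one real-world team reached
        chosen.append(p)
        spent += p.credits
        per_team[p.team] = per_team.get(p.team, 0) + 1
    return chosen

# Toy pool, invented for illustration only.
pool = [
    Player("A1", "X", 10.0, 50.0),
    Player("A2", "X", 8.0, 48.0),
    Player("A3", "X", 5.0, 20.0),
    Player("B1", "Y", 6.0, 33.0),
    Player("B2", "Y", 9.0, 27.0),
]
team = pick_team(pool, budget=20.0, team_size=3, max_per_team=2)
print([p.name for p in team])
```

A greedy pass is fast but not guaranteed to be optimal, and it can return fewer than `team_size` players when the constraints are tight. An exact dynamic-programming knapsack over credits and team size would be the natural replacement when optimality matters.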
- Visualization and Data Insights:
- The paper uses Plotly to generate interactive plots of player and team performance, which help reveal strengths and weaknesses.
Conclusions and Implications:
- The paper underscores the superiority of regression models over classification models in predicting player performance for fantasy cricket applications.
- By employing data engineering, feature transformation, and ML pipelines, the research demonstrates an effective methodology for analyzing cricket data and optimizing fantasy cricket team selections.
- Future work could automate the data update pipeline from Cricsheet.org to keep the data current, potentially improving the model's predictive capabilities.
This research offers a tailored approach to fantasy sports analytics, with a strong foundation in ML algorithms and data-driven decision-making processes that can inform enthusiasts and analysts in building competitive fantasy teams.