Puzzle game: Prediction and Classification of Wordle Solution Words (2403.19433v3)
Abstract: For MCM/ICM 2023, we proposed a result prediction model for Wordle, the popular word game published by The New York Times. We first preprocessed the raw data and then built an ARIMA-based model to predict the number of reported results for March 1, 2023. We selected three word attributes (usage frequency, information entropy, and the number of repeated letters) and analyzed the correlation between these attributes and the percentages across the seven attempt outcomes. We also built a regression model based on the XGBoost algorithm to predict the distribution of reported results, including the corresponding percentages for the word "EERIE". In addition, we constructed a word classification model that labels words as "simple", "moderate", or "difficult", and explored the relationship between the three attributes and the classification results. Finally, we calculated, for each word in the dataset, the percentage of players who needed 3 or more attempts. The appendix provides background on the mathematical modeling competition and the problems to be solved.
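As a rough illustration of two of the word attributes named in the abstract, the sketch below computes a letter-level entropy and a repeated-letter count for a single word. The abstract does not give the paper's exact definitions, so both are assumptions: entropy is taken here as the Shannon entropy of the letter distribution within the word, and the repeated-letter count as the number of extra occurrences beyond the first of each letter. The function names `letter_entropy` and `repeated_letter_count` are hypothetical. Usage frequency is left out because it depends on an external corpus.

```python
from collections import Counter
from math import log2


def letter_entropy(word: str) -> float:
    """Shannon entropy (in bits) of the letter distribution inside one word.

    Assumption: this is only one plausible reading of "word information
    entropy"; the paper may define it differently.
    """
    counts = Counter(word.lower())
    n = len(word)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def repeated_letter_count(word: str) -> int:
    """Number of extra occurrences of letters beyond their first appearance.

    Assumption: "eerie" counts as 2 because the letter "e" appears three times.
    """
    counts = Counter(word.lower())
    return sum(c - 1 for c in counts.values() if c > 1)


if __name__ == "__main__":
    for w in ("eerie", "crane"):
        print(w, round(letter_entropy(w), 3), repeated_letter_count(w))
```

Under these assumed definitions, a word with many repeated letters such as "EERIE" has a lower letter entropy than a word with five distinct letters, which is one reason the two attributes are likely to be correlated with each other.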