
From Pixels to Titles: Video Game Identification by Screenshots using Convolutional Neural Networks (2311.15963v3)

Published 27 Nov 2023 in cs.CV and cs.NE

Abstract: This paper investigates video game identification through single screenshots, utilizing ten convolutional neural network (CNN) architectures (VGG16, ResNet50, ResNet152, MobileNet, DenseNet169, DenseNet201, EfficientNetB0, EfficientNetB2, EfficientNetB3, and EfficientNetV2S) and three transformer architectures (ViT-B16, ViT-L32, and SwinT) across 22 home console systems, spanning from Atari 2600 to PlayStation 5, totalling 8,796 games and 170,881 screenshots. Except for VGG16, all CNNs outperformed the transformers in this task. Using ImageNet pre-trained weights as initial weights, EfficientNetV2S achieves the highest average accuracy (77.44%) and the highest accuracy in 16 of the 22 systems. DenseNet201 is the best in four systems and EfficientNetB3 is the best in the remaining two systems. Employing alternative initial weights fine-tuned in an arcade screenshots dataset boosts accuracy for EfficientNet architectures, with the EfficientNetV2S reaching a peak accuracy of 77.63% and demonstrating reduced convergence epochs from 26.9 to 24.5 on average. Overall, the combination of optimal architecture and weights attains 78.79% accuracy, primarily led by EfficientNetV2S in 15 systems. These findings underscore the efficacy of CNNs in video game identification through screenshots.

Summary

  • The paper identifies video game titles from single screenshots, comparing ten CNN architectures and three transformer architectures across 22 home console systems (8,796 games, 170,881 screenshots).
  • With ImageNet pre-trained weights, EfficientNetV2S achieves the highest average accuracy (77.44%) and is the best model on 16 of the 22 systems; all CNNs except VGG16 outperform the transformers.
  • Initial weights fine-tuned on an arcade-screenshot dataset further improve the EfficientNet models and reduce convergence epochs, and the per-system best combination of architecture and weights reaches 78.79% accuracy, highlighting the practical benefits of transfer learning.

Introduction

The field of automated video game identification has gained traction due to its technical challenges and practical applications across various sectors within the gaming industry. It allows platforms to generate metadata from user-uploaded screenshots, improves cataloging efficiency, and enhances viewers’ experience on streaming platforms. Traditional game classification methods have mostly focused on genre, but this research shifts the focus to the identification of video game titles from single screenshots using Convolutional Neural Networks (CNNs).

Methodology

The research trains thirteen architectures on 170,881 screenshots from 8,796 games across 22 home console systems: ten CNNs (VGG16, ResNet50, ResNet152, MobileNet, DenseNet169, DenseNet201, EfficientNetB0, EfficientNetB2, EfficientNetB3, and EfficientNetV2S) and three transformers (ViT-B16, ViT-L32, and SwinT). The hypothesis is that these networks can extract image features that identify a game title from a single screenshot, without additional inputs. The dataset, sourced from the Moby Games database, spans consoles from the Atari 2600 to the PlayStation 5. Training relies on transfer learning: all models start from ImageNet pre-trained weights, and for the EfficientNet architectures the results are also compared against initial weights fine-tuned on a separate dataset of arcade game screenshots.
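
To make the transfer-learning setup concrete, the sketch below fine-tunes an ImageNet-initialized EfficientNetV2S classifier on a folder of labeled screenshots using Keras. The directory layout, input resolution, classification head, and training schedule are assumptions for illustration rather than the paper's exact configuration, and the paper reports results per console system rather than for a single pooled classifier.

```python
# Minimal transfer-learning sketch (assumed setup, not the paper's exact pipeline):
# screenshots/<game_title>/<image>.png, one class per game title.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 500          # hypothetical number of game titles for one console system
IMG_SIZE = (224, 224)      # assumed input resolution

# EfficientNetV2S backbone initialized with ImageNet weights (transfer learning).
backbone = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet",
    input_shape=IMG_SIZE + (3,), pooling="avg",
)

model = models.Sequential([
    backbone,
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one output per game title
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Screenshots organized in per-title folders; labels are inferred from folder names.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "screenshots/", image_size=IMG_SIZE, batch_size=32
)
model.fit(train_ds, epochs=25)  # the paper reports convergence around 25-27 epochs on average
```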

Results

Every CNN except VGG16 outperforms the transformer architectures. With ImageNet initial weights, EfficientNetV2S achieves the highest average accuracy (77.44%) and the highest accuracy in 16 of the 22 systems; DenseNet201 is best in four systems and EfficientNetB3 in the remaining two. Initializing the EfficientNet models from weights fine-tuned on arcade screenshots raises their accuracy and shortens training, with EfficientNetV2S peaking at 77.63% while its average convergence drops from 26.9 to 24.5 epochs. Selecting the best architecture and initial weights per system yields an overall accuracy of 78.79%, led by EfficientNetV2S in 15 of the 22 systems.
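
One plausible reading of the "optimal combination" figure is a per-system selection: for each console system, pick the (architecture, initial weights) pair with the highest accuracy, then aggregate across systems. The sketch below illustrates that aggregation with invented placeholder numbers; only the selection logic is implied by the paper.

```python
# Pick the best (architecture, initial-weights) pair per console system and
# average the resulting accuracies. The accuracy values are placeholders.
accuracy = {
    ("EfficientNetV2S", "imagenet"): {"PlayStation 5": 0.71, "Atari 2600": 0.83},
    ("EfficientNetV2S", "arcade"):   {"PlayStation 5": 0.72, "Atari 2600": 0.82},
    ("DenseNet201",     "imagenet"): {"PlayStation 5": 0.69, "Atari 2600": 0.84},
}

best_per_system = {}
for combo, per_system in accuracy.items():
    for system, acc in per_system.items():
        if system not in best_per_system or acc > best_per_system[system][1]:
            best_per_system[system] = (combo, acc)

overall = sum(acc for _, acc in best_per_system.values()) / len(best_per_system)
print(best_per_system)                    # best combination chosen independently per system
print(f"overall accuracy: {overall:.2%}")
```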

Conclusions

This paper affirms the efficacy of CNNs for video game identification from screenshots. Since larger networks provide better accuracy, future research could investigate even larger CNN architectures and ensembles. Transferring weights from a related task (arcade screenshots) improves accuracy and speeds convergence, highlighting the potential of CNNs for other screenshot-based applications in the gaming sector, such as automated metadata generation and cataloging for game libraries and streaming platforms.
