Scene Text Recognition Models Explainability Using Local Features (2310.09549v1)
Abstract: Explainable AI (XAI) studies how humans can understand the cause of a model's predictions. In this work, we address Scene Text Recognition (STR) explainability: using XAI to understand the cause of an STR model's predictions. Recent XAI literature on STR offers only simple analyses and does not fully explore other XAI methods. In this study, we focus on data explainability frameworks, called attribution-based methods, which explain the important parts of an input to a deep learning model. However, integrating them into STR produces inconsistent and ineffective explanations, because they explain the model only in the global context (the whole predicted text rather than its individual characters). To solve this problem, we propose a new method, STRExp, that takes local explanations into account, i.e., explanations of the individual character predictions. STRExp is then benchmarked across different attribution-based methods on different STR datasets and evaluated across different STR models.
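As a concrete illustration of the global-versus-local distinction, the sketch below contrasts one attribution map for a single sequence output with per-character attribution maps, using Captum's IntegratedGradients. The toy recognizer, its shapes, and all names here are illustrative assumptions, not the paper's STRExp implementation; any model emitting [batch, seq_len, num_classes] character logits would be used the same way.

```python
# Minimal sketch: global vs. per-character (local) attributions for an
# STR-style model. The ToySTR module is a hypothetical stand-in, NOT STRExp.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

SEQ_LEN, NUM_CLASSES = 5, 37  # e.g., 26 letters + 10 digits + 1 blank

class ToySTR(nn.Module):
    """Stand-in recognizer: word image -> per-time-step character logits."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1),
            nn.AdaptiveAvgPool2d((1, SEQ_LEN)),  # collapse to T time steps
        )
        self.head = nn.Linear(8, NUM_CLASSES)

    def forward(self, x):                 # x: [B, 1, H, W]
        f = self.backbone(x)              # [B, 8, 1, T]
        f = f.squeeze(2).transpose(1, 2)  # [B, T, 8]
        return self.head(f)               # [B, T, C] character logits

model = ToySTR().eval()
image = torch.randn(1, 1, 32, 100)        # one grayscale word crop
ig = IntegratedGradients(model)

with torch.no_grad():
    preds = model(image).argmax(-1)[0]    # greedy decode, shape [T]

# Global-style use: a single attribution map tied to one sequence output.
# Captum accepts a (time_step, class_index) tuple as the target for 3-D outputs.
global_attr = ig.attribute(image, target=(0, int(preds[0])))

# Local explanations: one input-attribution map per predicted character,
# which is the per-character granularity the paper argues STR needs.
local_attrs = [ig.attribute(image, target=(t, int(preds[t])))
               for t in range(SEQ_LEN)]

print(global_attr.shape, len(local_attrs))  # torch.Size([1, 1, 32, 100]) 5
```

Each entry of `local_attrs` has the same shape as the input image, so the per-character maps can be visualized side by side to see which pixels drive each character prediction, rather than one averaged map for the whole word.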