Document Image Cleaning using Budget-Aware Black-Box Approximation (2306.13236v1)
Abstract: Recent work has shown that by approximating the behaviour of a non-differentiable black-box function with a neural network, the black box can be integrated into a differentiable pipeline for end-to-end training. This methodology is termed "differentiable bypass", and one successful application of it trains a document preprocessor to improve the performance of a black-box OCR engine. However, a good approximation of an OCR engine requires querying it for all samples throughout training, which can be computationally and financially expensive. Several zeroth-order optimization (ZO) algorithms have been proposed in the black-box attack literature to find adversarial examples for a black-box model by estimating its gradient in a query-efficient manner. However, the query complexity and convergence rate of such algorithms make them infeasible for our problem. In this work, we propose two sample selection algorithms that train an OCR preprocessor using fewer than 10% of the OCR engine queries made by the original system, reducing total training time by more than 60% without significant loss of accuracy. We also show a 4% improvement in the word-level accuracy of a commercial OCR engine using only 2.5% of the total queries, with a 32x reduction in monetary cost. Further, we propose a simple ranking technique that prunes 30% of the document images from the training dataset without affecting the system's performance.
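To make the query-cost argument against ZO methods concrete, here is a minimal sketch of a standard two-point Gaussian-smoothing gradient estimator. It is not the method proposed in this work. The names `CountingBlackBox` and `zo_gradient`, the parameters `q` (number of random directions) and `mu` (smoothing radius), and the toy quadratic that stands in for the OCR engine are all hypothetical, chosen only for illustration.

```python
import random


class CountingBlackBox:
    """Hypothetical wrapper around a scalar black-box function that counts queries.

    Stands in for an expensive, non-differentiable OCR engine; only its
    outputs are observable, never its gradients.
    """

    def __init__(self, fn):
        self.fn = fn
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.fn(x)


def zo_gradient(f, x, q=10, mu=1e-3, rng=None):
    """Two-point Gaussian-smoothing estimate of the gradient of f at x.

    Averages finite differences along q random Gaussian directions, so a
    single estimate costs q + 1 black-box queries (one for the base point).
    """
    rng = rng or random.Random()
    d = len(x)
    fx = f(x)  # one query for the unperturbed input
    grad = [0.0] * d
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        # Directional finite difference, already divided by q for averaging.
        scale = (f(x_pert) - fx) / (mu * q)
        for j in range(d):
            grad[j] += scale * u[j]
    return grad


# Usage: toy black box f(x) = sum(x_i^2), whose true gradient is 2x.
f = CountingBlackBox(lambda x: sum(xi * xi for xi in x))
x0 = [1.0, -2.0, 0.5]
g = zo_gradient(f, x0, q=2000, mu=1e-4, rng=random.Random(0))
print("estimate:", g, "queries used:", f.queries)
```

Even in this 3-dimensional toy, a reasonably accurate estimate takes thousands of queries for a single gradient step. For this estimator the variance grows with the input dimension, so for a preprocessor whose input is a full document image, each step would consume a very large number of OCR calls. This is the kind of cost the abstract's sample selection approach is designed to avoid.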
Authors: Ganesh Tata, Katyani Singh, Eric Van Oeveren, Nilanjan Ray