
Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System (2402.17204v3)

Published 27 Feb 2024 in cs.CV

Abstract: This research addresses a critical challenge in the field of generative models, particularly in the generation and evaluation of synthetic images. Given the inherent complexity of generative models and the absence of a standardized procedure for comparing them, our study introduces a pioneering algorithm to objectively assess the realism of synthetic images. This approach significantly enhances the evaluation methodology by refining the Fréchet Inception Distance (FID) score, allowing for a more precise and objective assessment of image quality. Our algorithm is particularly tailored to the challenges of generating and evaluating realistic images of Arabic handwritten digits, a task that has traditionally been nearly impossible due to the subjective nature of realism in image generation. By providing a systematic and objective framework, our method not only enables the comparison of different generative models but also paves the way for improvements in their design and output. This breakthrough in evaluation and comparison is crucial for advancing the field of OCR, especially for scripts that present unique complexities, and sets a new standard for the generation and assessment of high-quality synthetic images.
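For context, the standard Fréchet Inception Distance (FID) that the abstract builds on fits a Gaussian to feature vectors of real images and another to feature vectors of generated images, then measures the Fréchet distance between the two Gaussians: d^2 = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^{1/2}). The sketch below implements only that standard formula in NumPy. It is not the authors' refined score, and it assumes the feature vectors (normally taken from an Inception-v3 pooling layer) have already been extracted; the function names `frechet_distance` and `fid_from_features` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    # V diag(sqrt(lambda)) V^T
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) == Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix
    # is symmetric PSD, so its eigenvalues are real and non-negative.
    root1 = _psd_sqrt(sigma1)
    inner = root1 @ sigma2 @ root1
    inner = (inner + inner.T) / 2.0  # re-symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(real_feats, fake_feats) -> float:
    """FID from two (n_samples, n_features) arrays of pre-extracted features."""
    real_feats = np.asarray(real_feats, float)
    fake_feats = np.asarray(fake_feats, float)
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    # atleast_2d keeps the 1-feature case a 1x1 matrix instead of a scalar
    sigma_r = np.atleast_2d(np.cov(real_feats, rowvar=False))
    sigma_f = np.atleast_2d(np.cov(fake_feats, rowvar=False))
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)


if __name__ == "__main__":
    # Toy stand-ins for extracted features (not real Inception features).
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(500, 8))
    fake = rng.normal(0.5, 1.0, size=(500, 8))
    print(fid_from_features(real, real))  # close to 0, up to round-off
    print(fid_from_features(real, fake))  # larger, because the means differ
```

Computing the trace term through a symmetric eigen-decomposition keeps every intermediate real, which avoids the small imaginary components a general matrix square root can produce. In practice FID is computed on 2048-dimensional Inception-v3 features, so many samples are needed before the covariance estimates are stable.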
