DeepEraser: Deep Iterative Context Mining for Generic Text Eraser (2402.19108v1)

Published 29 Feb 2024 in cs.CV

Abstract: In this work, we present DeepEraser, an effective deep network for generic text removal. DeepEraser uses a recurrent architecture that erases the text in an image through iterative operations. Our idea comes from the process of erasing pencil script, where the text area designated for removal is continuously monitored and the text is attenuated progressively, ensuring a thorough and clean erasure. Technically, at each iteration an erasing module is deployed that not only explicitly aggregates the previous erasing progress but also mines additional semantic context to erase the target text. Through iterative refinement, the text regions are progressively replaced with more appropriate content and finally converge to an accurate result. Furthermore, a custom mask generation strategy is introduced to improve the capability of DeepEraser for adaptive text removal, as opposed to indiscriminately removing all the text in an image. DeepEraser is notably compact, with only 1.4M parameters, and is trained in an end-to-end manner. To verify its effectiveness, extensive experiments are conducted on several prevalent benchmarks, including SCUT-Syn, SCUT-EnsText, and the Oxford Synthetic text dataset. The quantitative and qualitative results demonstrate the effectiveness of DeepEraser over state-of-the-art methods, as well as its strong generalization ability in custom-mask text removal. Code and pre-trained models are available at https://github.com/fh2019ustc/DeepEraser
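The core idea in the abstract is a recurrent loop: at each iteration, the region designated by a mask is refined using context mined from its surroundings, and the previous erasing progress is carried forward. The sketch below illustrates that loop structure only; it substitutes simple neighborhood averaging for the paper's learned erasing module, and the function name, blending weights, and iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def erase_iteratively(image, mask, num_iters=8):
    """Progressively replace masked pixels with a local-context estimate.

    image: 2D float array; mask: 2D bool array (True = text to remove).
    A hand-rolled stand-in for DeepEraser's learned erasing module:
    the recurrent refine-with-context structure is the point, not the
    averaging itself.
    """
    out = image.copy()
    for _ in range(num_iters):
        # "Mine" context: average each pixel's 4-neighborhood
        # (edge padding keeps the array size fixed).
        padded = np.pad(out, 1, mode="edge")
        context = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                   padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        # Aggregate previous progress: blend the current estimate with
        # the mined context, only inside the designated mask, so pixels
        # outside the mask are left untouched (adaptive removal).
        out[mask] = 0.5 * out[mask] + 0.5 * context[mask]
    return out
```

Because only masked pixels are ever written, the loop naturally supports the custom-mask behavior described in the abstract: passing a mask covering only some of the text removes just that text, leaving the rest of the image bit-identical.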

Authors (6)
  1. Hao Feng
  2. Wendi Wang
  3. Shaokai Liu
  4. Jiajun Deng
  5. Wengang Zhou
  6. Houqiang Li