PSGText: Stroke-Guided Scene Text Editing with PSP Module (2310.13366v1)
Abstract: Scene Text Editing (STE) aims to replace text in an image with new desired text while preserving the background and the style of the original text. However, existing techniques struggle to generate edited text images that are clear and legible, largely because of the diversity of text styles and the intricate textures of complex backgrounds. To address this challenge, this paper introduces a three-stage framework for transferring text across text images. First, a text-swapping network replaces the original text with the desired content while retaining its style. Second, a background inpainting network reconstructs the background, filling the regions left after the original text is removed and preserving visual coherence. Finally, a fusion network combines the outputs of the text-swapping and background inpainting networks to produce the final edited image. A demo video is included in the supplementary material.
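To make the three-stage structure concrete, below is a minimal PyTorch sketch of how such a pipeline could be wired. The sub-network designs, channel counts, and the `ThreeStageSTE` / `SmallConvNet` names are illustrative assumptions; the paper's actual networks (including the stroke-guided PSP module) are not reproduced here.

```python
import torch
import torch.nn as nn


class SmallConvNet(nn.Module):
    """Tiny convolutional stand-in for a sub-network.
    The paper's real architectures (e.g. the PSP module) are not reproduced here."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class ThreeStageSTE(nn.Module):
    """Hypothetical three-stage scene text editing pipeline:
    1) text swapping, 2) background inpainting, 3) fusion."""

    def __init__(self):
        super().__init__()
        # Stage 1: render the target text in the source image's text style.
        self.text_swap = SmallConvNet(in_ch=6, out_ch=3)
        # Stage 2: recover a text-free background from the source image.
        self.bg_inpaint = SmallConvNet(in_ch=3, out_ch=3)
        # Stage 3: composite the styled text onto the inpainted background.
        self.fusion = SmallConvNet(in_ch=6, out_ch=3)

    def forward(self, source_img, target_text_img):
        # Stage 1: styled rendering of the new text (foreground only).
        swapped = self.text_swap(torch.cat([source_img, target_text_img], dim=1))
        # Stage 2: background with the original text removed.
        background = self.bg_inpaint(source_img)
        # Stage 3: fuse foreground text and reconstructed background.
        edited = self.fusion(torch.cat([swapped, background], dim=1))
        return edited, swapped, background


# Usage with dummy tensors: one 64x256 RGB scene-text crop and a rendered target-text image.
model = ThreeStageSTE()
src = torch.randn(1, 3, 64, 256)
tgt = torch.randn(1, 3, 64, 256)
edited, fg, bg = model(src, tgt)
print(edited.shape)  # torch.Size([1, 3, 64, 256])
```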
- Felix Liawi
- Yun-Da Tsai
- Guan-Lun Lu
- Shou-De Lin