TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images (2411.00355v2)
Abstract: In this paper, we propose TextDestroyer, the first training- and annotation-free method for scene text destruction using a pre-trained diffusion model. Existing scene text removal models require complex annotation and retraining, and they may leave faint yet recognizable text, compromising privacy protection and content concealment. TextDestroyer addresses these issues by using a three-stage hierarchical process to obtain accurate text masks. Our method then scrambles the text areas of the latent start code using a Gaussian distribution before reconstruction. During the diffusion denoising process, the self-attention keys and values are taken from the original latent to restore the compromised background. Latent codes saved at each inversion step are used for replacement during reconstruction, ensuring perfect background restoration. The advantages of TextDestroyer are that (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, leaving no recognizable traces; and (3) it generalizes better, performing well on both real-world scenes and generated images.
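The latent-space steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The text mask is assumed to be already computed and downsampled to latent resolution. The saved latents are assumed to come from a deterministic inversion such as DDIM. "Replacement" is interpreted as copying the saved inversion latent into the non-text (background) region at each step. `denoise_step` is a hypothetical stand-in for one step of the pre-trained diffusion model, and every function name here is illustrative. `mutual_self_attention` shows the key/value substitution in isolation. In a real pipeline it would run inside the denoiser's self-attention layers.

```python
import numpy as np


def scramble_text_region(z_start, mask, rng):
    """Overwrite latent entries inside the text mask with Gaussian samples.

    z_start: latent start code, shape (C, H, W)
    mask:    text mask at latent resolution, shape (H, W), 1 = text (assumed given)
    """
    noise = rng.standard_normal(z_start.shape)
    m = mask[None, :, :]  # broadcast the mask over latent channels
    return m * noise + (1.0 - m) * z_start


def mutual_self_attention(q, k_src, v_src):
    """Scaled dot-product attention whose keys/values come from the
    original (source) latent rather than the latent being reconstructed.

    q:            (N, d) queries from the reconstruction branch
    k_src, v_src: (M, d) keys and values from the original latent
    """
    d = q.shape[-1]
    scores = q @ k_src.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v_src


def replace_background(z_t, z_inv_t, mask):
    """Keep the denoised latent inside the text mask and copy the latent
    saved at the matching inversion step everywhere else (assumed reading
    of the paper's 'replacement' step)."""
    m = mask[None, :, :]
    return m * z_t + (1.0 - m) * z_inv_t


def reconstruct(inversion_latents, mask, denoise_step, rng):
    """Hypothetical reconstruction loop.

    inversion_latents: [z_0, z_1, ..., z_T] saved during inversion (z_0 = clean latent)
    denoise_step:      callable (z, t) -> latent at step t - 1; a placeholder
                       for the pre-trained diffusion model, not the paper's code
    """
    T = len(inversion_latents) - 1
    z = scramble_text_region(inversion_latents[T], mask, rng)
    for t in range(T, 0, -1):
        z = denoise_step(z, t)
        z = replace_background(z, inversion_latents[t - 1], mask)
    return z
```

Because the last iteration copies `z_0` outside the mask, the background of the returned latent matches the original latent exactly in this sketch. Only the masked (text) region is derived from the scrambled start code:

```python
rng = np.random.default_rng(0)
latents = [rng.standard_normal((4, 8, 8)) for _ in range(11)]  # T = 10
mask = np.zeros((8, 8))
mask[2:5, 3:7] = 1.0
out = reconstruct(latents, mask, lambda z, t: 0.9 * z, rng)  # toy denoiser
```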