
Watermark Text Pattern Spotting in Document Images (2401.05167v2)

Published 10 Jan 2024 in cs.CV

Abstract: Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships: in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a watermark text pattern rendering procedure. A validity study using human raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To prove the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) for detecting the bounding-box instances of watermark text while predicting the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents, surpassing baselines by 5 AP points in detection and 4 points in character accuracy.
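The abstract names a variance minimization loss without giving its formulation. One hedged reading, since a watermark pattern typically repeats many times across a page, is a penalty on disagreement between the feature vectors predicted for repeated instances of the same watermark. The function below is a minimal sketch under that assumption, not the paper's actual loss; the name `variance_minimization_loss` and the plain-list feature representation are illustrative choices.

```python
def variance_minimization_loss(features):
    """Toy sketch of a variance-style penalty (assumed reading, not the
    paper's exact loss): given one feature vector per detected instance
    of the same repeated watermark, return the mean squared deviation
    from the instance-wise mean. Zero iff all instances agree.

    features: list of equal-length lists of floats, one per instance.
    """
    n = len(features)
    dim = len(features[0])
    # Per-dimension mean over all detected instances.
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    # Sum of squared deviations, averaged over instances.
    total = sum((f[d] - mean[d]) ** 2 for f in features for d in range(dim))
    return total / n
```

Identical instance features give a loss of exactly 0, so minimizing this term pushes the network toward consistent predictions for every repetition of the watermark on a page.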
