Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation (2312.07925v1)
Abstract: Document dewarping, which aims to eliminate geometric deformation in photographed documents to benefit text recognition, has made great progress in recent years but is still far from solved. While state-of-the-art approaches typically use Cartesian coordinates to learn a group of deformation control points, this representation is not efficient for a dewarping model to learn the deformation information. In this work, we explore a Polar coordinate representation for each point in document dewarping, namely Polar-Doc. In contrast to most current works, which typically adopt a two-stage pipeline, the Polar representation enables a unified point regression framework for both the segmentation and dewarping networks in a single stage. This unification makes the whole model more efficient to learn under an end-to-end optimization pipeline and also yields a compact representation. Furthermore, we propose a novel multi-scope Polar-Doc-IOU loss to constrain the relationship among control points as a grid-based regularization under the Polar representation. Visual comparisons and quantitative experiments on two benchmarks show that, with far fewer parameters than other mainstream counterparts, our one-stage model with multi-scope constraints achieves new state-of-the-art performance on both pixel alignment metrics and OCR metrics. Source code will be available at \url{*****}.
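To make the Polar representation concrete, the following minimal sketch converts a set of control points from Cartesian (x, y) coordinates to (radius, angle) pairs about a reference center and back. It is an illustration only, not the Polar-Doc implementation: the function names `to_polar` and `to_cartesian`, the choice of center, and the example corner points are all assumptions introduced here.

```python
import math


def to_polar(points, center):
    """Convert (x, y) points to (r, theta) pairs measured about `center`.

    Hypothetical helper for illustration; theta is in radians in (-pi, pi].
    """
    cx, cy = center
    polar = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        polar.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return polar


def to_cartesian(polar, center):
    """Inverse of `to_polar`: map (r, theta) pairs back to (x, y) points."""
    cx, cy = center
    return [(cx + r * math.cos(t), cy + r * math.sin(t)) for r, t in polar]


# Assumed example: the four corners of a 4 x 2 page, centered at (2, 1).
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
center = (2.0, 1.0)

polar = to_polar(corners, center)
restored = to_cartesian(polar, center)
```

In this form each control point is described by a distance and a direction from a shared center, so a boundary point and an interior control point can both be expressed as a regression target of the same kind.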