RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization (2404.09530v2)
Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because these datasets offer limited layout diversity, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly affect how well these models perform. To address this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. The primary aim is to develop a versatile dataset capable of training models that are robust and adaptable to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings indicate that models enriched with our dataset are well suited to such tasks, achieving mAP95 scores of 0.398 and 0.588 in the scientific document domain for the TABLE class.
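The idea of a synthetic page whose labels are assigned automatically can be illustrated with a small sketch. This is not the authors' generation pipeline, which the abstract does not describe; it only assumes that layout elements are placed at random, non-overlapping positions and that each placement is recorded as a label. The class names, page size, element size ranges, and the `random_layout` helper are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical class names for illustration; not the dataset's actual label set.
CLASSES = ("text", "title", "list", "table", "figure")


@dataclass(frozen=True)
class Box:
    """One layout element: its type plus position (x, y) and extent (w, h)."""
    label: str
    x: int
    y: int
    w: int
    h: int


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def random_layout(page_w=612, page_h=792, n_elements=6, seed=0, max_tries=1000):
    """Place up to n_elements non-overlapping boxes by rejection sampling.

    The returned list doubles as the annotation: every box carries its
    class, position, and size, so no manual labeling step is needed.
    """
    rng = random.Random(seed)  # local generator keeps results reproducible
    placed = []
    tries = 0
    while len(placed) < n_elements and tries < max_tries:
        tries += 1
        # Assumed size ranges, chosen only so several boxes fit on a page.
        w = rng.randint(page_w // 6, page_w // 2)
        h = rng.randint(page_h // 12, page_h // 4)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        candidate = Box(rng.choice(CLASSES), x, y, w, h)
        if not any(overlaps(candidate, other) for other in placed):
            placed.append(candidate)
    return placed


if __name__ == "__main__":
    for box in random_layout(seed=1):
        print(box)
```

In a fuller pipeline, each box would be filled with rendered or cropped content of the matching type, and the box list would be exported in a detection format; both steps are left out here because the abstract gives no details about them.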
Authors: Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh