Adaptive Multi Scale Document Binarisation Using Vision Mamba (2410.22811v1)
Abstract: Enhancing and preserving the readability of document images, particularly historical ones, is crucial for effective document image analysis. Numerous models have been proposed for this task, including convolution-based, transformer-based, and hybrid convolutional-transformer architectures. While hybrid models address the limitations of purely convolutional or purely transformer-based methods, they often suffer from quadratic time complexity in the length of the input sequence. In this work, we propose a Mamba-based architecture for document binarisation that handles long sequences efficiently by scaling linearly with sequence length and optimising memory usage. Additionally, we modify the skip connections to incorporate Difference of Gaussians (DoG) features, inspired by conventional signal processing techniques. These multiscale high-frequency features enable the model to produce high-quality, detailed outputs.
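To make the DoG idea concrete, the following is a minimal NumPy sketch of multiscale Difference of Gaussians features as they are defined in classical signal processing: an image is blurred at several scales, and adjacent blurred copies are subtracted to obtain band-pass, edge-emphasising maps. It is not the authors' implementation. The function names (`gaussian_kernel_1d`, `gaussian_blur`, `dog_pyramid`), the sigma values, and the reflect padding are assumptions chosen for illustration, and the abstract does not say how the DoG maps are combined with the skip connections.

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalised 1-D Gaussian kernel covering +/- truncate * sigma samples."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflect padding; output keeps the input shape."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    # Filter along rows, then along columns; 'valid' trims the padding again.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows
    )


def dog_pyramid(image, sigmas=(1.0, 2.0, 4.0)):
    """Return one DoG band per adjacent pair of sigmas (finer blur minus coarser blur).

    The sigma values are illustrative assumptions, not taken from the paper.
    """
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return [fine - coarse for fine, coarse in zip(blurred[:-1], blurred[1:])]


if __name__ == "__main__":
    # Synthetic 32x32 "page": dark left half, bright right half (a vertical edge).
    page = np.zeros((32, 32))
    page[:, 16:] = 1.0
    for i, band in enumerate(dog_pyramid(page)):
        print(f"band {i}: shape={band.shape}, max |response|={np.abs(band).max():.3f}")
```

Each band responds most strongly near intensity transitions such as stroke boundaries and is close to zero in flat regions, which is why DoG maps are a natural candidate for carrying fine detail across skip connections.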