Msmsfnet: a multi-stream and multi-scale fusion net for edge detection (2404.04856v2)
Abstract: Edge detection is a long-standing problem in computer vision. Recent deep learning-based algorithms achieve state-of-the-art performance on publicly available datasets, but their performance relies heavily on backbone weights pre-trained on the ImageNet dataset. This significantly limits the design space of deep learning-based edge detectors: any newly devised model must first be trained on ImageNet and then fine-tuned on the edge detection datasets, or the comparison would be unfair. Training a model on ImageNet, however, is often infeasible for many researchers due to limited computational resources. Moreover, when these methods must be trained to detect edges in a different kind of data, for instance Synthetic Aperture Radar (SAR) images, ImageNet pre-trained weights are unlikely to improve edge detection accuracy because the statistics of optical and SAR images differ strongly, and no dataset for SAR image processing matches the size of ImageNet. In this work, we study the performance that existing methods can achieve on publicly available datasets when trained from scratch, and we devise a new network architecture, the multi-stream and multi-scale fusion net (msmsfnet), for edge detection. Our experiments show that, with all models trained from scratch to ensure a fair comparison, our model outperforms state-of-the-art deep learning-based edge detectors on three publicly available datasets. The effectiveness of our model is further demonstrated by experiments on edge detection in SAR images, an important setting for this work since no useful pre-trained weights are available for edge detection in SAR images.
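The fair-comparison protocol described in the abstract can be made concrete. Below is a minimal PyTorch sketch, a hypothetical illustration only: the paper does not tie itself to a specific framework, and VGG16 is chosen here merely because it is a common backbone for prior edge detectors. It contrasts the conventional ImageNet-pretrained initialization with the from-scratch regime used in this work.

```python
import torchvision.models as models

# Conventional pipeline: the backbone (VGG16 here, as in many prior edge
# detectors) is initialized with weights pre-trained on ImageNet, then
# fine-tuned on an edge detection dataset.
pretrained_backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Fair-comparison protocol of this paper: every model, baselines included,
# starts from random initialization and is trained only on the edge
# detection data (no ImageNet pre-training stage).
scratch_backbone = models.vgg16(weights=None)
```

Under the second regime, the same training schedule and data apply to all compared models, so differences in accuracy can be attributed to architecture rather than to pre-training.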