DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection (2401.02032v2)

Published 4 Jan 2024 in cs.CV

Abstract: Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that satisfy both correctness and crispness. With the recent success of the diffusion probabilistic model (DPM), we found it is especially suitable for accurate and crisp edge detection since the denoising process is directly applied to the original image size. Therefore, we propose the first diffusion model for the task of general edge detection, which we call DiffusionEdge. To avoid expensive computational resources while retaining the final performance, we apply DPM in the latent space and enable the classic cross-entropy loss which is uncertainty-aware in pixel level to directly optimize the parameters in latent space in a distillation manner. We also adopt a decoupled architecture to speed up the denoising process and propose a corresponding adaptive Fourier filter to adjust the latent features of specific frequencies. With all the technical designs, DiffusionEdge can be stably trained with limited resources, predicting crisp and accurate edge maps with much fewer augmentation strategies. Extensive experiments on four edge detection benchmarks demonstrate the superiority of DiffusionEdge both in correctness and crispness. On the NYUDv2 dataset, compared to the second best, we increase the ODS, OIS (without post-processing) and AC by 30.2%, 28.1% and 65.1%, respectively. Code: https://github.com/GuHuangAI/DiffusionEdge.

Summary

  • The paper presents DiffusionEdge, a diffusion probabilistic model for general edge detection that predicts accurate, crisp edge maps while running the diffusion in latent space to limit computational cost.
  • It introduces key innovations including a decoupled architecture, adaptive Fourier filtering, and uncertainty distillation to enhance edge accuracy and efficiency.
  • Experiments on benchmarks such as NYUDv2 show large gains in the ODS and OIS F-scores and in Average Crispness (AC), reducing reliance on post-processing.

Analysis of "DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection"

The paper "DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection" addresses a long-standing difficulty in edge detection, a core computer vision task: producing edge maps that are both correct and crisp. The authors apply Diffusion Probabilistic Models (DPMs) to this problem and report that the resulting model can be trained stably with limited resources and with far fewer augmentation strategies than prevailing techniques.

Key Contributions

The core contribution is DiffusionEdge, which the authors describe as the first diffusion model for general edge detection. Existing detectors, from classical methods like Canny to modern CNN-based techniques, often struggle to balance edge correctness against edge crispness; for learning-based detectors, the authors attribute this to the encoder-decoder architecture. DiffusionEdge addresses the problem by exploiting a property of DPMs: the denoising process is applied directly at the original image size. To keep computation affordable, the model runs the diffusion in latent space.

The paper also introduces three technical components that support this design:

  1. Decoupled Architecture: The method adopts a decoupled diffusion architecture, similar to that used in Decoupled Diffusion Models (DDM), to speed up the denoising process.
  2. Adaptive Fourier Filter: The introduction of an adaptive frequency filtering mechanism allows the adjustment of specific frequency components, thereby refining latent features crucial for achieving crisp edge detection.
  3. Uncertainty Distillation: This approach retains the pixel-level uncertainty information inherent in datasets labeled by multiple annotators. A classic cross-entropy loss optimizes the parameters directly in latent space in a distillation manner, which reduces both computational demands and the need for extensive data augmentation.
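
The frequency-filtering idea in item 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fourier_filter`, the `(C, H, W)` layout, and the use of fixed NumPy arrays as weights are assumptions made for illustration. In a trained network the weights would presumably be learned parameters. The sketch transforms each channel to the frequency domain, rescales every frequency component by a per-frequency weight, and transforms back.

```python
import numpy as np


def fourier_filter(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Rescale the 2-D spectrum of each channel of a (C, H, W) feature map.

    `weights` has shape (C, H, W // 2 + 1), matching the output of rfft2.
    Both the name and the interface are hypothetical; they illustrate the
    general technique, not the layer used in the paper.
    """
    # Real-input FFT over the two spatial axes.
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    # Per-frequency scaling: 1.0 keeps a component, 0.0 removes it.
    filtered = spectrum * weights
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(filtered, s=features.shape[-2:], axes=(-2, -1))


rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 8, 8))

# All-ones weights leave the features unchanged.
identity = np.ones((2, 8, 5))
same = fourier_filter(feats, identity)

# Keeping only the zero-frequency component yields each channel's mean.
dc_only = np.zeros((2, 8, 5))
dc_only[:, 0, 0] = 1.0
smoothed = fourier_filter(feats, dc_only)
```

Weights between these two extremes attenuate or amplify individual frequency bands, which is the kind of adjustment the summary describes for refining latent features.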

The authors evaluate DiffusionEdge on four edge detection benchmarks: BSDS, NYUDv2, Multicue, and BIPED. They report that it outperforms contemporary state-of-the-art approaches in both F-scores and Average Crispness (AC). On NYUDv2, compared to the second-best method, ODS and OIS (both without post-processing) increase by 30.2% and 28.1%, and AC increases by 65.1%.
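
For readers unfamiliar with the two F-score variants, the sketch below shows how ODS (one threshold fixed for the whole dataset) and OIS (the best threshold chosen per image) differ. It assumes that matching predicted edges to ground truth has already produced true-positive, false-positive, and false-negative counts for every image at every threshold, and it follows one common convention of aggregating counts before computing the F-score. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (true positives, false positives, false negatives) for one image at one threshold
Counts = Tuple[int, int, int]


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ods(counts: List[List[Counts]]) -> float:
    """Optimal Dataset Scale: best F-score using one threshold for all images.

    counts[i][t] holds the counts for image i at threshold index t.
    """
    best = 0.0
    for t in range(len(counts[0])):
        tp = sum(image[t][0] for image in counts)
        fp = sum(image[t][1] for image in counts)
        fn = sum(image[t][2] for image in counts)
        best = max(best, f_score(tp, fp, fn))
    return best


def ois(counts: List[List[Counts]]) -> float:
    """Optimal Image Scale: F-score when each image uses its own best threshold."""
    tp = fp = fn = 0
    for image in counts:
        best_tp, best_fp, best_fn = max(image, key=lambda c: f_score(*c))
        tp, fp, fn = tp + best_tp, fp + best_fp, fn + best_fn
    return f_score(tp, fp, fn)


# Toy data: two images, two thresholds each (made-up counts).
toy = [
    [(8, 2, 2), (5, 0, 5)],
    [(4, 6, 6), (6, 2, 4)],
]
ods_value = ods(toy)
ois_value = ois(toy)
```

Because OIS is free to choose a different threshold for each image, it is typically at least as high as ODS on the same predictions, as it is for the toy counts here.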

Implications and Future Directions

DiffusionEdge has implications for both the theory and the practice of edge detection. By combining a diffusion model with an adaptive filtering technique and uncertainty-aware optimization, the work sets a precedent for using generative models in edge detection. This is particularly relevant where edge crispness matters, as it does in 2D perception, image generation, and 3D reconstruction.

Practically, reducing the reliance on post-processing and on extensive dataset augmentation is a step towards more efficient edge detection frameworks. The method's success also suggests that diffusion models may be useful in other areas of computer vision beyond generative tasks, such as object segmentation and recognition.

However, as noted by the authors, the efficiency of the diffusion model in terms of inference speed still warrants further investigation. Future research should focus on optimization strategies that maintain accuracy and crispness while minimizing computational overhead.

In summary, DiffusionEdge represents a meaningful step forward in edge detection research. It invites further work on integrating diffusion probabilistic models into vision tasks and on improving the computational performance of such systems.