
Explaining generative diffusion models via visual analysis for interpretable decision-making process (2402.10404v1)

Published 16 Feb 2024 in cs.CV and cs.AI

Abstract: Diffusion models have demonstrated remarkable performance in generation tasks. Nevertheless, explaining the diffusion process remains challenging because it consists of a sequence of denoised noisy images that are difficult for experts to interpret. To address this issue, we propose three research questions that interpret the diffusion process from the perspective of the visual concepts generated by the model and the regions the model attends to at each time step. We devise tools that visualize the diffusion process and answer these research questions, rendering the process human-understandable. Through experiments with various visual analyses using these tools, we show how the output is progressively generated: we explain the level of denoising and highlight relationships to foundational visual concepts at each time step. During training, the diffusion model learns diverse visual concepts corresponding to each time step, enabling it to predict different levels of visual concepts at different stages. We substantiate our tools using Area Under the Curve (AUC) scores, correlation quantification, and cross-attention mapping. Our findings provide insights into the diffusion process and pave the way for further research into explainable diffusion mechanisms.
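As a rough illustration of the kind of per-timestep inspection the abstract refers to (not the paper's actual tools), the sketch below runs a standard DDPM reverse process and records the model's predicted clean image at every step, so the progression of denoising levels can be visualized. The names `eps_model`, `make_ddpm_schedule`, and `sample_with_trace` are hypothetical; `eps_model` stands in for any DDPM-style noise-prediction network, and a linear beta schedule is assumed.

```python
# Minimal sketch (assumptions noted above): trace the predicted clean image
# x0_hat at each reverse-diffusion step of a standard DDPM sampler.
import torch

def make_ddpm_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Standard linear beta schedule (as in DDPM, Ho et al. 2020)."""
    betas = torch.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_bars

@torch.no_grad()
def sample_with_trace(eps_model, shape, T=1000, device="cpu"):
    """Run the DDPM reverse process, keeping the per-step x0 predictions."""
    betas, alphas, alpha_bars = make_ddpm_schedule(T)
    betas, alphas, alpha_bars = (t.to(device) for t in (betas, alphas, alpha_bars))

    x_t = torch.randn(shape, device=device)   # start from pure Gaussian noise
    x0_trace = []                              # predicted clean image per step

    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x_t, t_batch)          # predicted noise at step t

        # Current estimate of the clean image:
        # x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
        x0_hat = (x_t - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
        x0_trace.append(x0_hat.clamp(-1, 1).cpu())

        # Standard ancestral sampling step toward x_{t-1}.
        coef = betas[t] / torch.sqrt(1 - alpha_bars[t])
        mean = (x_t - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + torch.sqrt(betas[t]) * noise

    return x_t, x0_trace  # final sample plus per-timestep predictions
```

Plotting a few entries of `x0_trace` gives a progressive-generation view similar to what the abstract describes; the paper's own analyses additionally use AUC scores, correlation quantification, and cross-attention maps, which this sketch does not reproduce.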

