
NeuralDiffuser: Controllable fMRI Reconstruction with Primary Visual Feature Guided Diffusion (2402.13809v2)

Published 21 Feb 2024 in cs.NE, cs.AI, and cs.CV

Abstract: Reconstructing visual stimuli from functional Magnetic Resonance Imaging (fMRI) with Latent Diffusion Models (LDMs) provides a fine-grained retrieval of the brain. A challenge persists in reconstructing a cohesive alignment of details (such as structure, background, texture, and color). Moreover, LDMs generate different images even under the same conditions. To address these issues, we first uncover the neuroscientific perspective of LDM-based methods: they perform top-down creation based on knowledge pre-trained on massive image collections but lack detail-driven bottom-up perception, which results in unfaithful details. We propose NeuralDiffuser, which introduces primary visual feature guidance to provide detail cues in the form of gradients, extending the bottom-up process for LDM-based methods to achieve faithful semantics and details. We also develop a novel guidance strategy to ensure the consistency of repeated reconstructions rather than a variety of results. NeuralDiffuser achieves state-of-the-art performance on the Natural Scenes Dataset (NSD), offering more faithful details and more consistent results.

Authors (3)
  1. Haoyu Li (56 papers)
  2. Hao Wu (623 papers)
  3. Badong Chen (83 papers)
Citations (2)

Summary

NeuralDiffuser: Enhancing fMRI Reconstruction through Primary Visual Feature-Guided Diffusion

Introduction

In neuroscience and brain-computer interface research, decoding and reconstructing visual stimuli from brain activity, notably from functional Magnetic Resonance Imaging (fMRI) signals, remains an open problem. The main hurdle is the modality gap between high-dimensional fMRI data and the visual stimuli that evoked it. Traditional methods often yield reconstructions that are ambiguous and lack coherent semantic detail. Latent Diffusion Models (LDMs) have opened new avenues for addressing these challenges, yet they often produce different outputs under identical conditions and struggle to render details faithfully.

Methodology

This paper introduces NeuralDiffuser, a method that adds primary visual feature guidance to an LDM in order to improve the fidelity of images reconstructed from fMRI data. The design draws on the top-down and bottom-up processes of visual cognition. From the top-down perspective, pretrained knowledge drives the generative capabilities of the LDM. From the bottom-up perspective, fine-grained, detail-oriented perception is needed for accurate image reconstruction, and this is the part that existing LDM-based methods lack.

fMRI Embeddings Decoding

NeuralDiffuser decodes fMRI embeddings into semantic conditions and initial latents, using neural networks to map these embeddings to visual features that include, but are not limited to, structure, texture, and color. A novel aspect of the method is its guidance strategy, which keeps repeated reconstructions consistent despite the inherent diversity of generative model outputs.
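To make the decoding step concrete, the following is a minimal sketch of mapping voxel responses to a feature vector. It swaps in closed-form ridge regression as a simple linear stand-in for the neural networks mentioned above; it is not the authors' decoder. The function names, array sizes, and synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge_decoder(voxels, features, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    voxels:   (n_samples, n_voxels) fMRI responses (synthetic here).
    features: (n_samples, n_features) target visual features.
    """
    n_voxels = voxels.shape[1]
    gram = voxels.T @ voxels + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, voxels.T @ features)


def decode(voxels, weights):
    """Predict feature vectors from voxel responses."""
    return voxels @ weights


# Synthetic data (assumption): features are a noisy linear function of voxels.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(50, 8))
train_voxels = rng.normal(size=(200, 50))
train_features = train_voxels @ true_map + 0.01 * rng.normal(size=(200, 8))

weights = fit_ridge_decoder(train_voxels, train_features, alpha=1e-3)

test_voxels = rng.normal(size=(5, 50))
predicted = decode(test_voxels, weights)
print(predicted.shape)
```

In a pipeline of the kind described here, the decoded vectors would play the role of semantic conditions or guidance targets for the diffusion model; the linear form is chosen only because it is short and easy to check.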

Primary Visual Feature Guidance

The core of NeuralDiffuser is primary visual feature guidance, drawn from multiple layers of a CLIP visual encoder. This guidance supplies detail cues to the LDM in the form of gradients, improving the semantic fidelity and detail accuracy of the reconstructed images. The accompanying guidance strategy reduces the variability of generative outcomes, so that repeated reconstructions stay consistent.
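The idea of guidance "in the form of gradients" can be illustrated with a toy example. The sketch below assumes a hypothetical linear feature extractor in place of the CLIP encoder layers and omits the denoising network entirely, so it shows only the guidance update: the latent is moved along the negative gradient of a feature-matching loss. The names `feature_map`, `guided_update`, and `scale` are illustrative and do not come from the paper.

```python
import numpy as np


def feature_loss(latent, feature_map, target_features):
    """0.5 * squared distance between extracted and target features."""
    residual = feature_map @ latent - target_features
    return 0.5 * float(residual @ residual)


def guidance_gradient(latent, feature_map, target_features):
    """Analytic gradient of feature_loss with respect to the latent."""
    residual = feature_map @ latent - target_features
    return feature_map.T @ residual


def guided_update(latent, feature_map, target_features, scale):
    """One guidance step: nudge the latent toward the target features."""
    grad = guidance_gradient(latent, feature_map, target_features)
    return latent - scale * grad


rng = np.random.default_rng(0)
# Hypothetical linear "feature extractor" (stand-in for encoder layers).
feature_map = rng.normal(size=(16, 32)) / np.sqrt(32)
# Target features, which in the real setting would be decoded from fMRI.
target_features = feature_map @ rng.normal(size=32)

latent = rng.normal(size=32)
loss_before = feature_loss(latent, feature_map, target_features)
for _ in range(50):
    latent = guided_update(latent, feature_map, target_features, scale=0.1)
loss_after = feature_loss(latent, feature_map, target_features)
print(loss_before, loss_after)
```

In an actual diffusion sampler, a step like `guided_update` would be interleaved with the denoising steps, and the gradient would come from automatic differentiation through a nonlinear encoder rather than from a closed-form expression.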

Results and Evaluation

NeuralDiffuser achieves state-of-the-art performance on the Natural Scenes Dataset (NSD), with more faithful details and more consistent results than existing methods. The evaluation uses metrics that assess both low-level and high-level image features, including structural similarity, pixel correlation, and semantic relevance.
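Of the low-level metrics named above, pixel correlation is the simplest to write down. The sketch below assumes it is computed as the Pearson correlation between the flattened pixel values of the reconstruction and the ground-truth image; the exact preprocessing used in the paper (resizing, color handling) is not stated here and is left out. The function name and the random test images are illustrative.

```python
import numpy as np


def pixel_correlation(image_a, image_b):
    """Pearson correlation between the flattened pixels of two images.

    Returns 0.0 when either image is constant (correlation undefined).
    """
    a = np.asarray(image_a, dtype=np.float64).ravel()
    b = np.asarray(image_b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("images must contain the same number of pixels")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(a @ b / denom)


rng = np.random.default_rng(0)
ground_truth = rng.random((8, 8, 3))
# A noisy copy stands in for a reconstruction (assumption for illustration).
reconstruction = ground_truth + 0.1 * rng.normal(size=ground_truth.shape)
print(pixel_correlation(ground_truth, reconstruction))
```

Values near 1 indicate that the reconstruction tracks the ground truth pixel by pixel, which is why this metric is paired with high-level semantic measures that tolerate pixel-level differences.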

Implications and Future Directions

NeuralDiffuser has implications both for bridging the gap between neural activations and perceived visual stimuli and for improving brain-computer interfaces. More accurate and consistent reconstruction of visual experiences from fMRI data could support further work on understanding how the human brain represents what it sees.

Looking forward, the primary visual feature guidance principle could be extended to other reconstruction tasks beyond fMRI, offering a versatile tool for navigating the complex interplay between biological data and computationally generated visual content. The novel guidance strategy developed herein presents a promising avenue for research into controlling generative models, ensuring that the diversity of outcomes does not impede the accuracy and reliability of reconstructions.
