
Natural scene reconstruction from fMRI signals using generative latent diffusion (2303.05334v2)

Published 9 Mar 2023 in cs.CV, cs.AI, and q-bio.NC

Abstract: In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain-computer interface) and fundamental neuroscience.

Authors (2)
  1. Furkan Ozcelik (5 papers)
  2. Rufin VanRullen (32 papers)
Citations (60)

Summary

Natural Scene Reconstruction from fMRI Signals Using Generative Latent Diffusion

The paper presents a novel methodology for reconstructing visual images from fMRI brain signals, leveraging the capabilities of modern generative models. The challenge of recreating perceived natural scenes from neural data combines computational neuroscience and advanced AI, requiring sophisticated approaches to capture both the semantic and structural properties of complex scenes. This paper introduces a two-stage model named "Brain-Diffuser" to tackle these challenges using latent diffusion models.

The authors propose utilizing a combination of a Very Deep Variational Autoencoder (VDVAE) and Versatile Diffusion Model (VD) to process and translate fMRI signals into coherent image reconstructions. The first stage of the framework, based on VDVAE, focuses on capturing low-level image features and structural layout. It achieves this by regressing fMRI patterns into the latent space of VDVAE, effectively generating a rough approximation or an "initial guess" of the visual input observed by subjects.
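At its core, this first-stage mapping is a regularized linear regression from voxel activity to the VDVAE latent space. A minimal sketch with toy random data is below; the dimensions and ridge penalty are illustrative stand-ins, not the paper's actual settings, and the real pipeline uses NSD fMRI betas with tens of thousands of voxels:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy stand-in data for fMRI patterns and VDVAE latents of the seen images.
rng = np.random.default_rng(0)
n_train, n_voxels, latent_dim = 200, 500, 64

X_train = rng.standard_normal((n_train, n_voxels))    # fMRI patterns
Z_train = rng.standard_normal((n_train, latent_dim))  # VDVAE latent vectors

# One regularized linear map from voxels to all latent dimensions.
reg = Ridge(alpha=5e4, fit_intercept=True)
reg.fit(X_train, Z_train)

# At test time, decoded latents would be fed to the VDVAE decoder
# to produce the rough "initial guess" image.
X_test = rng.standard_normal((10, n_voxels))
Z_pred = reg.predict(X_test)
print(Z_pred.shape)  # (10, 64)
```

The same recipe (fit on training stimuli, predict on held-out scans) is the standard linearized-decoding setup in this literature.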

The subsequent stage employs a latent diffusion model to refine these initial reconstructions. Specifically, the Versatile Diffusion model incorporates inputs from both visual data and corresponding text features (captured by the CLIP model) to guide the generation of high-fidelity, semantically meaningful images. This dual-modality conditioning enhances the framework's ability to align the semantic content of reconstructions with true perceived images, as evidenced by experiments using the particularly challenging Natural Scenes Dataset (NSD).
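The second-stage conditioning can be sketched the same way: separate ridge regressions predict CLIP-vision and CLIP-text features from fMRI, and both predicted feature sets condition the diffusion model. The toy dimensions below and the `versatile_diffusion_img2img` function are hypothetical placeholders, not the real Versatile Diffusion interface:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy dimensions; the actual pipeline predicts much larger CLIP
# feature maps (e.g. patch-token and caption-token embeddings).
rng = np.random.default_rng(1)
n_train, n_voxels = 200, 500
vis_dim, txt_dim = 96, 48

X = rng.standard_normal((n_train, n_voxels))     # fMRI patterns
C_vis = rng.standard_normal((n_train, vis_dim))  # CLIP image features of stimuli
C_txt = rng.standard_normal((n_train, txt_dim))  # CLIP text features of captions

vis_reg = Ridge(alpha=6e4).fit(X, C_vis)
txt_reg = Ridge(alpha=1e5).fit(X, C_txt)

x_new = rng.standard_normal((1, n_voxels))
c_vis_hat = vis_reg.predict(x_new)
c_txt_hat = txt_reg.predict(x_new)

# Hypothetical interface: the diffusion model denoises a noised version
# of the stage-1 VDVAE image under the dual (vision + text) conditioning.
def versatile_diffusion_img2img(init_image, vis_cond, txt_cond, mix=0.4):
    raise NotImplementedError("placeholder for the pretrained model")

print(c_vis_hat.shape, c_txt_hat.shape)
```

The design choice worth noting is that the diffusion model itself is never retrained; only the lightweight voxel-to-feature regressions are fit per subject.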

Qualitative and quantitative assessments demonstrate substantial improvement over previous approaches in reconstructing intricate scenes. Comparisons against other state-of-the-art models show both superior low-level fidelity and stronger high-level semantic alignment. Notably, this success is attributed to the pre-trained capabilities of the generative latent diffusion model and the integration of multimodal cues from the CLIP encoder.
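High-level semantic alignment in this literature is commonly scored by n-way identification: a reconstruction counts as correct when its features correlate more strongly with its own ground-truth image than with a distractor. A self-contained sketch of a 2-way identification score follows; the feature dimensions and noise level are illustrative:

```python
import numpy as np

def two_way_identification(feat_true, feat_pred):
    """Fraction of (reconstruction, distractor) pairs where a
    reconstruction correlates better with its own ground truth."""
    n = len(feat_true)
    ft = feat_true - feat_true.mean(axis=1, keepdims=True)
    fp = feat_pred - feat_pred.mean(axis=1, keepdims=True)
    ft /= np.linalg.norm(ft, axis=1, keepdims=True)
    fp /= np.linalg.norm(fp, axis=1, keepdims=True)
    corr = fp @ ft.T                    # (n, n) correlation matrix
    diag = np.diag(corr)[:, None]       # each item's own-match score
    return (diag > corr).sum() / (n * (n - 1))

rng = np.random.default_rng(2)
truth = rng.standard_normal((20, 128))
good = truth + 0.3 * rng.standard_normal((20, 128))  # faithful reconstructions
print(two_way_identification(truth, good))  # near 1.0 for faithful reconstructions
```

Chance level for this metric is 0.5, which makes scores directly comparable across feature extractors (pixel-level or deep-network features alike).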

Beyond image reconstruction itself, the method offers potential advances for neuroscience and psychology. By associating activity in specific brain regions with particular elements of the visual reconstructions, the paper offers insights into the functional and spatial organization of the visual cortex. Speculative extensions of this work might also aim to decode dynamic sequences, leveraging motion-picture stimuli to approach a more temporal understanding of brain signal processing.
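The ROI-probing analysis can be sketched as feeding a synthetic activation pattern, non-zero only inside one region-of-interest mask, through the already-trained decoders and generating the resulting "ROI-optimal" scene. The mask layout and scaling below are illustrative only:

```python
import numpy as np

# Synthetic fMRI pattern that activates only one hypothetical ROI
# (e.g. a face-selective region) and is silent everywhere else.
n_voxels = 500
roi_mask = np.zeros(n_voxels, dtype=bool)
roi_mask[100:150] = True  # voxels assumed to belong to the ROI

synthetic_pattern = np.where(roi_mask, 1.0, 0.0)[None, :]  # shape (1, n_voxels)

# The pattern would then be pushed through the trained regressions, e.g.:
#   z_hat = trained_vdvae_regressor.predict(synthetic_pattern)   # stage 1
#   c_hat = trained_clip_regressor.predict(synthetic_pattern)    # stage 2
# and the resulting image inspected for ROI-consistent content.
print(synthetic_pattern.shape, synthetic_pattern.sum())
```

If the generated scenes match the region's known selectivity (faces, places, words, etc.), that is converging evidence the decoder has learned a neuroscientifically meaningful mapping.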

This framework provides a promising avenue for non-invasive brain-computer interface development, where real-time visualizations of thought patterns may become feasible. As generative models continue to advance, they will likely offer greater accuracy and applicability in interpreting complex biological data sets, including multi-dimensional neural signals. The interaction between such advanced AI techniques and cognitive neuroscience could eventually illuminate new aspects of human perception and open up new applications in both scientific research and technical innovation.
