Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities (2305.17214v4)

Published 26 May 2023 in cs.CV

Abstract: Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Masked Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.
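
The abstract compresses a three-stage pipeline: masked auto-encoding of fMRI signals with contrastive objectives, attention tuning guided by an image auto-encoder, and conditioning a latent diffusion model on the learned fMRI features. As a rough illustration of the first phase only, below is a minimal PyTorch sketch; it is not the authors' released code. The encoder architecture, voxel count, masking ratio, and the pairing of two stochastic maskings as contrastive views are all illustrative assumptions, and the masked-patch reconstruction decoder that an MAE would also train is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRIEncoder(nn.Module):
    """Patchify a flat fMRI vector, drop a random subset of patches,
    and encode the visible ones with a small Transformer (MAE-style)."""
    def __init__(self, n_voxels=4096, patch=16, dim=256, depth=4, heads=8, mask_ratio=0.75):
        super().__init__()
        self.n_patches, self.patch, self.mask_ratio = n_voxels // patch, patch, mask_ratio
        self.embed = nn.Linear(patch, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                          # x: (B, n_voxels)
        b = x.size(0)
        tokens = self.embed(x.view(b, self.n_patches, self.patch)) + self.pos
        # keep a random subset of patch tokens; the rest are "masked out"
        n_keep = int(self.n_patches * (1 - self.mask_ratio))
        idx = torch.rand(b, self.n_patches, device=x.device).argsort(dim=1)[:, :n_keep]
        kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        return self.encoder(kept).mean(dim=1)      # (B, dim) pooled fMRI code

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two views of the same batch of scans."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(z1.size(0), device=z1.device))

# Phase-1 sketch: two stochastic maskings of each scan act as two views,
# so the contrastive term pulls representations of the same scan together.
# Phase 2 would then tune this encoder against an image auto-encoder's
# latents before using its output to condition a latent diffusion model.
encoder = FMRIEncoder()
scans = torch.randn(8, 4096)                       # dummy batch of fMRI vectors
loss = info_nce(encoder(scans), encoder(scans))    # masking is random per call
loss.backward()
```

The design choice the sketch highlights is that denoising comes from the training objective rather than from signal preprocessing: random masking plus a contrastive target forces the encoder to keep only activation patterns that are stable across corrupted views of the same scan.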

Authors (6)
  1. Jingyuan Sun (14 papers)
  2. Mingxiao Li (48 papers)
  3. Zijiao Chen (8 papers)
  4. Yunhao Zhang (19 papers)
  5. Shaonan Wang (19 papers)
  6. Marie-Francine Moens (102 papers)
Citations (25)