
BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction (2312.14871v3)

Published 22 Dec 2023 in cs.CV and cs.AI

Abstract: Analyzing and reconstructing visual stimuli from brain signals advances the understanding of the human visual system. However, EEG signals are complex and contain significant noise. This leads to substantial limitations in existing work on visual stimulus reconstruction from EEG, such as difficulty in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on an additional large self-collected dataset for training. To address these challenges, we propose a novel approach called BrainVis. First, we divide the EEG signals into units and apply a self-supervised approach to them to obtain EEG time-domain features, in an attempt to ease the training difficulty. We also propose utilizing frequency-domain features to enhance the EEG representations. Then, we simultaneously align EEG time-frequency embeddings with the interpolation of the coarse and fine-grained semantics in the CLIP space, to highlight the primary visual components and reduce the cross-modal alignment difficulty. Finally, we adopt cascaded diffusion models to reconstruct images. Using only 10% of the training data of the previous work, our proposed BrainVis outperforms the state of the art in both semantic fidelity of reconstruction and generation quality. The code is available at https://github.com/RomGai/BrainVis.
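
To make three of the steps named in the abstract concrete, here is a minimal toy sketch in plain Python: splitting a signal into units, computing a frequency-domain view of it, and forming an alignment target by interpolating a coarse and a fine embedding. It is not the BrainVis implementation. The function names, the non-overlapping unit scheme, the naive DFT, the linear form of the interpolation, and the weight `alpha` are all assumptions made for illustration, and the short lists below merely stand in for real EEG recordings and CLIP embeddings.

```python
import cmath
import math


def split_into_units(signal, unit_len):
    """Split a 1-D signal into non-overlapping units, dropping any remainder.

    Assumption: the paper's exact unit scheme is not given in the abstract.
    """
    n_units = len(signal) // unit_len
    return [signal[i * unit_len:(i + 1) * unit_len] for i in range(n_units)]


def amplitude_spectrum(signal):
    """Naive DFT amplitude spectrum for bins 0..n//2 (illustration only)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum


def interpolate(coarse, fine, alpha):
    """Linear interpolation alpha * coarse + (1 - alpha) * fine.

    Assumption: a linear blend with a hypothetical weight `alpha`.
    """
    return [alpha * c + (1.0 - alpha) * f for c, f in zip(coarse, fine)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "EEG" channel: 32 samples of a sinusoid completing 4 cycles.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
units = split_into_units(signal, unit_len=8)   # time-domain units
spectrum = amplitude_spectrum(signal)          # frequency-domain view

# Toy stand-ins for a coarse (e.g. class-level) and a fine (e.g. caption-level)
# semantic embedding; real CLIP embeddings are high-dimensional.
coarse_embedding = [1.0, 0.0]
fine_embedding = [0.0, 1.0]
target = interpolate(coarse_embedding, fine_embedding, alpha=0.25)
```

In a training setup of this kind, an EEG encoder's output would be pulled toward `target`, for example by maximizing `cosine_similarity(eeg_embedding, target)`; the actual loss and encoders used by the authors are in the linked repository.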

Authors (4)
  1. Honghao Fu
  2. Zhiqi Shen
  3. Jing Jih Chin
  4. Hao Wang
