
Perceptogram: Reconstructing Visual Percepts from EEG (2404.01250v2)

Published 1 Apr 2024 in q-bio.NC and cs.HC

Abstract: Visual neural decoding from EEG has improved significantly due to diffusion models that can reconstruct high-quality images from decoded latents. While recent works have focused on relatively complex architectures to achieve good reconstruction performance from EEG, less attention has been paid to the source of this information. In this work, we attempt to discover EEG features that represent perceptual and semantic visual categories, using a simple pipeline. Notably, the high temporal resolution of EEG allows us to go beyond static semantic maps as obtained from fMRI. We show (a) Training a simple linear decoder from EEG to CLIP latent space, followed by a frozen pre-trained diffusion model, is sufficient to decode images with state-of-the-art reconstruction performance. (b) Mapping the decoded latents back to EEG using a linear encoder isolates CLIP-relevant EEG spatiotemporal features. (c) By using other latent spaces representing lower-level image features, we obtain similar time-courses of texture/hue-related information. We thus use our framework, Perceptogram, to probe EEG signals at various levels of the visual information hierarchy.
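The linear decoder and encoder described in points (a) and (b) of the abstract can be sketched on synthetic data. This is a minimal illustration, not the authors' code: the abstract says only that the mappings are linear, so the closed-form ridge solution, the penalty `alpha`, the array sizes, and the random arrays standing in for EEG and CLIP latents are all assumptions made for the example.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
n_trials, n_channels, n_times, n_latent = 200, 8, 10, 16

# Synthetic stand-ins: "latents" play the role of CLIP embeddings and
# "eeg" is a noisy linear mixture of them, flattened to
# (trials, channels * times).
latents = rng.standard_normal((n_trials, n_latent))
mixing = rng.standard_normal((n_latent, n_channels * n_times))
noise = 0.1 * rng.standard_normal((n_trials, n_channels * n_times))
eeg = latents @ mixing + noise

train, test = slice(0, 150), slice(150, None)

# Decoder: EEG -> latent space.
W_dec = fit_ridge(eeg[train], latents[train], alpha=1.0)
decoded = eeg[test] @ W_dec

# Encoder: decoded latents -> EEG, reshaped to a channel x time pattern
# so the latent-relevant spatiotemporal structure can be inspected.
W_enc = fit_ridge(eeg[train] @ W_dec, eeg[train], alpha=1.0)
eeg_hat = (decoded @ W_enc).reshape(-1, n_channels, n_times)

corr = np.corrcoef(decoded.ravel(), latents[test].ravel())[0, 1]
print(f"held-out latent correlation: {corr:.3f}")
```

In a real pipeline the decoded latents would be passed to a frozen, pre-trained diffusion model to produce images; that step is omitted here because it depends on the specific model.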


Summary

  • The paper presents a two-stage latent diffusion pipeline for reconstructing images from EEG signals, adapting a method previously used for fMRI.
  • The study found that EEG-based reconstructions, while less sharp than fMRI, capture significant visual information, particularly for categories like animals and food.
  • Future work involves refining model components and exploring reconstruction from naturalistic or video stimuli using EEG.

Image Reconstruction from EEG Using Latent Diffusion: A Technical Overview

The paper explores the intersection of neuroimaging and image generation by applying latent diffusion models to reconstruct images from electroencephalography (EEG) data. The approach builds on methods established for functional magnetic resonance imaging (fMRI), which have shown that visually interpretable images can be generated from brain activity signals. The paper's primary aim is to adapt diffusion-based image reconstruction to EEG and evaluate it, providing a baseline for subsequent research in the domain.

Methodological Approach

The authors employ a two-stage image reconstruction pipeline, initially developed for fMRI, to decode visual stimuli from EEG signals. At its core, EEG signals are mapped onto the latent space of a Very Deep Variational Autoencoder (VDVAE). A Versatile Diffusion model, guided by Contrastive Language-Image Pre-training (CLIP) embeddings, then generates reconstructions that capture both low-level visual features and high-level semantics. The reconstructions are assessed with a suite of metrics, including pixel-level correlation (PixCorr), the Structural Similarity Index Measure (SSIM), and comparisons based on several layers of AlexNet and InceptionV3, among others. Together, these metrics separate low-level from high-level visual fidelity.
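As an illustration of the simplest metric named above, PixCorr is commonly computed as the Pearson correlation between the flattened pixel values of a reconstruction and its ground-truth image. The sketch below assumes that definition and uses small random arrays in place of real images; the function name `pixcorr` and the zero returned for constant images are choices made for this example, not details taken from the paper.

```python
import numpy as np

def pixcorr(reconstruction, target):
    """Pearson correlation between two images, computed over all pixels."""
    a = np.asarray(reconstruction, dtype=np.float64).ravel()
    b = np.asarray(target, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    # A constant image has zero variance; return 0.0 instead of dividing by zero.
    return float(a @ b / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
target = rng.random((32, 32, 3))                          # stand-in ground truth
good = target + 0.05 * rng.standard_normal(target.shape)  # close reconstruction
bad = rng.random((32, 32, 3))                             # unrelated image

print("close reconstruction:", round(pixcorr(good, target), 3))
print("unrelated image:     ", round(pixcorr(bad, target), 3))
```

Because PixCorr compares images pixel by pixel, it rewards correct layout and luminance but says little about semantic content, which is why it is paired with feature-based metrics.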

The paper uses the THINGS-EEG2 dataset, in which images are shown in a rapid serial visual presentation (RSVP) paradigm. RSVP is advantageous for signal-to-noise ratio but leaves only a brief processing interval for each stimulus. A further challenge is EEG's limited spatial resolution, a well-known weakness relative to the high spatial accuracy typical of fMRI.
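One common reason a fast presentation paradigm helps signal-to-noise ratio is that it allows many repetitions, and averaging N trials with independent noise shrinks the residual noise by roughly a factor of sqrt(N). The simulation below is a hedged sketch of that general principle only; the sinusoidal "evoked response", the noise level, and the repetition counts are hypothetical and are not taken from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

n_times = 100
t = np.linspace(0.0, 1.0, n_times)
signal = np.sin(2 * np.pi * 5 * t)  # hypothetical evoked response

def residual_noise_std(n_repeats, noise_std=5.0, n_sim=200):
    """Std of (trial average - true signal), pooled over simulated runs."""
    noise = noise_std * rng.standard_normal((n_sim, n_repeats, n_times))
    trials = signal + noise
    return float((trials.mean(axis=1) - signal).std())

# Residual noise should roughly halve each time the repeat count quadruples.
for n in (1, 4, 16, 64):
    print(f"{n:>2} repeats: residual noise std = {residual_noise_std(n):.2f}")
```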

Results and Implications

The paper shows that image reconstructions from EEG, while not matching the quality achieved with fMRI, retain a substantial amount of visual information. This is especially true for specific categories such as land animals and food, consistent with prior reports that EEG is sensitive to these categories through early and distinct visual evoked potentials (VEPs). The authors also suggest extending stimulus duration to capture later-stage cognitive processing, which might improve reconstruction fidelity.

The paper also reports ablation studies that isolate the contributions of individual components of the reconstruction pipeline (e.g., AutoKL, CLIP-Vision, CLIP-Text). The results indicate that the full combination of components is needed for the best reconstruction performance, and they suggest that future work could benefit from refining each component.

Future Directions

Looking ahead, the authors consider extending the framework to naturalistic visual stimuli, potentially pairing EEG with technologies such as rapid shutter goggles so that visual input arrives in the format the reconstruction model requires. Video reconstruction from EEG is another promising avenue, given the advancing capabilities of video generative models and EEG's high temporal resolution.

The paper offers a foundational exploration of EEG-based image reconstruction, with insights into the cognitive processes underlying visual perception and the prospects for practical brain-computer interface (BCI) applications. Researchers can build on this baseline with more sophisticated algorithms and data paradigms to improve the resolution and applicability of EEG-based visual decoding.
