Spurious reconstruction from brain activity (2405.10078v5)
Abstract: Advances in brain decoding, particularly visual image reconstruction, have sparked discussions about the societal implications and ethical considerations of neurotechnology. As these methods aim to recover visual experiences from brain activity and achieve prediction beyond training samples (zero-shot prediction), it is crucial to assess their capabilities and limitations to inform public expectations and regulations. Our case study of recent text-guided reconstruction methods, which leverage a large-scale dataset (Natural Scene Dataset, NSD) and text-to-image diffusion models, reveals limitations in their generalizability. We found poor performance when applying these methods to a different dataset designed to prevent category overlaps between training and test sets. UMAP visualization of the text features with NSD images showed a limited diversity of semantic and visual clusters, with overlap between training and test sets. Formal analysis and simulations demonstrated that clustered training samples can lead to "output dimension collapse," restricting predictable output feature dimensions. Simulations further showed that diversifying the training set improved generalizability. However, text features alone are insufficient for mapping to the visual space. We argue that recent realistic reconstructions may primarily be a blend of classification into trained categories and generation of inauthentic images through text-to-image diffusion (hallucination). Diverse datasets and compositional representations spanning the image space are essential for genuine zero-shot prediction. Interdisciplinary discussions grounded in understanding the current capabilities and limitations, as well as ethical considerations, of the technology are crucial for its responsible development.
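The "output dimension collapse" mentioned in the abstract can be illustrated with a small linear simulation. The sketch below is not the authors' simulation: the sizes `D`, `P`, `K`, the ridge penalty `LAMBDA`, the linear encoding matrix `W`, and the helper names are all assumptions chosen for illustration. It relies on one fact about ridge regression: every prediction is a linear combination of the training targets, so if the training targets sit on only `K` cluster centres (an idealised limit of tight clusters), predictions for novel test samples cannot leave the span of those centres.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 20                       # target feature dimensions (assumed)
P = 50                       # simulated "voxels" (assumed)
K = 3                        # number of training clusters (assumed)
N_TRAIN, N_TEST = 300, 100   # sample counts (assumed)
LAMBDA = 1.0                 # ridge penalty (assumed)

# Hypothetical linear encoding from features to brain activity.
W = rng.standard_normal((D, P))


def simulate_brain(Y):
    """Map target features to noisy simulated brain activity."""
    return Y @ W + rng.standard_normal((Y.shape[0], P))


def fit_ridge(X, Y, lam=LAMBDA):
    """Closed-form ridge decoder: B = (X^T X + lam I)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


def effective_rank(M, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


# Clustered training targets: each sample is exactly one of K centres.
centres = rng.standard_normal((K, D))
Y_clustered = centres[rng.integers(0, K, size=N_TRAIN)]

# Diverse training targets: samples spread over all D dimensions.
Y_diverse = rng.standard_normal((N_TRAIN, D))

# Novel test samples drawn from the broad distribution in both cases.
Y_test = rng.standard_normal((N_TEST, D))
X_test = simulate_brain(Y_test)

B_clustered = fit_ridge(simulate_brain(Y_clustered), Y_clustered)
B_diverse = fit_ridge(simulate_brain(Y_diverse), Y_diverse)

# Number of output dimensions the predictions actually occupy.
rank_clustered = effective_rank(X_test @ B_clustered)
rank_diverse = effective_rank(X_test @ B_diverse)

print("predicted dimensions, clustered training:", rank_clustered)
print("predicted dimensions, diverse training:  ", rank_diverse)
```

In this toy setting the decoder trained on clustered targets can occupy at most `K` of the `D` output dimensions, however varied the test samples are, while the decoder trained on diverse targets can use all of them. Real clusters have some within-cluster spread, so the collapse would be approximate rather than exact.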