EEG-Driven 3D Object Reconstruction with Style Consistency and Diffusion Prior (2410.20981v3)
Abstract: Electroencephalography (EEG)-based visual perception reconstruction has become an important area of research. Neuroscientific studies indicate that humans can decode imagined 3D objects by perceiving or imagining various kinds of visual information, such as color, shape, and rotation. Existing EEG-based visual decoding methods typically focus only on reconstructing 2D visual stimulus images and face various challenges in generation quality, including inconsistencies in texture, shape, and color between the visual stimuli and the reconstructed images. This paper proposes an EEG-based 3D object reconstruction method with style consistency and diffusion priors. The method consists of an EEG-driven multi-task joint learning stage and an EEG-to-3D diffusion stage. The first stage uses a neural EEG encoder based on regional semantic learning and employs a multi-task joint learning scheme that includes a masked EEG signal recovery task and an EEG-based visual classification task. The second stage introduces a latent diffusion model (LDM) fine-tuning strategy with style-conditioned constraints and a neural radiance field (NeRF) optimization strategy. The fine-tuning strategy explicitly embeds semantic- and location-aware latent EEG codes and combines them with visual stimulus maps to fine-tune the LDM. The fine-tuned LDM then serves as a diffusion prior, which, combined with the style loss of the visual stimuli, is used to optimize the NeRF for generating 3D objects. Experimental validation demonstrates that the method can effectively use EEG data to reconstruct 3D objects with style consistency.
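The first-stage objective named in the abstract, a masked EEG signal recovery task trained jointly with a visual classification task, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the token layout (a list of per-token feature lists), the mask ratio, and the weighting factor `cls_weight` are all assumptions made for illustration, and the abstract does not say how the two task losses are combined.

```python
import math
import random


def make_mask(n_tokens, mask_ratio, rng):
    """Return booleans; True marks an EEG token hidden from the encoder.

    The mask ratio is a hypothetical hyperparameter, not taken from the paper.
    """
    n_masked = int(round(n_tokens * mask_ratio))
    masked = set(rng.sample(range(n_tokens), n_masked))
    return [i in masked for i in range(n_tokens)]


def masked_recovery_loss(target, predicted, mask):
    """Mean squared error computed over the masked tokens only."""
    errors = [
        (t - p) ** 2
        for tok_t, tok_p, is_masked in zip(target, predicted, mask)
        if is_masked
        for t, p in zip(tok_t, tok_p)
    ]
    return sum(errors) / len(errors) if errors else 0.0


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def joint_loss(target, predicted, mask, logits, label, cls_weight=1.0):
    """Assumed combination: recovery loss plus a weighted classification loss."""
    recovery = masked_recovery_loss(target, predicted, mask)
    return recovery + cls_weight * cross_entropy(logits, label)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    mask = make_mask(len(tokens), mask_ratio=0.75, rng=rng)
    # Stand-in for a decoder output: the true tokens plus small noise.
    recon = [[x + rng.gauss(0.0, 0.1) for x in tok] for tok in tokens]
    print(joint_loss(tokens, recon, mask, logits=[0.2, 1.5, -0.3], label=1))
```

In this sketch the recovery term is scored only on tokens the encoder did not see, as in masked autoencoding, while the classification term acts on logits from a hypothetical classification head; a real model would produce both from a shared EEG encoder.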