MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction (2404.12630v2)
Abstract: Decoding natural visual scenes from brain activity has flourished, with extensive research on single-subject tasks but far less on cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is challenging because of profound individual differences between subjects and the scarcity of annotated data. In this work, we propose MindTuner for cross-subject visual decoding, which achieves high-quality, semantically rich reconstructions using only 1 hour of fMRI training data. It benefits from the visual fingerprint phenomenon in the human visual system and from a novel fMRI-to-text alignment paradigm. First, we pre-train a multi-subject model on 7 subjects and fine-tune it with scarce data on new subjects, using LoRAs with Skip-LoRAs to learn the visual fingerprint. Then, we use the image modality as an intermediate pivot to achieve fMRI-to-text alignment, which yields strong fMRI-to-text retrieval performance and corrects fMRI-to-image reconstruction with fine-tuned semantics. Both qualitative and quantitative analyses demonstrate that MindTuner surpasses state-of-the-art cross-subject visual decoding models on the Natural Scenes Dataset (NSD), whether trained on 1 hour or 40 hours of data.
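To make the fine-tuning idea concrete, here is a minimal sketch of a generic LoRA-style linear layer: a frozen weight matrix plus a trainable low-rank update, scaled by alpha / r as in the original LoRA formulation. It is not MindTuner's implementation, and it does not model Skip-LoRA, whose details the abstract does not give. The class `LoRALinear`, the helper `matvec`, and the toy weights are hypothetical names and values chosen only for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W plus a low-rank update (alpha / r) * B @ A.

    Illustrative only: a generic LoRA layer, not the paper's architecture.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen weights, e.g. from multi-subject pre-training
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapter
        # contributes nothing until fine-tuning updates B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


# Toy usage with a hypothetical 2x3 frozen weight matrix.
weight = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(weight, rank=1)
x = [1.0, 1.0, 1.0]
y_before = layer.forward(x)  # matches the frozen layer while B is zero
layer.B = [[1.0], [0.0]]     # stand-in for a fine-tuning update to B
y_after = layer.forward(x)   # only the first output coordinate shifts
```

Because only A and B would be trained, a new subject adds a number of parameters proportional to the rank rather than to the full weight matrix, which is the usual motivation for low-rank adapters when data is scarce.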
- Zixuan Gong
- Qi Zhang
- Guangyin Bao
- Lei Zhu
- Ke Liu
- Liang Hu
- Duoqian Miao