Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding (2404.13282v2)
Abstract: Decoding visual information from human brain activity has advanced remarkably in recent research. However, because cortical parcellation and fMRI patterns differ across individuals, deep learning models have typically been tailored to each subject. This personalization limits the broader applicability of brain visual decoding in real-world scenarios. To address this issue, we introduce Wills Aligner, a novel approach designed to achieve multi-subject collaborative brain visual decoding. Wills Aligner begins by aligning the fMRI data from different subjects at the anatomical level. It then employs delicate mixture-of-brain-expert adapters and a meta-learning strategy to account for individual differences in fMRI patterns. Additionally, Wills Aligner leverages the semantic relations among visual stimuli to guide the learning of inter-subject commonality, enabling visual decoding for each subject to draw insights from other subjects' data. We rigorously evaluate Wills Aligner across various visual decoding tasks, including classification, cross-modal retrieval, and image reconstruction. The experimental results demonstrate that Wills Aligner achieves promising performance.
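The abstract does not specify how the mixture-of-brain-expert adapters are built, so the following is only a minimal sketch of the general idea: a shared projection learned from all subjects plus a small subject-specific adapter that corrects for individual differences. The low-rank adapter form, the hard routing by subject ID, and every name and size (`W_shared`, `adapters`, `encode`, `N_VOXELS`, and so on) are assumptions made for illustration, not the paper's implementation. The sketch uses only the Python standard library.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only to keep the example small.
N_SUBJECTS, N_VOXELS, D_EMBED, RANK = 3, 8, 4, 2


def rand_matrix(rows, cols, scale=0.1):
    """Matrix of small Gaussian values, stored as a list of rows."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    """Matrix of zeros, stored as a list of rows."""
    return [[0.0] * cols for _ in range(rows)]


def vec_mat(vec, mat):
    """Multiply a row vector by a matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


# Shared projection: the part of the model every subject uses.
W_shared = rand_matrix(N_VOXELS, D_EMBED)

# One adapter per subject (assumed low-rank here). The "up" matrix starts
# at zero, so each adapter initially leaves the shared output unchanged.
adapters = [
    {"down": rand_matrix(N_VOXELS, RANK), "up": zeros(RANK, D_EMBED)}
    for _ in range(N_SUBJECTS)
]


def encode(fmri, subject_id):
    """Embed one anatomically aligned fMRI vector for the given subject."""
    shared = vec_mat(fmri, W_shared)
    adapter = adapters[subject_id]  # hard routing by subject ID (an assumption)
    residual = vec_mat(vec_mat(fmri, adapter["down"]), adapter["up"])
    return [s + r for s, r in zip(shared, residual)]


fmri = [random.gauss(0.0, 1.0) for _ in range(N_VOXELS)]
embedding = encode(fmri, subject_id=1)
print(len(embedding))
```

In a design like this, updating one subject's adapter changes only that subject's embeddings, while the shared projection carries whatever is common across subjects.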