MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data (2403.11207v2)

Published 17 Mar 2024 in cs.CV, cs.AI, and q-bio.NC

Abstract: Reconstructions of visual perception from brain activity have improved tremendously, but the practical utility of such methods has been limited. This is because such models are trained independently per subject where each subject requires dozens of hours of expensive fMRI training data to attain high-quality results. The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. Our novel functional alignment procedure linearly maps all brain data to a shared-subject latent space, followed by a shared non-linear mapping to CLIP image space. We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach improves out-of-subject generalization with limited training data and also attains state-of-the-art image retrieval and reconstruction metrics compared to single-subject approaches. MindEye2 demonstrates how accurate reconstructions of perception are possible from a single visit to the MRI facility. All code is available on GitHub.
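
The abstract describes a pipeline of subject-specific linear maps into a shared-subject latent space followed by a shared non-linear mapping into CLIP image space. The sketch below is a minimal PyTorch illustration of that structure only; the class name, voxel counts, and layer widths are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Minimal sketch of shared-subject functional alignment (illustrative sizes only).
import torch
import torch.nn as nn

class SharedSubjectAligner(nn.Module):
    def __init__(self, voxel_counts, shared_dim=4096, clip_dim=1664):
        super().__init__()
        # One linear map per subject: flattened fMRI voxels -> shared-subject latent space.
        self.subject_linears = nn.ModuleDict({
            str(s): nn.Linear(n_voxels, shared_dim)
            for s, n_voxels in voxel_counts.items()
        })
        # Shared non-linear mapping from the shared latent space to CLIP image space.
        self.shared_mlp = nn.Sequential(
            nn.Linear(shared_dim, shared_dim),
            nn.GELU(),
            nn.Linear(shared_dim, clip_dim),
        )

    def forward(self, voxels, subject_id):
        shared = self.subject_linears[str(subject_id)](voxels)  # subject-specific, linear
        return self.shared_mlp(shared)                          # shared across subjects

# Hypothetical usage: pretrain on several subjects, then fine-tune a new subject's
# linear layer (and optionally the shared backbone) on roughly one hour of data.
model = SharedSubjectAligner({1: 15724, 2: 14278, 5: 13039})
clip_latents = model(torch.randn(4, 15724), subject_id=1)
```

In this reading, only the per-subject linear layer has to be learned for a new subject, which is what allows the shared backbone pretrained on other subjects to be reused with limited data.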

Overview of ICML 2024 Submission and Formatting Guidelines

Submission Process

The International Conference on Machine Learning (ICML) 2024 accepts submissions only electronically through the designated submission website; email submissions are not considered. A notable change this year is that appendices must be merged with the main text into a single PDF file, so that appendices are not overlooked during review as separate documents. A strict page limit applies: the main body is capped at 8 pages, excluding references and appendices, and authors of accepted papers may add one extra page to the main body in the final version.

Formatting Specifications

Both the initial submission and the camera-ready copy must follow the specified formatting guidelines. Papers must be submitted as PDF files typeset in a 10-point Times font, with only Type-1 fonts embedded to avoid compatibility issues. Figures and tables should be appropriately sized and placed, following the guidelines for captions and alignment.
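
As a rough pre-submission check of the font requirement, an author might inspect the PDF with a small script. The sketch below is an assumption-laden illustration, not an official compliance tool: it assumes poppler-utils' `pdffonts` command is available on the PATH and parses its tabular output with a loose heuristic.

```python
# Flag fonts in a PDF that are not embedded or not Type 1 / Type 1C (rough heuristic).
import subprocess
import sys

def check_fonts(pdf_path):
    out = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    ).stdout
    problems = []
    for line in out.splitlines()[2:]:       # skip the two header lines
        if not line.strip():
            continue
        parts = line.split()
        embedded = parts[-5] == "yes"       # 'emb' column, counted from the right
        if not embedded or "Type 1" not in line:
            problems.append(line.strip())
    return problems

if __name__ == "__main__":
    issues = check_fonts(sys.argv[1])
    print("All fonts are embedded Type 1." if not issues else "\n".join(issues))
```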

Double-Blind Review and Anonymity

ICML 2024 uses double-blind review: submissions must omit author names and affiliations, and citations to the authors' own prior work must be phrased so that they do not reveal the authors' identity. Simultaneous submission to another venue, or substantial technical overlap with previously published work, disqualifies a submission from consideration.

Final Submission Instructions

The final version of an accepted paper must add author names and affiliations while still adhering to the formatting guidelines. The footnote indicating that the paper is under preliminary review must be updated to reflect acceptance and the copyright transfer to ICML, and the running title in the header may be shortened if the full title exceeds the permitted length.

Theoretical Implications and Practical Applications

The specific criteria set out in the ICML 2024 submission and formatting instructions streamline the submission process and promote uniformity and readability across papers. This helps preserve the integrity of the review process and ensures that papers are judged on their merit and content. Looking forward, such standardization may inform submission guidelines at related conferences, simplifying the process for authors, organizers, and reviewers alike.

In conclusion, the ICML 2024 submission and formatting instructions give authors a detailed roadmap, ensuring that submissions are consistent in presentation and compatible with a fair, double-blind review process. As machine learning continues to evolve, clear, accessible, and uniformly presented research remains essential to the advancement of the field.

Authors (11)
  1. Paul S. Scotti (3 papers)
  2. Mihir Tripathy (1 paper)
  3. Cesar Kadir Torrico Villanueva (1 paper)
  4. Reese Kneeland (5 papers)
  5. Tong Chen (200 papers)
  6. Ashutosh Narang (1 paper)
  7. Charan Santhirasegaran (1 paper)
  8. Jonathan Xu (4 papers)
  9. Thomas Naselaris (10 papers)
  10. Kenneth A. Norman (9 papers)
  11. Tanishq Mathew Abraham (6 papers)