Scaling up ridge regression for brain encoding in a massive individual fMRI dataset (2403.19421v1)
Abstract: Brain encoding with neuroimaging data is an established analysis aimed at predicting human brain activity directly from complex stimulus features such as movie frames. Typically, these features are the latent space representation from an artificial neural network, and the stimuli are image, audio, or text inputs. Ridge regression is a popular prediction model for brain encoding due to its good out-of-sample generalization performance. However, training a ridge regression model can be highly time-consuming when dealing with large-scale deep functional magnetic resonance imaging (fMRI) datasets that include many space-time samples of brain activity. This paper evaluates different parallelization techniques to reduce the training time of brain encoding with ridge regression on the CNeuroMod Friends dataset, one of the largest deep fMRI resources currently available. With multi-threading, our results show that the Intel Math Kernel Library (MKL) significantly outperforms the OpenBLAS library, being 1.9 times faster using 32 threads on a single machine. We then evaluated the Dask multi-CPU implementation of ridge regression readily available in scikit-learn (MultiOutput), and we proposed a new "batch" version of Dask parallelization, motivated by a time complexity analysis. In line with our theoretical analysis, MultiOutput parallelization was found to be impractical, i.e., slower than multi-threading on a single machine. In contrast, the Batch-MultiOutput regression scaled well across compute nodes and threads, providing speed-ups of up to 33 times with 8 compute nodes and 32 threads compared to a single-threaded scikit-learn execution. Batch parallelization using Dask thus emerges as a scalable approach for brain encoding with ridge regression on high-performance computing systems using scikit-learn and large fMRI datasets.
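As a rough illustration of the batching idea named in the abstract, the sketch below splits the brain targets (columns of the response matrix) into a few batches and fits each batch as one parallel task, rather than one task per target. It is not the authors' implementation. It swaps Dask for Python's standard thread pool and scikit-learn's ridge estimator for a closed-form NumPy solve, and it assumes a single fixed regularization strength. The function names, array sizes, and parameter values are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights for every target column of Y at once."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def batched_ridge_fit(X, Y, alpha=1.0, n_batches=4, max_workers=4):
    """Fit ridge weights batch by batch over the target columns of Y.

    Each batch of targets is one task, so the number of tasks is
    n_batches rather than the number of targets.
    """
    column_batches = np.array_split(np.arange(Y.shape[1]), n_batches)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(
            pool.map(lambda cols: ridge_fit(X, Y[:, cols], alpha), column_batches)
        )
    # Batches were created in column order, so stacking restores that order.
    return np.hstack(parts)


# Synthetic stand-ins (assumed sizes, not the CNeuroMod data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))  # time samples x stimulus features
Y = rng.standard_normal((200, 50))  # time samples x brain targets

W_batched = batched_ridge_fit(X, Y, alpha=10.0, n_batches=5)
W_full = ridge_fit(X, Y, alpha=10.0)
```

In this sketch the batched weights match the weights from a single fit over all targets, because with a fixed regularization strength each target's ridge solution is independent of the others. Batching only changes how the work is divided, which is the property a distributed scheduler such as Dask can exploit.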