BrainExplore: fNIRS-based fMRI Prediction Framework

Updated 5 January 2026

BrainExplore is a methodological pipeline that predicts fMRI activation markers from fNIRS data using machine learning and neural data augmentation.
It employs regression models such as Lasso and SVR to map preprocessed fNIRS signals to task-specific cortical fMRI activations, and was validated on stop-signal and probabilistic reversal learning tasks.
The framework offers a cost-effective, non-invasive alternative for neurocognitive biomarker estimation, especially useful for populations where fMRI is impractical.

The BrainExplore framework is a methodological pipeline for predicting functional magnetic resonance imaging (fMRI) activation markers of cognition from functional near-infrared spectroscopy (fNIRS) data, using machine learning (ML) models and neural data augmentation. It enables fNIRS, a portable, low-cost optical neuroimaging modality, to serve as a surrogate for fMRI biomarkers, addressing fMRI's high cost and acquisition difficulties, particularly in populations such as infants. The framework was introduced and validated on two cognitive tasks, the stop-signal task (SST) and probabilistic reversal learning (PRL), using fNIRS and fMRI measurements from the same 50 human participants (Hur et al., 2022).

1. Formal Problem Statement

Let $n$ be the number of subjects retained after quality control (task-dependent; $n=32$ for PRL), $d=48$ the number of prefrontal fNIRS channels, and $m$ the number of fMRI activation clusters of interest ($m=8$ for SST, $m=1$ for PRL). Define:

$X \in \mathbb{R}^{n \times d}$: subject-wise fNIRS $\beta$-values (from the GLM) across channels.
$Y \in \mathbb{R}^{n \times m}$: corresponding subject-wise fMRI $\beta$-values (from the GLM) for activated clusters.

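The goal is to learn a mapping $f_\theta: \mathbb{R}^d \rightarrow \mathbb{R}^m$, parameterized by $\theta$, that predicts fMRI markers from multivariate fNIRS patterns while minimizing prediction error on held-out subjects. Under leave-one-subject-out evaluation, the prediction for subject $i$ uses parameters $\hat{\theta}^{(-i)}$ fitted without that subject:

$\hat{Y}_i = f_{\hat{\theta}^{(-i)}}(X_i), \;\; i = 1, \ldots, n$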
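The held-out formulation can be made concrete with a short sketch. It assumes NumPy and uses ordinary least squares with an added intercept column as a stand-in for $f_\theta$; the helper names `fit_ols`, `predict_ols`, and `loso_predict` are hypothetical. With $d=48$ channels and only $n-1$ training subjects the OLS system is underdetermined, so `np.linalg.lstsq` returns its minimum-norm solution, which illustrates why regularization matters in this regime.

```python
import numpy as np

def fit_ols(X, Y):
    """Least-squares theta (with an intercept row) mapping X (n x d) to Y (n x m)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend an intercept column
    theta, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # minimum-norm solution if d + 1 > n
    return theta                                    # shape (d + 1, m)

def predict_ols(theta, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return Xb @ theta

def loso_predict(X, Y, fit=fit_ols, predict=predict_ols):
    """Row i of the result comes from a model trained on every subject except i."""
    n = X.shape[0]
    Y_hat = np.empty(Y.shape, dtype=float)
    for i in range(n):
        train = np.arange(n) != i                   # mask that drops subject i
        theta = fit(X[train], Y[train])
        Y_hat[i] = predict(theta, X[i:i + 1])[0]
    return Y_hat

# Toy data with the document's shapes: d = 48 channels, m = 8 clusters (SST); n = 30 is arbitrary.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 48))
Y = rng.standard_normal((30, 8))
Y_hat = loso_predict(X, Y)
print(Y_hat.shape)  # (30, 8)
```

Any fit/predict pair with the same signatures (for example a regularized model) can be passed to `loso_predict` unchanged.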

2. Signal Preprocessing Pipeline

a) fNIRS Data

Acquisition: Raw light-intensity signals from 24 sources and 32 detectors yield 48 overlapping channels.
Conversion: Light-intensity changes are converted into separate time series of total hemoglobin (HbT), oxyhemoglobin (HbO), and deoxyhemoglobin (HbR).
Filtering: A band-pass filter (0.01–0.2 Hz) removes slow drift and cardiac noise.
Normalization: Each channel is z-normalized.
GLM Regression: Task regressors (e.g., successful-stop vs. successful-go for SST; trial-by-trial prediction error for PRL) are convolved with the canonical HRF. The result is $\beta^{\mathrm{fNIRS}}_{i,c}$ for subject $i$ and channel $c$.
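A minimal sketch of the filtering and normalization steps, assuming NumPy, a hypothetical 10 Hz sampling rate, and an ideal (brick-wall) FFT-domain filter. The source does not specify the filter design; practical pipelines often use IIR filters such as Butterworth instead. The function names are illustrative.

```python
import numpy as np

def bandpass_fft(signal, fs, low=0.01, high=0.2):
    """Ideal band-pass filter along the time axis (axis 0) of a (T, channels) array."""
    n_t = signal.shape[0]
    spectrum = np.fft.rfft(signal, axis=0)
    freqs = np.fft.rfftfreq(n_t, d=1.0 / fs)
    outside = (freqs < low) | (freqs > high)
    spectrum[outside] = 0.0  # drop drift (< low) and cardiac/high-frequency content (> high)
    return np.fft.irfft(spectrum, n=n_t, axis=0)

def zscore_channels(signal):
    """Z-normalize each channel (column) to zero mean and unit variance."""
    return (signal - signal.mean(axis=0)) / signal.std(axis=0)

# Toy check: a 0.05 Hz "task" wave survives; a 1.2 Hz "cardiac" wave and a DC offset do not.
fs = 10.0                                   # hypothetical sampling rate (Hz), not from the source
t = np.arange(3000) / fs                    # 300 s of samples
task = np.sin(2 * np.pi * 0.05 * t)
cardiac = np.sin(2 * np.pi * 1.2 * t)
raw = np.column_stack([task + cardiac + 5.0] * 48)   # 48 identical toy channels
clean = zscore_channels(bandpass_fft(raw, fs))
```

Because both toy frequencies fall exactly on FFT bins here, the in-band wave is recovered essentially exactly; real recordings would show some leakage at the band edges.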

b) fMRI Data

Preprocessing: Slice-timing correction, realignment, and spatial normalization in SPM.
GLM Regression: The same task regressors yield voxel-wise $\beta$-values.
Cluster-level inference: Significant clusters are identified at $p < 0.05$, FWE-corrected.
Summary: $\beta^{\mathrm{fMRI}}_{i,k}$ is the mean $\beta$ across voxels in activated cluster $k$ for subject $i$.
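Both modalities share the same GLM logic: convolve task events with an HRF, regress each channel or voxel on that regressor, and keep the task $\beta$. The sketch below assumes NumPy, a single event-type regressor plus an intercept, and a common double-gamma approximation of the canonical HRF (gamma shapes 6 and 16, undershoot ratio 1/6). The function names and toy data are hypothetical and not taken from the authors' code or from SPM.

```python
import math
import numpy as np

def double_gamma_hrf(fs, duration=32.0):
    """Double-gamma HRF sampled at fs Hz, normalized to unit sum."""
    t = np.arange(0.0, duration, 1.0 / fs)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)           # gamma pdf, shape 6, scale 1
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)   # gamma pdf, shape 16, scale 1
    h = peak - undershoot / 6.0
    return h / h.sum()

def task_regressor(onsets, n_samples, fs):
    """Stick functions at event onsets (sample indices) convolved with the HRF."""
    stick = np.zeros(n_samples)
    stick[onsets] = 1.0
    return np.convolve(stick, double_gamma_hrf(fs))[:n_samples]

def glm_betas(data, regressor):
    """OLS fit of data (T x channels or voxels) on [regressor, intercept]; returns task betas."""
    design = np.column_stack([regressor, np.ones(len(regressor))])
    coef, *_ = np.linalg.lstsq(design, data, rcond=None)
    return coef[0]

def cluster_mean_beta(voxel_betas, cluster_mask):
    """Summary marker: mean task beta over the voxels inside one cluster."""
    return voxel_betas[cluster_mask].mean()

# Toy usage with noise-free data so the recovered betas are exact.
fs = 1.0                                          # hypothetical sampling rate (e.g., TR = 1 s)
reg = task_regressor(np.arange(10, 300, 30), 300, fs)
true_betas = np.array([0.5, 1.0, 2.0, 3.0])       # four toy voxels
data = np.outer(reg, true_betas) + 0.2            # task response plus a constant baseline
betas = glm_betas(data, reg)
mask = np.array([False, True, True, False])       # toy cluster containing voxels 1 and 2
print(cluster_mean_beta(betas, mask))             # ≈ 1.5
```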

3. Predictive Modeling Approaches

Four supervised regression models are implemented:

Model	Objective Function / Algorithm	Regularization
OLS	$\theta = \arg\min_\theta \frac{1}{n}\sum_i \|Y_i - X_i\theta\|_2^2$ (closed form)	None
Ridge Regression	$\theta = \arg\min_\theta \frac{1}{n}\sum_i \|Y_i - X_i\theta\|_2^2 + \lambda\|\theta\|_2^2$	$\ell_2$ penalty
Lasso Regression	$\theta = \arg\min_\theta \frac{1}{n}\sum_i \|Y_i - X_i\theta\|_2^2 + \lambda\|\theta\|_1$	$\ell_1$ penalty
SVR (RBF kernel)	Quadratic program with $\epsilon$-insensitive loss and RBF kernel feature mapping	Slack penalty $C$; kernel width $\gamma$
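For reference, the SVR row can be written out in its standard $\epsilon$-insensitive form. This assumes one SVR is fitted per cluster $k$ (SVR is single-output), which the source does not state explicitly; $w$, $b$, $\xi_i$, $\xi_i^*$, and the feature map $\phi$ are notation introduced here.

```latex
\min_{w,\,b,\,\xi,\,\xi^*}\; \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)
\quad \text{s.t.} \quad
\begin{cases}
Y_{i,k} - \langle w, \phi(X_i)\rangle - b \le \epsilon + \xi_i, \\
\langle w, \phi(X_i)\rangle + b - Y_{i,k} \le \epsilon + \xi_i^*, \\
\xi_i \ge 0,\; \xi_i^* \ge 0,
\end{cases}
\qquad
\langle \phi(X_i), \phi(X_j)\rangle = \exp\left(-\gamma\|X_i - X_j\|_2^2\right)
```

Solving the dual of this quadratic program gives predictions $\hat{Y}_{i,k} = \sum_{j}(\alpha_j - \alpha_j^*)\exp(-\gamma\|X_j - X_i\|_2^2) + b$, where $\alpha_j, \alpha_j^*$ are the dual multipliers; only training points on or outside the $\epsilon$-tube have nonzero multipliers.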

Hyperparameters ($\lambda$, $C$, $\gamma$) are selected via nested leave-one-out cross-validation. Optimization is closed-form for OLS and ridge, uses coordinate descent for Lasso, and uses sequential minimal optimization (SMO) for SVR.
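The closed-form ridge solution and Lasso coordinate descent can be sketched with NumPy. This sketch assumes centered $X$ and $Y$ (so no intercept), keeps the table's $\frac{1}{n}$ scaling, and fits Lasso one output column at a time, which is equivalent because the objective separates across the $m$ clusters; the function names are hypothetical. Setting the ridge gradient to zero gives $\theta = (X^\top X + n\lambda I)^{-1}X^\top Y$, and each Lasso coordinate update is a soft-thresholding step with threshold $n\lambda/2$.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form minimizer of (1/n)||Y - X theta||^2 + lam * ||theta||^2 (Frobenius)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/n)||y - X beta||^2 + lam * ||beta||_1 (one output)."""
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)          # x_j^T x_j for every column j
    resid = np.array(y, dtype=float)       # y - X beta, with beta initialized to zero
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(d):
            if col_sq[j] == 0.0:
                continue                   # an all-zero column keeps a zero coefficient
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old    # x_j^T (residual with feature j removed)
            beta[j] = soft_threshold(rho, n * lam / 2.0) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)     # keep the residual in sync with beta
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta

# Toy problem with the document's channel count (d = 48) and a hypothetical sparse truth.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 48))
X -= X.mean(axis=0)                        # center features (no intercept in the model)
theta_true = np.zeros(48)
theta_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
y = X @ theta_true + 0.1 * rng.standard_normal(30)
y -= y.mean()
theta_ridge = ridge_fit(X, y[:, None], lam=0.1)[:, 0]
theta_lasso = lasso_fit(X, y, lam=0.5)
print(np.flatnonzero(theta_lasso))         # channels with nonzero Lasso weights
```

Under this scaling, the table's $\lambda$ corresponds to `alpha = n * lam` in scikit-learn's `Ridge` (which minimizes $\|y - Xw\|_2^2 + \alpha\|w\|_2^2$) and to `alpha = lam / 2` in its `Lasso` (which minimizes $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1$), if a library solver is preferred.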

4. Neural Data Augmentation Strategy

For each subject, the fNIRS time series $X^{\mathrm{raw}}_i \in \mathbb{R}^{T_i \times 48}$ is first z-normalized channel-wise. The framework then generates $S=100$ synthetic replicates by adding independent Gaussian noise:

$\varepsilon^{(s)}_{t,c} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), \;\; t = 1, \ldots, T_i, \; c = 1, \ldots, 48, \; s = 1, \ldots, S, \; \sigma = 0.01$

$X^{\mathrm{aug}, (s)}_i = X^{\mathrm{raw}}_i + \varepsilon^{(s)}$

Each augmented time series is processed with the same GLM to yield $\beta^{\mathrm{aug}, (s)}_i \in \mathbb{R}^{48}$. Each subject therefore contributes 100 training examples instead of one, which aims to improve model generalization under leave-one-subject-out (LOSO) cross-validation.
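A minimal sketch of the augmentation and of how the replicates could enter a LOSO training fold. It assumes NumPy, a single known task regressor per subject, and (an assumption, since the source does not spell it out) that every replicate inherits its subject's fMRI targets $Y_i$. The function names, regressor, and toy sizes are hypothetical.

```python
import numpy as np

def augment_subject(ts, regressor, n_rep=100, sigma=0.01, rng=None):
    """GLM task betas for n_rep noisy copies of one subject's normalized series ts (T x channels)."""
    rng = np.random.default_rng() if rng is None else rng
    design = np.column_stack([regressor, np.ones(len(regressor))])
    pinv = np.linalg.pinv(design)                  # OLS projector, reused for every replicate
    betas = np.empty((n_rep, ts.shape[1]))
    for s in range(n_rep):
        noisy = ts + rng.normal(0.0, sigma, size=ts.shape)   # X_aug = X_raw + eps, eps ~ N(0, sigma^2)
        betas[s] = (pinv @ noisy)[0]               # row 0 holds the task beta per channel
    return betas

def build_training_set(aug_betas, Y, held_out):
    """Stack replicates of every subject except held_out; repeat each subject's fMRI targets."""
    X_rows, Y_rows = [], []
    for i, B in enumerate(aug_betas):
        if i == held_out:
            continue                               # replicates of the test subject never enter training
        X_rows.append(B)
        Y_rows.append(np.repeat(Y[i:i + 1], B.shape[0], axis=0))
    return np.vstack(X_rows), np.vstack(Y_rows)

# Toy sizes (not the study's): 5 subjects, 200 samples, 48 channels, 8 clusters.
rng = np.random.default_rng(0)
T, n_sub, m = 200, 5, 8
regressor = np.sin(np.linspace(0.0, 20.0, T))     # hypothetical task regressor
subjects = [rng.standard_normal((T, 48)) for _ in range(n_sub)]
Y = rng.standard_normal((n_sub, m))
aug = [augment_subject(ts, regressor, rng=rng) for ts in subjects]
X_train, Y_train = build_training_set(aug, Y, held_out=0)
print(X_train.shape, Y_train.shape)               # (400, 48) (400, 8)
```

Excluding the held-out subject's replicates matters: since they are near-copies of its real data, letting any of them into training would leak the test subject into the fit.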

5. Training, Validation, and Evaluation Metrics

Hyperparameter search: Ridge/Lasso $\lambda$ over the range $[10^{-4}, 10^{2}]$; SVR $C \in \{0.1, 1, 10\}$, $\gamma \in \{10^{-3}, 10^{-2}, 10^{-1}\}$.
Cross-validation: Leave-one-subject-out; models are trained on the augmented data of $n-1$ subjects and predict $\hat{Y}_i$ for the held-out subject.
Metrics (computed for each cluster $k$ across the $n$ held-out predictions, with $\bar{Y}_k = \frac{1}{n}\sum_{i} Y_{i,k}$ and $\overline{\hat{Y}}_k = \frac{1}{n}\sum_{i} \hat{Y}_{i,k}$):
- Mean squared error (MSE):

$\mathrm{MSE}_k = \frac{1}{n} \sum_{i=1}^{n} (Y_{i,k} - \hat{Y}_{i,k})^2$

- Pearson correlation ($r$):

$r_k = \frac{\sum_{i}(Y_{i,k} - \bar{Y}_k)(\hat{Y}_{i,k} - \overline{\hat{Y}}_k)}{\sqrt{\sum_{i}(Y_{i,k} - \bar{Y}_k)^2}\sqrt{\sum_{i}(\hat{Y}_{i,k} - \overline{\hat{Y}}_k)^2}}$

- Coefficient of determination ($R^2$):

$R^2_k = 1 - \frac{\sum_{i}(Y_{i,k} - \hat{Y}_{i,k})^2}{\sum_{i}(Y_{i,k} - \bar{Y}_k)^2}$
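The per-cluster metrics can be computed in a few lines of NumPy. This sketch assumes `Y` and `Y_hat` are $n \times m$ arrays whose rows are held-out subjects, so each metric is computed across subjects for each cluster, the form in which the per-region results below are reported; `cluster_metrics` is a hypothetical helper name.

```python
import numpy as np

def cluster_metrics(Y, Y_hat):
    """Per-cluster MSE, Pearson r and R^2, computed across held-out subjects (rows)."""
    err = Y - Y_hat
    Yc = Y - Y.mean(axis=0)              # Y_{i,k} minus the cluster-k mean
    Hc = Y_hat - Y_hat.mean(axis=0)      # predictions minus their cluster-k mean
    mse = (err ** 2).mean(axis=0)
    r = (Yc * Hc).sum(axis=0) / np.sqrt((Yc ** 2).sum(axis=0) * (Hc ** 2).sum(axis=0))
    r2 = 1.0 - (err ** 2).sum(axis=0) / (Yc ** 2).sum(axis=0)
    return mse, r, r2

# Toy example: one cluster, four held-out subjects.
Y = np.array([[1.0], [2.0], [3.0], [4.0]])
Y_hat = np.array([[1.5], [1.5], [3.5], [3.5]])
mse, r, r2 = cluster_metrics(Y, Y_hat)   # MSE = 0.25, r = 4/sqrt(20) ≈ 0.894, R^2 = 0.8
```

Because $R^2$ uses the observed mean as its baseline, it can be negative when predictions are worse than predicting $\bar{Y}_k$, even when $r$ is positive.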

Key empirical findings:

SST task: Lasso regression on HbR predicted fMRI $\beta$ in the right inferior frontal gyrus (IFG) ($\mathrm{MSE}=4.787$, $r=0.52$, $p<0.01$), supplementary motor area (SMA) ($\mathrm{MSE}=7.194$, $r=0.48$, $p<0.01$), and left IFG ($\mathrm{MSE}=8.158$, $r=0.50$, $p<0.01$).
PRL task: SVR (RBF) on HbT predicted activation in the inferior parietal lobule (IPL) ($\mathrm{MSE}=0.115$, $r=0.45$, $p<0.05$).

6. Main Results, Functional Relevance, and Limitations

SST (response inhibition): fNIRS HbR signals plus Lasso regression best predict fMRI activation in bilateral IFG and SMA (all $p<0.01$ ).
PRL (prediction error): fNIRS HbT signals plus SVR (RBF) predict fMRI activation in IPL ($r=0.45$, $p<0.05$). Subcortical striatal signals could not be recovered, consistent with fNIRS's limited penetration depth, which restricts it to cortical sources.
No attempt was made to infer or predict task-based functional connectivity from fNIRS.

Identified limitations include the absence of subcortical coverage and possible confounding from differences between visits or recording environments and from participants' emotional states. Proposed extensions include deep neural architectures and domain adaptation for improved subcortical prediction, deployment in infant and patient populations with fMRI contraindications, and augmentation with dynamic functional connectivity features.

7. Implications and Prospective Extensions

The BrainExplore framework demonstrates that standard machine learning regression models, trained on neurally augmented data, can non-invasively estimate cortical fMRI markers from fNIRS measurements with statistically significant, moderate accuracy ($r = 0.45$–$0.52$ across the reported clusters). These surrogate markers may facilitate the study of populations where fMRI is impractical (infants, certain patient groups), broaden access to neurocognitive biomarker research, and lay the groundwork for transferring validated markers across modalities. Suggested future directions include:

Adopting nonlinear or deep learning architectures to capture residual or subcortical activations.
Extending the pipeline to predictive modeling of functional connectivity.
Validating in broader populations and integrating additional neuroimaging features for enhanced generalizability.

A plausible implication is that data-augmented ML protocols using fNIRS can substitute for key aspects of fMRI-based cognitive phenotyping in settings where access or feasibility constraints are paramount (Hur et al., 2022).

References

Hur et al. (2022). Mapping fNIRS to fMRI with Neural Data Augmentation and Machine Learning Models.
