Sharing Deep Generative Representation for Perceived Image Reconstruction from Human Brain Activity
This paper addresses the challenge of accurately reconstructing perceived visual stimuli from human brain activity, specifically functional magnetic resonance imaging (fMRI) data. The authors propose a novel deep generative multiview model (DGMM) that learns a shared latent generative representation to align external visual stimuli with the corresponding brain responses. This approach tackles two critical issues: fMRI measurement noise and the high dimensionality of the data relative to the small number of available training instances.
Methodology Overview
The DGMM is built on a multiview latent variable framework in which the visual image and the fMRI pattern are treated as two views generated from a shared latent representation; reconstruction then amounts to Bayesian inference of the missing image view given the observed fMRI view. The model employs deep neural networks (DNNs) to parameterize a nonlinear observational model of the visual images, departing from traditional linear approaches such as Bayesian canonical correlation analysis (BCCA). This lets DGMM capture the intricate nonlinear structure inherent in visual stimuli. Furthermore, it models fMRI voxel activities with a Gaussian distribution that has a full covariance matrix, diverging from the spherical covariance assumption typical of prior methods. The full covariance captures correlations among voxels, which is crucial for suppressing fMRI noise and improving prediction accuracy.
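To make the generative structure concrete, here is a schematic of the two-view model in illustrative notation (a minimal sketch; the symbols z, x, y, f_theta, B, and Sigma are our own labeling, and the fMRI view is written as linear-Gaussian only for simplicity):

```latex
% Shared latent code for instance i
z_i \sim \mathcal{N}(0, I_K)

% Image view: nonlinear observation model parameterized by a DNN f_\theta
x_i \mid z_i \sim \mathcal{N}\big(f_\theta(z_i),\ \sigma_x^2 I\big)

% fMRI view: Gaussian with a FULL covariance matrix \Sigma over voxels
y_i \mid z_i \sim \mathcal{N}\big(B^\top z_i,\ \Sigma\big)
```

Under this structure, reconstruction infers the posterior over z_i from an observed fMRI pattern y_i and pushes its mean through the image decoder, i.e. the reconstruction is f_theta(E[z_i | y_i]).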
To infer the latent variables and model parameters, the authors devise an efficient variational Bayesian inference algorithm and incorporate a posterior regularization technique to further improve reconstruction accuracy. The regularizer encourages the latent representation of each test instance to stay close to those of its neighbors in the training set, which helps the model avoid overfitting and improves empirical performance on unseen data.
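As a rough illustration of this neighbor-based regularization, the sketch below pulls a test instance's latent mean toward the centroid of its k nearest training latents (a minimal sketch in NumPy; the k-NN form, the name knn_latent_regularizer, and the weight lam are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def knn_latent_regularizer(z_test, z_train, k=5, lam=0.1):
    """Illustrative penalty: squared distance between a test latent mean
    and the centroid of its k nearest training latent means."""
    # Euclidean distance from the test latent to every training latent
    dists = np.linalg.norm(z_train - z_test, axis=1)
    # Indices of the k closest training instances
    nn_idx = np.argsort(dists)[:k]
    # Encourage the test latent to stay near its neighbors' centroid
    centroid = z_train[nn_idx].mean(axis=0)
    return lam * np.sum((z_test - centroid) ** 2)

# Usage: this penalty would be added to the variational objective
# evaluated for each test instance.
rng = np.random.default_rng(0)
z_train = rng.normal(size=(100, 16))  # latent means of 100 training instances
z_test = rng.normal(size=16)          # latent mean of one test instance
print(knn_latent_regularizer(z_test, z_train))
```

The intuition behind this design is that a test fMRI pattern resembling certain training patterns should map to a nearby region of latent space, keeping reconstructions anchored to what the decoder has actually learned.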
Experimental Validation
Evaluation used three fMRI datasets paired with visual stimuli of varying complexity, including contrast-defined image patches as well as handwritten digits and characters. The results demonstrate DGMM's superiority in reconstructing visual stimuli over competing methods, including Miyawaki et al.'s fixed image bases, BCCA, deep canonically correlated autoencoders (DCCAE), and the deconvolutional neural network (De-CNN) approach. Quantitative metrics, namely Pearson's correlation coefficient, mean squared error, and the structural similarity index (SSIM), confirm DGMM's advantage across all experiments.
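For reference, these three metrics can be computed per image as follows (a minimal sketch using SciPy and scikit-image; the paper's exact evaluation protocol, preprocessing, and image sizes may differ):

```python
import numpy as np
from scipy.stats import pearsonr
from skimage.metrics import structural_similarity

def evaluate_reconstruction(recon, target):
    """Score one reconstructed image against its ground-truth stimulus."""
    r, _ = pearsonr(recon.ravel(), target.ravel())  # Pearson correlation
    mse = np.mean((recon - target) ** 2)            # mean squared error
    ssim = structural_similarity(                   # structural similarity
        recon, target, data_range=target.max() - target.min()
    )
    return {"pearson_r": r, "mse": mse, "ssim": ssim}

# Usage with toy 10x10 images
rng = np.random.default_rng(0)
target = rng.random((10, 10))
recon = target + 0.1 * rng.normal(size=(10, 10))  # a noisy "reconstruction"
print(evaluate_reconstruction(recon, target))
```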
Implications and Future Directions
The theoretical implications of this work extend beyond image reconstruction. The model's symmetric two-view architecture suggests potential applications in brain encoding as well: the same framework that reconstructs stimuli from brain responses (decoding) could, in principle, predict brain responses from external stimuli (encoding). In practical terms, this advancement edges closer to refined brain-machine interfaces built on real-time interpretation of brain activity.
Future research could integrate recurrent neural networks (RNNs) into this framework to enable dynamic vision reconstruction that exploits the temporal coherence of visual stimuli. Furthermore, extending the multiview formulation to multi-subject fMRI data may enable robust decoding that generalizes across individuals, broadening the applicability and accuracy of brain decoding methodologies.
In conclusion, DGMM represents a significant step in decoding human brain activity, producing more accurate visual reconstructions by effectively addressing measurement noise and dimensional complexity. This work paves the way for future exploration of deeper, more unified models of brain computation and perception.