- The paper introduces a two-phase approach that overcomes fMRI’s spatial and temporal challenges to decode video sequences.
- It employs spatial masking, temporal augmentation, and a diffusion model with dependent prior noise to boost decoding performance.
- Quantitative results, including notable SSIM gains, together with attention analysis, support its robustness and biological plausibility.
Introduction
Understanding how the brain processes dynamic visual experiences, and translating those processes into video, is an ambitious yet largely underexplored domain at the intersection of neuroscience and AI. The paper under review, "NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities," presents NeuroCine, a dual-phase framework that aims to bridge this gap by decoding videos from brain activity captured via fMRI. The framework addresses challenges inherent to fMRI data: noise, spatial redundancy, and temporal lag.
Decoding Framework and Innovations
The proposed framework, NeuroCine, takes a two-phase approach to the spatial and temporal challenges posed by fMRI data. The first phase applies spatial masking and temporal interpolation-based augmentation, tailored to fMRI's inherent characteristics, to generate views for contrastive learning. Training against these augmentations pushes the fMRI encoder toward representations that are robust to spatial and temporal perturbations.
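To make the augmentation scheme concrete, here is a minimal sketch of what spatial masking and temporal interpolation might look like on a (time, voxels) fMRI array. The function names, masking ratio, and single-frame interpolation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def spatial_mask(x, mask_ratio=0.2, rng=None):
    """Zero out a random subset of voxels (columns) of a (time, voxels) array."""
    rng = rng or np.random.default_rng()
    out = x.copy()
    n_vox = x.shape[1]
    idx = rng.choice(n_vox, size=int(mask_ratio * n_vox), replace=False)
    out[:, idx] = 0.0
    return out

def temporal_interpolate(x, rng=None):
    """Replace one random interior time point with the mean of its neighbours."""
    rng = rng or np.random.default_rng()
    out = x.copy()
    t = int(rng.integers(1, x.shape[0] - 1))  # stay away from the endpoints
    out[t] = 0.5 * (x[t - 1] + x[t + 1])
    return out

def two_views(x, rng=None):
    """Produce two stochastic views of one fMRI clip for contrastive training."""
    return (spatial_mask(temporal_interpolate(x, rng), rng=rng),
            spatial_mask(temporal_interpolate(x, rng), rng=rng))

# Example: a clip of 10 TRs over 4,000 voxels.
clip = np.random.randn(10, 4000).astype(np.float32)
view_a, view_b = two_views(clip)
```

A standard contrastive objective (e.g. InfoNCE) would then pull the encoder's embeddings of `view_a` and `view_b` together while pushing apart embeddings of different clips.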
Building on this robust representation, the second phase feeds the learned embeddings into a diffusion model tailored for video generation. A key ingredient in this context is the incorporation of dependent prior noise, which compensates for fMRI's low signal-to-noise ratio. Evaluated on a publicly available fMRI dataset, NeuroCine improved decoding performance by substantial margins, surpassing the previous state-of-the-art models by 20.97%, 31.00%, and 12.30% in SSIM across the three test subjects.
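The paper's exact noise construction is not reproduced here; one common way to build a dependent (non-i.i.d.) noise prior for video diffusion is to mix a noise component shared across all frames with fresh per-frame noise. The sketch below follows that assumption, with `alpha` as an illustrative mixing weight:

```python
import torch

def dependent_prior_noise(shape, alpha: float = 0.5) -> torch.Tensor:
    """
    Sample starting noise for video diffusion whose frames are correlated
    rather than independent: each frame mixes a component shared across
    all frames with a fresh per-frame component.

    shape: (batch, frames, channels, height, width) latent shape
    alpha: share of variance given to the common component (illustrative).
    """
    b, f, c, h, w = shape
    shared = torch.randn(b, 1, c, h, w).expand(b, f, c, h, w)  # same for every frame
    independent = torch.randn(shape)                           # fresh per frame
    # Since the two components are independent and unit-variance, this convex
    # mixture of variances stays approximately unit-variance, so it can stand
    # in for the usual torch.randn(shape) initial latent.
    return alpha ** 0.5 * shared + (1 - alpha) ** 0.5 * independent
```

The correlated starting point biases the reverse diffusion toward temporally consistent frames, which is one plausible reading of how a dependent prior helps with noisy fMRI conditioning.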
Results and Validation
The framework's efficacy is further supported by strong quantitative results: the SSIM gains cited above provide numerical evidence of improved video reconstructions. Alongside these numbers, the authors' attention analysis suggests that the model attends to brain regions consistent with known visual-processing structures and functions, bolstering the biological plausibility and interpretability of the decoding process.
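For reference, SSIM compares luminance, contrast, and structure between image pairs. A minimal way to score reconstructed frames against ground truth uses scikit-image (0.19+ for the `channel_axis` argument); this is a convenience sketch, not the authors' evaluation code:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def video_ssim(pred, target):
    """Mean per-frame SSIM; pred/target are (frames, H, W, 3) uint8 arrays."""
    scores = [ssim(p, t, channel_axis=-1, data_range=255)
              for p, t in zip(pred, target)]
    return float(np.mean(scores))
```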
Concluding Thoughts
NeuroCine's approach not only pushes the envelope in neural decoding but also lays groundwork for a deeper understanding of how the human brain processes dynamic vision. The synergy between advanced neural imaging and machine learning showcased in this paper could have broad implications, from assistive technology for individuals with disabilities to new directions for neuroscientific methodology and generative AI models. NeuroCine's success underscores the potential of integrating cognitive science with robust AI to decode and interpret complex neural data, marking a milestone in this interdisciplinary research landscape.