
Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision (1608.03425v2)

Published 11 Aug 2016 in q-bio.NC and q-bio.QM

Abstract: Convolutional neural network (CNN) driven by image recognition has been shown to be able to explain cortical responses to static pictures at ventral-stream areas. Here, we further showed that such CNN could reliably predict and decode functional magnetic resonance imaging data from humans watching natural movies, despite its lack of any mechanism to account for temporal dynamics or feedback processing. Using separate data, encoding and decoding models were developed and evaluated for describing the bi-directional relationships between the CNN and the brain. Through the encoding models, the CNN-predicted areas covered not only the ventral stream, but also the dorsal stream, albeit to a lesser degree; single-voxel response was visualized as the specific pixel pattern that drove the response, revealing the distinct representation of individual cortical location; cortical activation was synthesized from natural images with high-throughput to map category representation, contrast, and selectivity. Through the decoding models, fMRI signals were directly decoded to estimate the feature representations in both visual and semantic spaces, for direct visual reconstruction and semantic categorization, respectively. These results corroborate, generalize, and extend previous findings, and highlight the value of using deep learning, as an all-in-one model of the visual cortex, to understand and decode natural vision.

Citations (250)

Summary

  • The paper demonstrates that CNNs significantly predict fMRI responses in both ventral and dorsal streams during dynamic natural vision.
  • The paper introduces voxel-wise encoding and decoding models that reconstruct visual stimuli and semantic categories from brain activity.
  • The paper shows that cross-subject generalization and high-throughput modeling provide a robust framework for deciphering neural representations.

Deep Learning for Neural Encoding and Decoding of Dynamic Natural Vision

The paper, "Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision," explores the application of convolutional neural networks (CNNs) to interpret functional magnetic resonance imaging (fMRI) data pertaining to dynamic natural vision. By leveraging deep learning models, the paper aims to address significant gaps in understanding how the brain processes complex visual information and to provide a framework for neural encoding and decoding that encompasses both visual and semantic spaces.

The paper is anchored in the premise that CNNs, driven primarily by feedforward processing for image recognition, can robustly predict and decode cortical responses to complex video stimuli. Prior studies concentrated largely on static images to elucidate ventral-stream processing; this work extends the approach to dynamic stimuli. The findings demonstrate that the CNN accounts for a substantial portion of the variance in fMRI signals, not only in the ventral stream but also, to a lesser degree, in the dorsal stream.
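The encoding direction described above is commonly implemented as voxel-wise linear regression from CNN layer activations to BOLD responses. The following is a minimal sketch of that idea, assuming a closed-form ridge solution; the function names and the `alpha` parameter are illustrative, not taken from the paper, and the CNN features are assumed to be already convolved with a hemodynamic response function and resampled to the fMRI acquisition rate.

```python
import numpy as np

def fit_encoding_model(features, responses, alpha=1.0):
    """Voxel-wise ridge regression from CNN features to fMRI responses.

    features:  (n_timepoints, n_features) CNN activations, assumed
               HRF-convolved and downsampled to the fMRI sampling rate.
    responses: (n_timepoints, n_voxels) measured BOLD signals.
    Returns a weight matrix of shape (n_features, n_voxels).
    """
    n_features = features.shape[1]
    # Closed-form ridge solution: (X'X + alpha*I)^-1 X'Y
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)

def predict_responses(features, weights):
    """Predict voxel responses to new stimuli from their CNN features."""
    return features @ weights
```

Fitting one such regression per voxel is what allows the predictive accuracy of the model to be mapped across the entire visual cortex.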

A dataset of 11.5 hours of fMRI, recorded while subjects watched diverse video clips, was used to train and evaluate separate encoding and decoding models. The encoding models predicted and visualized fMRI responses at individual cortical voxels, providing insight into the distinct visual representations at discrete brain locations. The decoding models translated fMRI signals back into visual stimuli and semantic categories, highlighting the bi-directional predictive utility of CNNs in modeling brain activity associated with natural vision.

Key Findings

  1. CNN Predictive Accuracy: CNNs explained significant variance in fMRI responses across the visual cortex during exposure to dynamic stimuli. The models revealed strong predictive capabilities for ventral-stream areas, with promising, though reduced, accuracy for dorsal-stream areas.
  2. Voxel-wise Encoding and Decoding: Encoding models were highly effective in visualizing single-voxel representations, revealing unique cortical activation patterns in response to visual stimuli. Decoding efforts enabled the reconstruction of visual inputs and categorical labeling in semantic space, thus facilitating interpretative tasks from brain data.
  3. High-Throughput Experimentation: The models were generalized to unseen stimuli, predicting cortical responses to a vast array of natural images, efficiently mapping distinct categorical representations using a high-throughput computational approach.
  4. Cross-Subject Modeling: The paper demonstrated cross-subject consistency: encoding and decoding models trained on one subject could be applied to other subjects with success, albeit at reduced accuracy, indicating potential utility in broader population studies.
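The decoding direction in findings 2 and 3 runs the regression the other way: from voxel responses back into the CNN feature space, where semantic categorization can then be performed. The sketch below assumes the same ridge formulation as the encoding case and a simple cosine-similarity match against category prototype vectors; `fit_decoding_model` and `categorize` are hypothetical names, not the paper's implementation.

```python
import numpy as np

def fit_decoding_model(responses, features, alpha=1.0):
    """Ridge regression from fMRI voxel responses to CNN feature space.

    responses: (n_timepoints, n_voxels) BOLD signals.
    features:  (n_timepoints, n_features) target CNN activations.
    Returns a weight matrix of shape (n_voxels, n_features).
    """
    n_voxels = responses.shape[1]
    gram = responses.T @ responses + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, responses.T @ features)

def categorize(decoded_features, prototypes, labels):
    """Label each decoded feature vector with the category whose
    prototype is most similar under cosine similarity."""
    d = decoded_features / np.linalg.norm(decoded_features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return [labels[i] for i in np.argmax(d @ p.T, axis=1)]
```

Decoded feature vectors can feed either route described in the paper: inversion toward pixel space for visual reconstruction, or comparison in semantic space for categorization, as in `categorize` above.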

Implications

The use of CNNs offers a compelling approach to bridging the gap between neural activity data and visual comprehension, providing both predictability and interpretability. The theoretical implication is that hierarchical feedforward CNNs mimic certain processes within the human visual cortex, but notably lack mechanisms for the recurrent and feedback processing that could improve dorsal-stream predictions.

Practically, this method provides a framework for decoding dynamic visual experiences and maps of semantic content from fMRI data, with potential applications in fields such as neural prosthetics and cognitive neuroscience research.

Future Prospects

Future investigations should consider integrating additional neural network architectures that model recurrent connections and feedback loops, potentially improving the fit of these models to the complexity of human vision. The paper paves the way for advancements in AI systems, where the interplay between CNN-based models and human neural architecture can yield better understanding and performance in artificial sensory systems.

In conclusion, while CNNs prove invaluable as predictive models for dynamic natural vision, there remains a substantive frontier for integrating a more comprehensive set of network dynamics to align with the intricate architecture of the human brain.