Generic decoding of seen and imagined objects using hierarchical visual features (1510.06479v3)

Published 22 Oct 2015 in q-bio.NC and cs.CV

Abstract: Object recognition is a key function in both human and machine vision. While recent studies have achieved fMRI decoding of seen and imagined contents, the prediction is limited to training examples. We present a decoding approach for arbitrary objects, using the machine vision principle that an object category is represented by a set of features rendered invariant through hierarchical processing. We show that visual features including those from a convolutional neural network can be predicted from fMRI patterns and that greater accuracy is achieved for low/high-level features with lower/higher-level visual areas, respectively. Predicted features are used to identify seen/imagined object categories (extending beyond decoder training) from a set of computed features for numerous object images. Furthermore, the decoding of imagined objects reveals progressive recruitment of higher to lower visual representations. Our results demonstrate a homology between human and machine vision and its utility for brain-based information retrieval.

Citations (419)

Summary

  • The paper demonstrates that fMRI signals can be used to decode both seen and imagined object categories by leveraging hierarchical visual features.
  • It employs a modular decoding approach using 13 candidate visual feature types/layers drawn from four models (CNN, HMAX, GIST, and SIFT+BoF) to capture feature complexity across multiple brain regions.
  • Findings indicate that mid-level visual features are most effective for object identification, highlighting a strong alignment between cortical processing and machine vision.

Decoding of Seen and Imagined Objects Using Hierarchical Visual Features

In the paper by Tomoyasu Horikawa and Yukiyasu Kamitani, the authors investigate decoding both seen and imagined objects from fMRI brain signals by leveraging hierarchical visual features. This research offers insight into the correspondence between human neural representations and machine vision models such as convolutional neural networks (CNNs). The paper develops a method for decoding the categories of arbitrary objects from brain activity, moving beyond image-level similarity to category-level information.

Methodological Approach

The authors extend the modular approach for decoding visual information, originally used for image reconstruction, to decode object categories. They hypothesize that generic object categories can be represented by a set of invariant visual features, akin to those used for object recognition in machine vision. To test this, the paper evaluates 13 candidate visual feature types/layers sourced from four computational models: CNN, HMAX, GIST, and SIFT+BoF. Notably, some of these models emulate the hierarchical structure of the human visual system, while others were designed for scene and object recognition.
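
For intuition, here is a minimal sketch of extracting hierarchical features from one CNN layer, using torchvision's AlexNet as a stand-in for the paper's network (the preprocessing and layer index are illustrative assumptions, not the authors' exact configuration):

```python
# Hedged sketch: extract hierarchical visual features from one layer of an
# AlexNet-style CNN. torchvision's AlexNet stands in for the paper's network;
# preprocessing and the layer index are illustrative assumptions.
import torch
from PIL import Image
from torchvision import models, transforms

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_layer_features(image_path, layer_idx=2):
    """Return the flattened activation of convolutional block `layer_idx`."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        # Run the image forward through the convolutional stack and stop
        # at the requested layer.
        for i, layer in enumerate(model.features):
            x = layer(x)
            if i == layer_idx:
                return x.flatten().numpy()
    raise ValueError("layer_idx is beyond model.features")
```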

Experimentation and Decoding Process

The research includes fMRI experiments where subjects viewed or imagined objects from selected categories. The CNN, structured similarly to biological vision systems, played a crucial role due to its hierarchical layer design, capturing increasing levels of feature complexity. Decoders were trained to predict visual features derived from these computational models using fMRI activity patterns from multiple brain regions, including V1, V2, V3, V4, and higher-level visual areas like LOC, FFA, and PPA.
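
A simplified sketch of this decoder-training step follows, with scikit-learn's ridge regression standing in for the sparse linear regression used in the paper (the array shapes and variable names are assumptions):

```python
# Hedged sketch: train one linear decoder per feature unit from fMRI voxels.
# Ridge regression stands in for the paper's sparse linear regression; the
# array shapes and variable names below are illustrative assumptions.
from sklearn.linear_model import Ridge

def train_feature_decoders(X_train, Y_train, alpha=100.0):
    """
    X_train: (n_samples, n_voxels) fMRI patterns from one visual area (e.g. V1).
    Y_train: (n_samples, n_units) target feature values for the same stimuli.
    Returns a fitted multi-output linear model (one readout per feature unit).
    """
    model = Ridge(alpha=alpha)
    model.fit(X_train, Y_train)
    return model

# Usage: predict feature vectors for held-out (seen or imagined) trials.
# decoder = train_feature_decoders(X_train, Y_train)
# Y_pred = decoder.predict(X_test)   # shape (n_test, n_units)
```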

Subsequently, the trained decoders predicted feature vectors for seen and imagined objects, which were cross-referenced against a database of category-average feature vectors computed from ImageNet images. This methodology enabled identification beyond the training dataset and demonstrated the capacity to decode arbitrary object categories, suggesting a close correspondence between neural representations and the hierarchy of visual feature complexity.
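
A hedged sketch of the identification step: correlate the decoded feature vector with each category-average vector and rank the candidates (the data structures and names are assumptions for illustration):

```python
# Hedged sketch: identify the object category of a decoded feature vector by
# correlating it with category-average feature vectors computed from ImageNet
# images. The dict structure and names are assumptions for illustration.
import numpy as np

def identify_category(y_pred, category_features):
    """
    y_pred: (n_units,) decoded feature vector for one trial.
    category_features: dict of category name -> (n_units,) average feature vector.
    Returns candidate categories sorted by Pearson correlation, best first.
    """
    scores = {name: np.corrcoef(y_pred, vec)[0, 1]
              for name, vec in category_features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# best_guess = identify_category(y_pred, category_features)[0]
```

Because identification only requires a feature vector for each candidate category, categories never shown during decoder training can still be identified, which is what makes the decoding "generic."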

Key Findings and Implications

The results revealed that feature values from hierarchical models such as CNNs could be predicted from activity across multiple brain areas, with lower and higher visual areas best predicting lower- and higher-level features, respectively. The decoders not only identified seen objects but also decoded imagined objects, pointing to the progressive recruitment of visual representations from higher to lower levels during mental imagery. Of particular interest is the finding that mid-level features are most effective for object category identification, which could inform AI development strategies focusing on intermediate feature complexity.
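
One way to quantify this area-to-layer alignment, sketched here under assumed data structures, is to score each (brain area, feature layer) decoder by the mean per-unit correlation between predicted and true feature values on held-out trials:

```python
# Hedged sketch: score each (brain area, feature layer) pairing by the mean
# per-unit correlation between decoded and true feature values on held-out
# trials. The nested dict layout is an assumption for illustration.
import numpy as np

def prediction_accuracy(Y_pred, Y_true):
    """Mean Pearson correlation across feature units (columns)."""
    rs = [np.corrcoef(Y_pred[:, i], Y_true[:, i])[0, 1]
          for i in range(Y_true.shape[1])]
    return float(np.nanmean(rs))

# for area, layers in predictions.items():      # e.g. "V1", "V4", "LOC", ...
#     for layer, Y_pred in layers.items():      # e.g. "CNN1" ... "CNN8"
#         grid[area][layer] = prediction_accuracy(Y_pred, truth[layer])
# The paper reports low-level features peaking in lower areas (V1-V3) and
# high-level features in higher areas (LOC/FFA/PPA).
```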

These results have significant implications for understanding brain-based information retrieval processes and establishing a homology between human and machine vision. Moreover, they propose that mental imagery involves top-down visual processing, recruiting feature-level representations akin to those employed during visual perception.

Future Directions

The paper's findings pave the way for further exploration of brain-decoding methodologies that interface with advanced deep learning models for richer outputs. The approach could lead to advances in brain-machine interfaces that leverage deep neural networks for direct feature prediction from neural data. Additionally, understanding how neural representations differ between volitional and spontaneous mental imagery remains an intriguing challenge for future research.

In conclusion, Horikawa and Kamitani's paper extends the frontier of fMRI decoding, showing that hierarchical visual feature representations can be used to interpret both perceived and imagined content, and setting the stage for continued advances at the intersection of neuroscience and AI.