Sound reconstruction from human brain activity via a generative model with brain-like auditory features (2306.11629v1)

Published 20 Jun 2023 in cs.SD, cs.HC, and eess.AS

Abstract: The successful reconstruction of perceptual experiences from human brain activity has provided insights into the neural representations of sensory experiences. However, reconstructing arbitrary sounds has been avoided due to the complexity of temporal sequences in sounds and the limited resolution of neuroimaging modalities. To overcome these challenges, leveraging the hierarchical nature of brain auditory processing could provide a path toward reconstructing arbitrary sounds. Previous studies have indicated a hierarchical homology between the human auditory system and deep neural network (DNN) models. Furthermore, advances in audio-generative models make it possible to transform compressed representations back into high-resolution sounds. In this study, we introduce a novel sound reconstruction method that combines brain decoding of auditory features with an audio-generative model. Using fMRI responses to natural sounds, we found that the hierarchical sound features of a DNN model could be decoded better than spectrotemporal features. We then reconstructed the sound using an audio transformer that disentangled compressed temporal information in the decoded DNN features. Our method achieves unconstrained sound reconstruction that captures perceptual content and quality, and generalizes to sound categories not included in the training dataset. Reconstructions from different auditory regions remain similar to the actual sounds, highlighting the distributed nature of auditory representations. To test whether the reconstructions mirrored actual subjective perceptual experiences, we performed an experiment involving selective auditory attention to one of the overlapping sounds. The reconstructions tended to resemble the attended sound more than the unattended one. These findings demonstrate that our proposed model provides a means to externalize experienced auditory contents from human brain activity.

Citations (3)

Summary

  • The paper introduces a framework that decodes fMRI activity into hierarchical auditory features of a DNN and converts them to sound with an audio-generative model.
  • It employs a VGGish-inspired feature extractor and an autoregressive audio transformer to translate the decoded features into reconstructed waveforms.
  • Strong numerical results, with identification accuracies above 85%, highlight its potential for advancing brain-machine interfaces and auditory diagnostics.

Sound Reconstruction from Human Brain Activity via a Generative Model with Brain-Like Auditory Features

The paper presents a methodological advance in reconstructing arbitrary sound experiences from human brain activity, leveraging a generative model integrated with brain-like auditory features. It addresses a challenging problem in neural decoding: the complexity of temporal sequences in auditory perception and the limited resolution of current neuroimaging modalities.

Methodology Overview

The authors introduce a framework that combines brain decoding of auditory features with an audio-generative model. They use fMRI responses to decode hierarchical sound features of a deep neural network (DNN), specifically the VGGish-ish model, whose feature hierarchy mirrors the hierarchical processing of the human auditory system. An audio transformer is then trained in an autoregressive fashion to translate the decoded DNN features into codebook representations, from which audio waveforms are reconstructed.
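
As a rough illustration of this two-stage pipeline, the sketch below pairs a linear feature decoder with a toy feature-conditioned autoregressive transformer over codebook indices. This is not the authors' implementation: all sizes, the gradient-trained linear decoder, and the miniature transformer are illustrative assumptions, and the final stage that turns codebook indices back into a waveform is omitted.

```python
# Minimal sketch of the two-stage pipeline described above (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)
N_VOXELS, N_FEAT, VOCAB, SEQ_LEN = 2000, 128, 256, 32   # toy sizes (assumed)

# --- Stage 1: linear decoding of DNN features from fMRI patterns ------------
fmri = torch.randn(64, N_VOXELS)          # toy fMRI patterns (64 trials)
dnn_feats = torch.randn(64, N_FEAT)       # DNN features of the same stimuli

linear_decoder = nn.Linear(N_VOXELS, N_FEAT)
opt = torch.optim.Adam(linear_decoder.parameters(), lr=1e-3)
for _ in range(50):                       # a few gradient steps stand in for
    opt.zero_grad()                       # the paper's (unspecified) training
    loss = nn.functional.mse_loss(linear_decoder(fmri), dnn_feats)
    loss.backward()
    opt.step()

# --- Stage 2: autoregressive transformer over codebook indices --------------
class FeatureConditionedAR(nn.Module):
    """Predicts the next codebook index given decoded DNN features."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB + 1, 64)        # +1 for a BOS token
        self.cond_proj = nn.Linear(N_FEAT, 64)
        layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens, cond):
        x = self.tok_emb(tokens) + self.cond_proj(cond).unsqueeze(1)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(x, mask=mask)
        return self.head(h)                # (batch, seq, VOCAB) logits

model = FeatureConditionedAR()
decoded = linear_decoder(fmri[:1]).detach()  # features decoded from one trial

# Greedy autoregressive sampling of codebook indices (BOS = index VOCAB).
tokens = torch.full((1, 1), VOCAB, dtype=torch.long)
with torch.no_grad():
    for _ in range(SEQ_LEN):
        logits = model(tokens, decoded)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, nxt], dim=1)

codebook_indices = tokens[:, 1:]            # would be fed to an audio decoder
print(codebook_indices.shape)               # torch.Size([1, 32])
```

In the full method, the predicted codebook representations are decoded back into a waveform by the generative model; that synthesis stage is not reproduced in this sketch.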

Strong Numerical Results and Evaluation

The decoding analysis shows that hierarchical DNN features can be decoded from fMRI more accurately than traditional spectrotemporal features. Notably, the reconstructed sounds achieve identification accuracies above 85% when evaluated using higher-layer hierarchical representations. Moreover, the model demonstrates robust generalization, successfully reconstructing sound categories absent from the training dataset.
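
The summary does not spell out the identification protocol; the sketch below shows one common, correlation-based formulation of pairwise identification accuracy with which such figures are typically computed, run here on toy data. The feature dimensionality and noise level are arbitrary placeholders.

```python
# Pairwise identification accuracy (correlation-based), illustrative only.
import numpy as np

def identification_accuracy(recon_feats: np.ndarray, true_feats: np.ndarray) -> float:
    """A reconstruction counts as correct against a distractor when its feature
    vector correlates more strongly with the true stimulus than with that
    distractor; the score is the fraction of correct pairwise comparisons."""
    n = len(recon_feats)
    wins, total = 0, 0
    for i in range(n):
        r_true = np.corrcoef(recon_feats[i], true_feats[i])[0, 1]
        for j in range(n):
            if j == i:
                continue
            r_dist = np.corrcoef(recon_feats[i], true_feats[j])[0, 1]
            wins += int(r_true > r_dist)
            total += 1
    return wins / total

rng = np.random.default_rng(0)
true_feats = rng.standard_normal((20, 128))                       # toy "higher-layer" features
recon_feats = true_feats + 0.5 * rng.standard_normal((20, 128))   # noisy reconstructions
print(f"identification accuracy: {identification_accuracy(recon_feats, true_feats):.2f}")
```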

Implications and Future Directions

Practically, this research provides a pathway to externalize subjective auditory contents from brain activity, opening avenues for advanced brain-machine interfaces, auditory hallucination diagnostics, and further explorations in cognitive neuroscience. Theoretically, it reinforces the potential of hierarchical DNNs in capturing and decoding complex sensory experiences.

Looking ahead, enhancing feature decoding with sequential processing models such as LSTMs or transformers could improve the granularity of reconstructed sound content, particularly for speech and music. Applying the method to other neural recording modalities, such as EEG, may also prove fruitful given its superior temporal resolution.
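
As a purely illustrative sketch of that direction, and not something proposed in the paper, a sequence model could map a short window of fMRI frames to a sequence of DNN features instead of decoding each sample independently. The toy GRU below only shows the shape of such a decoder; its architecture and sizes are arbitrary choices.

```python
# Hypothetical sequence-based feature decoder (illustrative, not from the paper).
import torch
import torch.nn as nn

N_VOXELS, N_FEAT, WINDOW = 2000, 128, 8       # toy sizes (assumed)

class SequenceFeatureDecoder(nn.Module):
    """Maps a window of fMRI frames to a matching sequence of DNN features."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_VOXELS, 256, batch_first=True)
        self.out = nn.Linear(256, N_FEAT)

    def forward(self, fmri_window):            # (batch, WINDOW, N_VOXELS)
        h, _ = self.rnn(fmri_window)
        return self.out(h)                     # (batch, WINDOW, N_FEAT)

decoder = SequenceFeatureDecoder()
fmri_window = torch.randn(4, WINDOW, N_VOXELS)
print(decoder(fmri_window).shape)              # torch.Size([4, 8, 128])
```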

Conclusion

The paper successfully demonstrates a pioneering approach to reconstructing unconstrained auditory experiences using neural activity. While challenges remain in fully capturing detailed auditory sequences, this research marks a significant step towards comprehensive auditory perception modeling. Future developments could focus on leveraging the strengths of transformers in both feature decoding and temporal information disentanglement, potentially improving the fidelity of reconstructed auditory experiences.

This framework sets a foundation for bridging cognitive neuroscience and computational modeling, contributing significantly to our understanding of auditory representations in the human brain.
