Naturalistic Music Decoding from EEG Data via Latent Diffusion Models (2405.09062v5)
Abstract: In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial step toward high-quality general music reconstruction from non-invasive EEG data, employing an end-to-end training approach directly on raw signals, without manual pre-processing or channel selection. We train our models on the public NMED-T dataset and perform quantitative evaluation using proposed neural-embedding-based metrics. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
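The neural-embedding-based evaluation mentioned above is commonly instantiated as a Fréchet Audio Distance: Gaussians are fitted to embeddings of the reference and reconstructed audio, and the Fréchet distance between them is computed. Below is a minimal sketch of that distance computation; the choice of embedding model (e.g. CLAP or VGGish) and the exact metric used in the paper are assumptions, not specified here.

```python
import numpy as np
from scipy import linalg


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: (n_samples, dim) arrays of audio embeddings produced by
    some pretrained encoder (the encoder itself is an assumption here).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    diff = mu_a - mu_b
    # Matrix square root of the covariance product; numerical noise can
    # introduce tiny imaginary components, so keep only the real part.
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

With identical embedding sets the distance is (numerically) zero, and it grows as the reconstructed-audio embeddings drift from the reference distribution.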
- “Brain2music: Reconstructing music from human brain activity,” arXiv preprint arXiv:2307.11078, 2023.
- “MuLan: A joint embedding of music audio and natural language,” in ISMIR 2022 Hybrid Conference, 2022.
- “MusicLM: Generating music from text,” arXiv preprint arXiv:2301.11325, 2023.
- “Music can be reconstructed from human auditory cortex activity using nonlinear decoding models,” PLOS Biology, vol. 21, no. 8, pp. 1–27, 08 2023.
- Apple Inc, “Biosignal sensing device using dynamic selection of electrodes,” 2023, US Patent US20230225659A1.
- Ian Daly, “Neural decoding of music from the EEG,” Scientific Reports, vol. 13, no. 1, p. 624, 2023.
- “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
- “Decoding speech perception from non-invasive brain recordings,” Nature Machine Intelligence, vol. 5, no. 10, pp. 1097–1107, 2023.
- “Generative modeling by estimating gradients of the data distribution,” Advances in neural information processing systems, vol. 32, 2019.
- “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
- “Diffusion models beat gans on image synthesis,” Advances in neural information processing systems, vol. 34, pp. 8780–8794, 2021.
- “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10684–10695.
- “Video generation models as world simulators,” 2024.
- “DiffWave: A versatile diffusion model for audio synthesis,” in International Conference on Learning Representations, 2020.
- “WaveGrad: Estimating gradients for waveform generation,” in International Conference on Learning Representations, 2021.
- “Full-band general audio synthesis with score-based diffusion,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- “AudioLDM 2: Learning holistic audio generation with self-supervised pretraining,” arXiv preprint arXiv:2308.05734, 2023.
- “SyncFusion: Multimodal onset-synchronized video-to-audio foley synthesis,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 936–940.
- “T-Foley: A controllable waveform-domain diffusion model for temporal-event-guided foley sound synthesis,” arXiv preprint arXiv:2401.09294, 2024.
- “Diffusion-based generative speech source separation,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- “Separate and diffuse: Using a pretrained diffusion model for better source separation,” in The Twelfth International Conference on Learning Representations, 2024.
- “Conditioning and sampling in variational diffusion models for speech super-resolution,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- “Moûsai: Text-to-music generation with long-context latent diffusion,” arXiv preprint arXiv:2301.11757, 2023.
- “Fast timing-conditioned latent audio diffusion,” arXiv preprint arXiv:2402.04825, 2024.
- “Multi-source diffusion models for simultaneous music generation and separation,” in The Twelfth International Conference on Learning Representations, 2023.
- “InstructME: An instruction guided music edit and remix framework with latent diffusion models,” arXiv preprint arXiv:2308.14360, 2023.
- “Generalized multi-source inference for text conditioned music diffusion models,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 6980–6984.
- “StemGen: A music generation model that listens,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 1116–1120.
- “COCOLA: Coherence-oriented contrastive learning of musical audio representations,” 2024.
- “DreamDiffusion: Generating high-quality images from brain EEG signals,” arXiv preprint arXiv:2306.16934, 2023.
- “Seeing through the brain: Image reconstruction of visual perception from human brain signals,” 2023.
- “Brain-conditional multimodal synthesis: A survey and taxonomy,” arXiv preprint arXiv:2401.00430, 2023.
- “Adding conditional control to text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836–3847.
- “Controllable mind visual diffusion model,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2024, vol. 38, pp. 6935–6943.
- “NMED-T: A tempo-focused dataset of cortical and behavioral responses to naturalistic music,” in International Society for Music Information Retrieval Conference, 2017.
- “Score-based generative modeling in latent space,” Advances in neural information processing systems, vol. 34, pp. 11287–11302, 2021.
- “Auto-encoding variational bayes,” in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Yoshua Bengio and Yann LeCun, Eds., 2014.
- “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, 2015, pp. 234–241.
- “Denoising diffusion implicit models,” in International Conference on Learning Representations, 2020.
- “CLAP: Learning audio concepts from natural language supervision,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- “Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- “High fidelity neural audio compression,” Transactions on Machine Learning Research, 2023.
- “Fréchet audio distance: A reference-free metric for evaluating music enhancement algorithms,” in Proc. Interspeech, 2019, pp. 2350–2354.
- “CNN architectures for large-scale audio classification,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 131–135.
- “Adapting Fréchet audio distance for generative music evaluation,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 1331–1335.
- Don M. Tucker, “Spatial sampling of head electrical fields: the geodesic sensor net,” Electroencephalography and Clinical Neurophysiology, vol. 87, no. 3, pp. 154–163, 1993.
- “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
Authors:
- Emilian Postolache
- Natalia Polouliakh
- Hiroaki Kitano
- Akima Connelly
- Taketo Akama
- Emanuele Rodolà
- Luca Cosmo