
Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks (2312.05814v2)

Published 10 Dec 2023 in cs.AI, cs.SD, and eess.AS

Abstract: Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning connect neural activity directly to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With recent advances in representation learning and in speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and the neural speech embeddings given to the network play a significant role in overall performance when deep generative models are used to generate speech from brain signals. In this paper, we introduce current brain-to-speech technology and the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in non-verbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying neurophysiological activation during speech production, which may play a significant role in speech synthesis work.
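To make the pipeline the abstract describes concrete, below is a minimal sketch under assumptions the abstract does not specify: EEG is reduced to log band-power features, an encoder maps them to a "neural speech embedding", and a decoder predicts a mel-spectrogram frame that a vocoder could render to audio. The frequency bands, network sizes, and the names bandpower_features and SpeechEmbeddingSynthesizer are all illustrative assumptions, not the authors' method.

```python
# Illustrative sketch of a brain-to-speech pipeline (NOT the paper's model):
# EEG -> spectral features -> speech embedding -> mel-spectrogram frame.
import numpy as np
import torch
import torch.nn as nn

def bandpower_features(eeg: np.ndarray, fs: int = 256) -> np.ndarray:
    """Log band-power per channel over canonical EEG bands (assumed choice)."""
    bands = [(4, 8), (8, 13), (13, 30), (30, 70), (70, 120)]  # theta..high-gamma
    spectrum = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2          # (channels, freqs)
    freqs = np.fft.rfftfreq(eeg.shape[-1], d=1.0 / fs)
    feats = [np.log(spectrum[:, (freqs >= lo) & (freqs < hi)].mean(-1) + 1e-10)
             for lo, hi in bands]
    return np.stack(feats, axis=-1)                            # (channels, bands)

class SpeechEmbeddingSynthesizer(nn.Module):
    """Encoder-decoder: neural features -> speech embedding -> mel frame."""
    def __init__(self, n_feats: int, emb_dim: int = 128, n_mels: int = 80):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_feats, 256), nn.ReLU(),
                                     nn.Linear(256, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_mels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)     # neural speech embedding
        return self.decoder(z)  # mel-spectrogram frame (input to a vocoder)

# Usage on synthetic data: 64 channels, 1 s of EEG at 256 Hz.
eeg = np.random.randn(64, 256)
feats = torch.tensor(bandpower_features(eeg).reshape(1, -1), dtype=torch.float32)
model = SpeechEmbeddingSynthesizer(n_feats=feats.shape[-1])
mel_frame = model(feats)        # shape (1, 80)
```

Published systems would replace the toy feed-forward decoder with a sequence-level generative model and a neural vocoder, but the feature-to-embedding-to-spectrogram structure sketched here is the part of the pipeline the abstract identifies as performance-critical.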

