Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning (2107.09998v3)

Published 21 Jul 2021 in eess.AS, cs.AI, and cs.SD

Abstract: Deep generative models have recently achieved impressive performance in speech and music synthesis. However, compared to the generation of those domain-specific sounds, generating general sounds (such as sirens and gunshots) has received less attention, despite their wide applications. In previous work, the SampleRNN method was considered for sound generation in the time domain. However, SampleRNN is potentially limited in capturing long-range dependencies within sounds as it only back-propagates through a limited number of samples. In this work, we propose a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes. This offers an advantage in efficiently modelling long-range dependencies and retaining local fine-grained structures within sound clips. We evaluate our approach on the UrbanSound8K dataset, compared to SampleRNN, with the performance metrics measuring the quality and diversity of generated sounds. Experimental results show that our method offers comparable performance in quality and significantly better performance in diversity.
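The abstract describes the method only at a high level: learn a discrete (quantized) representation of time-frequency frames, then model the resulting token sequence with a class-conditional prior. As a rough illustration of that idea, the sketch below shows a minimal VQ-VAE-style vector quantizer over spectrogram frame encodings. The module names, codebook size, dimensions, and the use of PyTorch are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of discrete time-frequency
# representation learning: continuous spectrogram-frame encodings are
# snapped to their nearest codebook entries, yielding a short sequence of
# discrete tokens that a class-conditional autoregressive prior can model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Map continuous frame encodings to nearest codebook vectors (VQ-VAE style)."""
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment loss weight

    def forward(self, z_e):                      # z_e: (batch, time, code_dim)
        flat = z_e.reshape(-1, z_e.shape[-1])    # (batch*time, code_dim)
        # Squared Euclidean distance from each frame encoding to every code.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)            # one discrete token per frame
        z_q = self.codebook(indices).view_as(z_e)
        # Codebook and commitment losses; straight-through estimator for gradients.
        loss = (F.mse_loss(z_q, z_e.detach())
                + self.beta * F.mse_loss(z_e, z_q.detach()))
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices.view(z_e.shape[:-1]), loss
```

The point of the quantization step, as the abstract argues, is sequence length: a clip reduced to a few hundred frame tokens is far easier for an autoregressive prior to model over its full duration than the raw waveform is for SampleRNN, which back-propagates through only a limited window of samples.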

Authors (6)
  1. Xubo Liu (66 papers)
  2. Turab Iqbal (14 papers)
  3. Jinzheng Zhao (18 papers)
  4. Qiushi Huang (23 papers)
  5. Mark D. Plumbley (114 papers)
  6. Wenwu Wang (148 papers)
Citations (44)
