
Deep Clustering and Conventional Networks for Music Separation: Stronger Together (1611.06265v2)

Published 18 Nov 2016 in stat.ML, cs.LG, and cs.SD

Abstract: Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components.
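The mechanism described in the abstract (one embedding per time-frequency bin, then clustering in the embedding space) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The affinity-based loss below is the objective commonly associated with deep clustering and is an assumption here, since the abstract does not state it. The function names `deep_clustering_loss` and `kmeans_masks`, the farthest-point initialization, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2 (assumed objective, see lead-in).

    V: (N, D) embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot indicators of the dominant source in each bin.
    The expanded form avoids building the N x N affinity matrices.
    """
    V = np.asarray(V, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return (
        np.linalg.norm(V.T @ V, "fro") ** 2
        - 2.0 * np.linalg.norm(V.T @ Y, "fro") ** 2
        + np.linalg.norm(Y.T @ Y, "fro") ** 2
    )


def kmeans_masks(V, n_sources, n_iter=20):
    """Cluster bin embeddings with plain k-means and return binary masks.

    Returns an (N, n_sources) one-hot matrix: column k marks the bins
    assigned to cluster k. Initialization is a deterministic
    farthest-point heuristic (an illustrative choice, not from the paper).
    """
    V = np.asarray(V, dtype=float)
    centers = [V[0]]
    for _ in range(1, n_sources):
        # Squared distance from every bin to its nearest chosen center.
        d = np.min([np.sum((V - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(d)])
    centers = np.stack(centers)

    labels = np.zeros(len(V), dtype=int)
    for _ in range(n_iter):
        dists = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_sources):
            if np.any(labels == k):
                centers[k] = V[labels == k].mean(axis=0)
    return np.eye(n_sources)[labels]


# Toy example: four bins, two sources, 2-D embeddings (made-up values).
V_toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Y_toy = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
masks = kmeans_masks(V_toy, n_sources=2)
loss = deep_clustering_loss(V_toy, Y_toy)
```

In a separation system, each column of `masks` would be reshaped to the spectrogram's time-frequency grid and multiplied with the mixture spectrogram to estimate one source. A conventional network, by contrast, would output the masks or source estimates directly.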

Authors (5)
  1. Yi Luo (153 papers)
  2. Zhuo Chen (319 papers)
  3. John R. Hershey (40 papers)
  4. Jonathan Le Roux (82 papers)
  5. Nima Mesgarani (45 papers)
Citations (157)
