
MEDIC: Zero-Shot Music Editing & Domain Generalization

Updated 13 January 2026
  • The paper introduces a dual meta-learning approach that constructs balanced decision boundaries for both known and unseen classes.
  • It utilizes domain and class splits to align gradients, mitigating misclassification in open-set conditions and ensuring robust rejection of unknown samples.
  • Empirical results on benchmarks like PACS and Digits-DG demonstrate enhanced open set metrics and improved OSCR performance.

Zero-Shot Music Editing with Disentangled Inversion Control (MEDIC) refers to a framework for open set domain generalization that operates under domain and class mismatches between training and deployment, with the primary goal of constructing generalizable decision boundaries that remain robust to previously unseen classes and domains. The method, originally detailed as "Dualistic Meta-Learning for Open Set Domain Generalization" (Wang et al., 2023), is not specific to music editing; rather, its core mechanisms were developed for domain generalization in vision tasks and offer foundational strategies for zero-shot recognition and editing in any setting where open-set conditions and domain shifts occur.

1. Problem Formulation: Open Set Domain Generalization

Open Set Domain Generalization (OSDG) addresses the challenge of training models on source domains $\mathcal{S} = \{\mathcal{D}_1, \ldots, \mathcal{D}_S\}$, each sharing a known label set $\mathcal{C}$, to generalize to target domains containing both known and previously unseen classes ($\mathcal{C} \cup \mathcal{U}$, $\mathcal{U} \cap \mathcal{C} = \emptyset$) under domain distribution shift. Closed-set DG assumes $\mathcal{U} = \emptyset$, whereas OSDG permits $\mathcal{U} \neq \emptyset$ at test time. In practice, standard DG methods (domain-invariant feature extraction, meta-learning, or augmentation) are insufficient: they misclassify unseen classes and form overly tight decision boundaries that leave no margin for rejection. Formally, the objective is to learn parameters $\Theta$ such that, for $x$ from an unseen domain with label $y \in \mathcal{C} \cup \mathcal{U}$, the model predicts the correct class if $y \in \mathcal{C}$ and rejects (flags as unknown) if $y \in \mathcal{U}$, given only labeled training data with $y \in \mathcal{C}$.

2. MEDIC Framework: Dualistic Meta-Learning and Open-Set Loss

MEDIC (Dualistic MEta-learning with joint DomaIn-Class matching) advances OSDG by integrating two novel components:

  1. A dualistic meta-learning scheme performing gradient alignment simultaneously across domains and classes.
  2. An open-set loss merging closed-set (softmax) and one-vs-all (OVA, multi-binary) heads to form balanced decision boundaries.

Dualistic meta-learning iterates as follows: at each update, the source domains are split into two disjoint sets $S_F$, $S_G$ (domain-wise split), and the known classes $\mathcal{C}$ into disjoint halves $C_1$, $C_2$ (class-wise split). Four mini-tasks are constructed:

  • $F_1$: $(S_F, C_1)$
  • $F_2$: $(S_F, C_2)$
  • $G_1$: $(S_G, C_1)$
  • $G_2$: $(S_G, C_2)$

Meta-training uses $\{F_1, G_2\}$ and meta-testing uses $\{F_2, G_1\}$ in each update, thus entangling inter-domain and inter-class gradient matching.
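The split-and-sample step above can be sketched as follows. This is a minimal illustration with a hypothetical `make_mini_tasks` helper; the actual MEDIC sampler additionally balances mini-batches per task:

```python
import random

def make_mini_tasks(domains, classes, rng=random):
    """Split source domains and known classes in half, then form the
    four (domain subset, class subset) mini-tasks used by MEDIC."""
    d, c = list(domains), list(classes)
    rng.shuffle(d)
    rng.shuffle(c)
    S_F, S_G = d[: len(d) // 2], d[len(d) // 2:]   # domain-wise split
    C1, C2 = c[: len(c) // 2], c[len(c) // 2:]     # class-wise split
    # Meta-train on {F1, G2}, meta-test on {F2, G1}: the two sets share
    # neither a domain split nor a class split, entangling both axes.
    meta_train = [(S_F, C1), (S_G, C2)]   # F1, G2
    meta_test = [(S_F, C2), (S_G, C1)]    # F2, G1
    return meta_train, meta_test
```

Because each meta-test task differs from each meta-train task in either its domains or its classes (never neither), gradient matching is enforced across both axes at once.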

Meta-objective (abbreviated notation): let $\mathcal{F}_1(\Theta),\ \mathcal{F}_2(\Theta),\ \mathcal{G}_1(\Theta),\ \mathcal{G}_2(\Theta)$ denote the losses on the four mini-tasks.

  • Inner update: $\hat{\Theta} = \Theta - \alpha [\nabla_{\Theta} \mathcal{F}_1(\Theta) + \nabla_{\Theta} \mathcal{G}_2(\Theta)]$
  • Outer update:

$\Theta \leftarrow \Theta - \eta \left\{ \nabla_\Theta[\mathcal{F}_1(\Theta)+\mathcal{G}_2(\Theta)] + \beta \cdot \nabla_{\hat{\Theta}}[\mathcal{F}_2(\hat{\Theta})+\mathcal{G}_1(\hat{\Theta})] \right\}$

Taylor expansion yields a final objective regularizing the alignment of meta-train and meta-test gradients over both domains and classes:

$\arg\min_{\Theta} \left\{ \mathcal{F}_1(\Theta) + \mathcal{G}_2(\Theta) + \beta\,[\mathcal{F}_2(\Theta)+\mathcal{G}_1(\Theta)] - \beta\alpha \left[ (\nabla\mathcal{F}_1+\nabla\mathcal{G}_2) \cdot (\nabla\mathcal{F}_2+\nabla\mathcal{G}_1) \right] \right\}$
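The step from the outer update to this objective follows from a first-order Taylor expansion of the meta-test losses around $\Theta$, using the inner update $\hat{\Theta} = \Theta - \alpha[\nabla\mathcal{F}_1(\Theta) + \nabla\mathcal{G}_2(\Theta)]$:

```latex
\mathcal{F}_2(\hat{\Theta}) + \mathcal{G}_1(\hat{\Theta})
\;\approx\; \mathcal{F}_2(\Theta) + \mathcal{G}_1(\Theta)
\;-\; \alpha \big(\nabla\mathcal{F}_2(\Theta) + \nabla\mathcal{G}_1(\Theta)\big)
\cdot \big(\nabla\mathcal{F}_1(\Theta) + \nabla\mathcal{G}_2(\Theta)\big)
```

Substituting this into the outer update and collecting terms yields the dot-product regularizer with coefficient $\beta\alpha$: minimizing the objective rewards meta-train and meta-test gradients that point in similar directions.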

Open-set loss consists of:

  • $L_{ce}$: standard cross-entropy on the softmax head (closed-set)
  • $L_{ova}(x,y) = -\log p(\hat{y}^y \mid x) - \min_{j \ne y} \log[1 - p(\hat{y}^j \mid x)]$: OVA head penalizing the hardest negative

The total loss for a mini-batch $B$ is $L_{all}(B;\Theta) = L_{ce}(B;\Theta) + L_{ova}(B;\Theta)$, with both heads operating over the same extracted features $f_\Theta$.
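A minimal NumPy sketch of $L_{all}$ under the definitions above; the function name and batched layout are illustrative, not the reference implementation:

```python
import numpy as np

def open_set_loss(logits_cls, logits_ova, y):
    """L_all = L_ce (softmax head) + L_ova (one-vs-all head).

    logits_cls: (B, C) logits of the closed-set softmax head.
    logits_ova: (B, C) logits of the OVA head (one binary score per class).
    y:          (B,)   integer labels in [0, C).
    """
    B, C = logits_cls.shape
    # Closed-set cross-entropy over the softmax head (log-sum-exp stabilized).
    z = logits_cls - logits_cls.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    L_ce = -log_p[np.arange(B), y].mean()
    # OVA head: per-class sigmoid; positive term for the true class,
    # hardest-negative term min_{j != y} log(1 - p_j).
    p = 1.0 / (1.0 + np.exp(-logits_ova))
    pos = -np.log(p[np.arange(B), y] + 1e-12)
    log_neg = np.log(1.0 - p + 1e-12)
    mask = np.ones((B, C), dtype=bool)
    mask[np.arange(B), y] = False          # exclude the true class
    hardest = np.where(mask, log_neg, np.inf).min(axis=1)
    L_ova = (pos - hardest).mean()
    return L_ce + L_ova
```

The `min` over $\log(1-p_j)$ selects the negative class with the highest sigmoid score, so gradient flows only to the single most confusable negative per example.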

3. Architectural and Training Protocols

MEDIC utilizes a shared feature backbone $f_\Theta$ (e.g., ResNet-18/50) and two parallel classification heads (softmax and OVA). The training pseudocode is:

for iteration in range(max_iter):
    # Dual split: two disjoint domain sets and two disjoint class halves.
    S_F, S_G = random_split(domains)
    C1, C2 = random_split(classes)
    # One mini-batch per mini-task.
    B_F1 = sample(S_F, C1);  B_F2 = sample(S_F, C2)
    B_G1 = sample(S_G, C1);  B_G2 = sample(S_G, C2)
    # Meta-train on {F1, G2}; inner (virtual) update.
    L_mt = L_all(B_F1, theta) + L_all(B_G2, theta)
    theta_hat = theta - alpha * grad(L_mt, theta)
    # Meta-test on {F2, G1} with the updated parameters; outer update.
    L_me = L_all(B_F2, theta_hat) + L_all(B_G1, theta_hat)
    theta = theta - eta * (grad(L_mt, theta) + beta * grad(L_me, theta_hat))

Typical hyperparameters: inner learning rate $\alpha \in [10^{-4}, 10^{-2}]$, outer learning rate $\eta$ in the same range, and meta-weight $\beta \in [0.1, 1.0]$. Empirically, the best values are near $10^{-3}$ for the learning rates and $\beta \in [0.2, 0.5]$; each split uses a batch size of 32.

4. Decision Boundary Characterization

The OVA structure yields $|\mathcal{C}|$ parallel binary boundaries ("inlier" versus "outlier" for each class). Dualistic gradient matching prevents the classic OVA collapse, where boundaries shrink too tightly on positives or drift excessively toward negatives, by regularizing the class-wise margins. This construction ensures balanced, equidistant decision boundaries permitting rejection of unknowns (open set). At inference, two confidence measures are available:

  • $conf_{cls}(x) = \max_i\, p_{\mathrm{softmax}}(\hat{y}=i \mid x)$
  • $conf_{bcls}(x) = p_{\mathrm{sigmoid}}(\hat{y}^{k^*} \mid x)$, where $k^* = \arg\max_i p_{\mathrm{softmax}}(\hat{y}=i \mid x)$

If the confidence falls below a threshold $\mu$, $x$ is labeled as unknown.
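The $conf_{bcls}$ rejection rule can be sketched as follows (NumPy; the function name and threshold value are illustrative):

```python
import numpy as np

def predict_open_set(logits_cls, logits_ova, mu):
    """Predict a known class index, or -1 for 'unknown'.

    The softmax head picks the candidate class k*; the OVA head's
    sigmoid score for k* is then thresholded at mu.
    """
    B = logits_cls.shape[0]
    z = logits_cls - logits_cls.max(axis=1, keepdims=True)
    p_soft = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    k_star = p_soft.argmax(axis=1)                     # candidate class
    conf = 1.0 / (1.0 + np.exp(-logits_ova[np.arange(B), k_star]))
    pred = np.where(conf >= mu, k_star, -1)            # reject below mu
    return pred, conf
```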

5. Empirical Performance and Comparative Results

Comparative evaluation across PACS (Art, Cartoon, Photo, Sketch; 6:1 known:unknown class split), Office-Home (four domains, 65 classes), and Digits-DG (MNIST, MNIST-M, SVHN, SYN; 6:4 split) under leave-one-domain-out protocol demonstrates MEDIC’s superiority in open set metrics:

Benchmark   | Backbone | Prior SOTA (OSCR) | MEDIC-bcls (OSCR)
PACS        | ResNet50 | DAML: ~73.7%      | ~84.9%
Digits-DG   | ConvNet  | MLDG: ~68.4%      | ~71.2%
Office-Home | —        | SWAD: —           | ~71.2%

For PACS, MEDIC-bcls outperforms DAML by at least 1.7%, and boosts OSCR on Digits-DG by roughly 2.8% over MLDG. MEDIC maintains closed-set accuracy competitive with or exceeding top DG methods (e.g., ~71.2% on Office-Home vs. 70.6% for SWAD).

Ablation studies indicate that:

  • ERM (no meta) + bcls: H ~79.9 / OSCR ~81.0,
  • MLDG (domain-wise meta only) + bcls: H ~79.9 / OSCR ~82.5,
  • MEDIC (domain- & class-wise meta) + bcls: H ~83.0 / OSCR ~84.9.

Both domain- and class-wise gradient matching are necessary for full efficacy.

6. Implementation Guidance and Limitations

Implementation should begin from a robust DG baseline (e.g., MLDG or ERM+MixStyle), then integrate the OVA head and the MEDIC meta-learning steps. The threshold $\mu$ should be tuned via H-score on held-out source domains, with OSCR curves providing further guidance. Parameter sharing between the OVA and softmax heads reduces overhead with negligible loss of accuracy.
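Threshold tuning can be sketched as below. The helper names are hypothetical, and the sketch assumes a validation set in which some held-out source classes are treated as pseudo-unknowns; the H-score here is the harmonic mean of known-class accuracy and unknown-rejection rate, a common open-set validation criterion:

```python
import numpy as np

def h_score(conf, is_known, correct, mu):
    """Harmonic mean of known-class accuracy and unknown-rejection rate
    at threshold mu."""
    accept = conf >= mu
    known = np.asarray(is_known, dtype=bool)
    acc_known = (accept & correct)[known].mean() if known.any() else 0.0
    rej_unknown = (~accept)[~known].mean() if (~known).any() else 0.0
    if acc_known + rej_unknown == 0:
        return 0.0
    return 2 * acc_known * rej_unknown / (acc_known + rej_unknown)

def tune_threshold(conf, is_known, correct, grid=None):
    """Grid-search mu to maximize the validation H-score."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    scores = [h_score(conf, is_known, correct, mu) for mu in grid]
    return grid[int(np.argmax(scores))]
```

Sweeping the same confidence scores over all thresholds also yields the OSCR curve, so both criteria can be computed from one validation pass.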

Potential extensions include alternative gradient alignment methods (variance matching as in Fishr) or adversarial synthesis of pseudo-unknowns. MEDIC’s limitations include its dependence on a tunable threshold for unknown rejection—which may drift on extreme domain shifts—and possible challenge in separating semantically similar unknown classes due to OVA margin blurring.

7. Context Within Open Domain Generalization Research

MEDIC directly addresses challenges highlighted by previous OSDG and OpenDG works such as DAML (Shu et al., 2021), which meta-learns over augmented domains and labels; in contrast, MEDIC regularizes class- and domain-wise gradients to construct robust OVA boundaries in a single shared backbone, enhancing both open- and closed-set performance. The paradigm shift from augmenting source-domain coverage (DAML) to directly sculpting decision boundaries (MEDIC) is empirically validated by superior open-set metrics while retaining closed-set accuracy.

A plausible implication is that this approach, while developed for vision tasks, is adaptable to other modalities (such as music editing or audio-visual domains) wherever domain shift and unknown classes pose practical deployment barriers.
