Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
156 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Towards the Identifiability in Noisy Label Learning: A Multinomial Mixture Approach (2301.01405v2)

Published 4 Jan 2023 in cs.LG

Abstract: Learning from noisy labels (LNL) plays a crucial role in deep learning. The most promising LNL methods rely on identifying clean-label samples from a dataset with noisy annotations. Such an identification is challenging because the conventional LNL problem, which assumes a single noisy label per instance, is non-identifiable, i.e., clean labels cannot be estimated theoretically without additional heuristics. In this paper, we aim to formally investigate this identifiability issue using multinomial mixture models to determine the constraints that make the problem identifiable. Specifically, we discover that the LNL problem becomes identifiable if there are at least $2C - 1$ noisy labels per instance, where $C$ is the number of classes. To meet this requirement without relying on additional $2C - 2$ manual annotations per instance, we propose a method that automatically generates additional noisy labels by estimating the noisy label distribution based on nearest neighbours. These additional noisy labels enable us to apply the Expectation-Maximisation algorithm to estimate the posterior probabilities of clean labels, which are then used to train the model of interest. We empirically demonstrate that our proposed method is capable of estimating clean labels without any heuristics in several label noise benchmarks, including synthetic, web-controlled, and real-world label noises. Furthermore, our method performs competitively with many state-of-the-art methods.

Summary

We haven't generated a summary for this paper yet.