Self-Training of Halfspaces with Generalization Guarantees under Massart Mislabeling Noise Model (2111.14427v3)

Published 29 Nov 2021 in cs.LG and stat.ML

Abstract: We investigate the generalization properties of a self-training algorithm with halfspaces. The approach iteratively learns a list of halfspaces from labeled and unlabeled training data, where each iteration consists of two steps: exploration and pruning. In the exploration phase, a halfspace is found by maximizing the unsigned margin over the unlabeled examples; pseudo-labels are then assigned to those examples whose distance to the halfspace exceeds the current threshold. The pseudo-labeled examples are added to the training set, and a new classifier is learned. This process is repeated until no unlabeled examples remain for pseudo-labeling. In the pruning phase, pseudo-labeled samples whose distance to the last halfspace is greater than the associated unsigned margin are discarded. We prove that the misclassification error of the resulting sequence of classifiers is bounded, and show that the resulting semi-supervised approach never degrades performance compared to the classifier learned from the initial labeled training set alone. Experiments carried out on a variety of benchmarks demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
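To make the exploration and pruning steps concrete, here is a minimal sketch of the loop described in the abstract. It assumes a linear SVM (scikit-learn's `LinearSVC`) as a stand-in for the halfspace learner and a fixed unsigned-margin threshold `theta`; the paper's method instead selects each halfspace by maximizing the unsigned margin, so the function name `self_train_halfspaces` and the fixed threshold are illustrative assumptions, not the authors' algorithm.

```python
# Sketch of the exploration/pruning self-training loop from the abstract.
# Assumptions (not from the paper): LinearSVC stands in for the halfspace
# learner, and `theta` is a fixed unsigned-margin threshold rather than one
# chosen by maximizing the unsigned margin at each iteration.
import numpy as np
from sklearn.svm import LinearSVC

def self_train_halfspaces(X_lab, y_lab, X_unl, theta=1.0, max_iter=20):
    """Iteratively pseudo-label unlabeled points whose unsigned margin
    |w.x + b| exceeds `theta`, then prune pseudo-labeled points whose
    distance to the last halfspace is greater than the threshold."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pseudo_X, pseudo_y = [], []
    remaining = X_unl.copy()
    clf = LinearSVC().fit(X_train, y_train)
    for _ in range(max_iter):
        if len(remaining) == 0:
            break
        # Exploration: unsigned margin (distance up to scaling) of each
        # remaining unlabeled point under the current halfspace.
        margins = np.abs(clf.decision_function(remaining))
        confident = margins > theta
        if not confident.any():
            break
        # Pseudo-label the high-margin points, move them into the training
        # set, and retrain the halfspace.
        new_X = remaining[confident]
        new_y = clf.predict(new_X)
        pseudo_X.append(new_X)
        pseudo_y.append(new_y)
        remaining = remaining[~confident]
        X_train = np.vstack([X_train, new_X])
        y_train = np.concatenate([y_train, new_y])
        clf = LinearSVC().fit(X_train, y_train)
    # Pruning: discard pseudo-labeled samples whose distance to the last
    # halfspace exceeds the unsigned-margin threshold, then refit.
    if pseudo_X:
        P = np.vstack(pseudo_X)
        p_y = np.concatenate(pseudo_y)
        keep = np.abs(clf.decision_function(P)) <= theta
        clf = LinearSVC().fit(
            np.vstack([X_lab, P[keep]]),
            np.concatenate([y_lab, p_y[keep]]),
        )
    return clf
```

Refitting after pruning is a design choice in this sketch; the abstract only states that out-of-margin pseudo-labeled samples are discarded.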

Authors (4)
  1. Lies Hadjadj (3 papers)
  2. Massih-Reza Amini (40 papers)
  3. Sana Louhichi (10 papers)
  4. Alexis Deschamps (6 papers)
Citations (1)
