Supervised Learning: No Loss No Cry (2002.03555v1)

Published 10 Feb 2020 in cs.LG and stat.ML

Abstract: Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses from both a computational and statistical perspective is well-developed, these offer a panoply of different choices. In practice, this choice is typically made in an \emph{ad hoc} manner. In hopes of making this procedure more principled, the problem of \emph{learning the loss function} for a downstream task (e.g., classification) has garnered recent interest. However, works in this area have been generally empirical in nature. In this paper, we revisit the {\sc SLIsotron} algorithm of Kakade et al. (2011) through a novel lens, derive a generalisation based on Bregman divergences, and show how it provides a principled procedure for learning the loss. In detail, we cast {\sc SLIsotron} as learning a loss from a family of composite square losses. By interpreting this through the lens of \emph{proper losses}, we derive a generalisation of {\sc SLIsotron} based on Bregman divergences. The resulting {\sc BregmanTron} algorithm jointly learns the loss along with the classifier. It comes equipped with a simple guarantee of convergence for the loss it learns, and its set of possible outputs comes with a guarantee of agnostic approximability of Bayes rule. Experiments indicate that the {\sc BregmanTron} substantially outperforms the {\sc SLIsotron}, and that the loss it learns can be minimized by other algorithms for different tasks, thereby opening the interesting problem of \textit{loss transfer} between domains.

Citations (12)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Related Papers

LegendreTron: Uprising Proper Multiclass Loss Learning (2023)
All your loss are belong to Bayes (2020)
Task Loss Estimation for Sequence Prediction (2015)
The Geometry and Calculus of Losses (2022)
Learning with Non-Convex Truncated Losses by SGD (2018)