Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more (1802.07382v3)

Published 21 Feb 2018 in cs.LG and cs.DS

Abstract: A coreset (or core-set) is a small weighted \emph{subset} $Q$ of an input set $P$ that, with respect to a given \emph{monotonic} function $f:\mathbb{R}\to\mathbb{R}$, \emph{provably} approximates the fitting loss $\sum_{p\in P}f(p\cdot x)$ for \emph{any} given $x\in\mathbb{R}^d$. Using $Q$, we can obtain an approximation of the $x^*$ that minimizes this loss by running \emph{existing} optimization algorithms on $Q$. In this work we provide: (i) a lower bound which proves that there are sets with no coresets smaller than $n=|P|$ for general monotonic loss functions; (ii) a proof that, under a natural assumption that holds, e.g., for logistic regression and the sigmoid activation functions, a small coreset exists for \emph{any} input $P$; (iii) a generic coreset construction algorithm that computes such a small coreset $Q$ in $O(nd+n\log n)$ time; and (iv) experimental results which demonstrate that our coresets are effective and are much smaller in practice than predicted in theory.
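
To make the quantity in the abstract concrete, the following minimal Python sketch evaluates the full loss $\sum_{p\in P}f(p\cdot x)$ and the weighted loss $\sum_{q\in Q}w(q)f(q\cdot x)$ for one query $x$, and reports their relative error. It is an illustration under stated assumptions, not the paper's construction: the subset is drawn by plain uniform sampling with weights $n/m$, which carries no coreset guarantee, and $f(z)=\log(1+e^{z})$ is used as one example of a monotonic function. All function names here are hypothetical.

```python
import math
import random


def logistic_loss(z):
    # Numerically stable log(1 + exp(z)); monotonically increasing in z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(p, x):
    # Inner product p . x of two equal-length sequences.
    return sum(pi * xi for pi, xi in zip(p, x))


def full_loss(P, x, f=logistic_loss):
    # Fitting loss over the whole input set: sum_{p in P} f(p . x).
    return sum(f(dot(p, x)) for p in P)


def weighted_loss(Q, weights, x, f=logistic_loss):
    # Loss over a weighted subset: sum_{q in Q} w(q) * f(q . x).
    return sum(w * f(dot(q, x)) for q, w in zip(Q, weights))


def uniform_sample(P, m, rng):
    # ASSUMPTION: uniform sampling with replacement and weight n/m per point.
    # This is a stand-in baseline, not the coreset algorithm of the paper.
    n = len(P)
    Q = [P[rng.randrange(n)] for _ in range(m)]
    weights = [n / m] * m
    return Q, weights


def relative_error(P, Q, weights, x):
    # |weighted loss - full loss| / full loss for a single query x.
    exact = full_loss(P, x)
    return abs(weighted_loss(Q, weights, x) - exact) / exact


if __name__ == "__main__":
    rng = random.Random(0)
    d, n, m = 3, 1000, 100
    P = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    Q, weights = uniform_sample(P, m, rng)
    print("relative error for this x:", relative_error(P, Q, weights, x))
```

A coreset in the sense of the abstract must keep this relative error small for every $x$ simultaneously, whereas the sketch only checks a single query.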

Citations (13)

Related Papers

New Frameworks for Offline and Streaming Coreset Constructions (2016)
Introduction to Coresets: Approximated Mean (2021)
A Unified Approach to Coreset Learning (2021)
Coresets for Near-Convex Functions (2020)
Coresets for Kinematic Data: From Theorems to Real-Time Systems (2015)