
Multi-Objective Weighted Sampling (1509.07445v6)

Published 24 Sep 2015 in cs.DB and cs.DS

Abstract: Multi-objective samples are powerful and versatile summaries of large data sets. For a set of keys $x\in X$ and associated values $f_x \geq 0$, a weighted sample taken with respect to $f$ allows us to approximate segment-sum statistics $\text{Sum}(f;H) = \sum_{x\in H} f_x$, for any subset $H$ of the keys, with statistically guaranteed quality that depends on the sample size and the relative weight of $H$. When estimating $\text{Sum}(g;H)$ for $g\neq f$, however, the quality guarantees are lost. A multi-objective sample with respect to a set of functions $F$ provides, for each $f\in F$, the same statistical guarantees as a dedicated weighted sample while minimizing the summary size. We analyze properties of multi-objective samples and present sampling schemes and meta-algorithms for estimation and optimization, showcasing two important application domains. The first is key-value data sets, where different functions $f\in F$ applied to the values correspond to different statistics such as moments, thresholds, capping, and sum. A multi-objective sample allows us to approximate all statistics in $F$. The second is metric spaces, where keys are points and each $f\in F$ is defined by a set of points $C$, with $f_x$ being the service cost of $x$ by $C$, so that $\text{Sum}(f;X)$ models the centrality or clustering cost of $C$. A multi-objective sample allows us to estimate costs for each $f\in F$. In these domains, multi-objective samples are often small, are efficient to construct, and enable scalable estimation and optimization. We aim here to facilitate further applications of this powerful technique.
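As a rough illustration of the idea for key-value data, the sketch below draws one sample that serves several functions at once and then estimates a segment sum from it. It rests on assumptions that the abstract does not state: keys are included independently, each with probability equal to the largest of its per-function probabilities $\min(1, k f_x / \text{Sum}(f;X))$, and sums are estimated by inverse-probability weighting. The names `multi_objective_pps_sample` and `estimate_segment_sum` and the parameter `k` are hypothetical, chosen for this example only.

```python
import random


def multi_objective_pps_sample(values, functions, k, seed=0):
    """Draw one independent (Poisson) sample serving every f in `functions`.

    values:    dict mapping key -> raw value
    functions: list of callables f(value) >= 0 (the set F)
    k:         per-function sample-size parameter (an assumption of this sketch)
    Returns a dict mapping each sampled key -> (value, inclusion probability).
    """
    rng = random.Random(seed)
    # Sum(f; X) for every f in F, needed to normalize the probabilities.
    totals = [sum(f(v) for v in values.values()) for f in functions]
    sample = {}
    for key, v in values.items():
        # Inclusion probability: the largest per-function probability.
        p = 0.0
        for f, total in zip(functions, totals):
            if total > 0:
                p = max(p, min(1.0, k * f(v) / total))
        if p > 0 and rng.random() < p:
            sample[key] = (v, p)
    return sample


def estimate_segment_sum(sample, f, segment):
    """Inverse-probability estimate of Sum(f; H), where H = {x : segment(x)}."""
    return sum(f(v) / p for key, (v, p) in sample.items() if segment(key))


if __name__ == "__main__":
    data = {i: float(i) for i in range(1, 1001)}
    F = [
        lambda v: v,              # sum
        lambda v: v * v,          # second moment
        lambda v: min(v, 100.0),  # capping at 100
    ]
    s = multi_objective_pps_sample(data, F, k=50, seed=1)
    approx = estimate_segment_sum(s, F[2], lambda key: key % 2 == 0)
    exact = sum(F[2](v) for key, v in data.items() if key % 2 == 0)
    print(len(s), approx, exact)
```

Because each key's inclusion probability is at least what a dedicated sample for any single $f\in F$ would give it, the same sample can be reused for every function in `F`; the expected sample size in this sketch is at most `k * len(functions)`.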

Citations (22)
