Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Fair and Representative Subset Selection from Data Streams (2010.04412v2)

Published 9 Oct 2020 in cs.DS and cs.LG

Abstract: We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint $k$. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional \emph{fairness} constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a $ (\frac{1}{2}-\varepsilon) $-approximation algorithm that requires $ O(\frac{1}{\varepsilon} \log \frac{k}{\varepsilon}) $ passes over the stream for any constant $ \varepsilon>0 $. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of $(\frac{1}{2}-\varepsilon)$ when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely \emph{maximum coverage on large graphs} and \emph{personalized recommendation}.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Yanhao Wang (34 papers)
  2. Francesco Fabbri (22 papers)
  3. Michael Mathioudakis (28 papers)

Summary

We haven't generated a summary for this paper yet.