
Approximate kernel clustering (0807.4626v2)

Published 29 Jul 2008 in cs.DS, cs.CC, and math.FA

Abstract: In the kernel clustering problem we are given a large $n\times n$ positive semi-definite matrix $A=(a_{ij})$ with $\sum_{i,j=1}^{n}a_{ij}=0$ and a small $k\times k$ positive semi-definite matrix $B=(b_{ij})$. The goal is to find a partition $S_1,\ldots,S_k$ of $\{1,\ldots,n\}$ which maximizes the quantity $$ \sum_{i,j=1}^{k} \Big(\sum_{(p,q)\in S_i\times S_j}a_{pq}\Big)b_{ij}. $$ We study the computational complexity of this generic clustering problem, which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song, Smola, Gretton and Borgwardt. In some cases we manage to compute the sharp approximation threshold for this problem assuming the Unique Games Conjecture (UGC). In particular, when $B$ is the $3\times 3$ identity matrix the UGC hardness threshold of this problem is exactly $\frac{16\pi}{27}$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k\times k$ identity matrix is $\frac{8\pi}{9}\left(1-\frac{1}{k}\right)$ for every $k\ge 3$.
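To make the objective concrete, here is a minimal NumPy sketch of our own; it is not the authors' approximation algorithm. It evaluates the quantity above for a given partition and finds the exact optimum by exhaustive search over all $k^n$ labelings, which is only feasible for tiny $n$ (empty parts are allowed in this sketch). The function names are hypothetical. We also assume, for illustration, that $A$ comes from double-centering a PSD kernel $K$ as $A = HKH$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$. This is one standard way to obtain a PSD matrix whose entries sum to zero.

```python
import itertools

import numpy as np


def center_kernel(K):
    """Double-center a PSD kernel: A = H K H with H = I - (1/n) 11^T.

    The result stays PSD and every row (hence the total) sums to zero,
    matching the constraint sum_{i,j} a_ij = 0. (Illustrative assumption.)
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def clustering_objective(A, B, labels):
    """Evaluate sum_{i,j} (sum_{(p,q) in S_i x S_j} a_pq) * b_ij.

    labels[p] in {0, ..., k-1} names the part S_i that contains point p.
    """
    labels = np.asarray(labels, dtype=int)
    n, k = A.shape[0], B.shape[0]
    # Indicator matrix Z (n x k): Z[p, i] = 1 iff point p lies in S_i.
    Z = np.zeros((n, k))
    Z[np.arange(n), labels] = 1.0
    # (Z^T A Z)[i, j] is the sum of a_pq over (p, q) in S_i x S_j.
    block_sums = Z.T @ A @ Z
    return float(np.sum(block_sums * B))


def brute_force_optimum(A, B):
    """Exact maximum by trying all k^n labelings (exponential time)."""
    n, k = A.shape[0], B.shape[0]
    best_value, best_labels = -np.inf, None
    for labels in itertools.product(range(k), repeat=n):
        value = clustering_objective(A, B, labels)
        if value > best_value:
            best_value, best_labels = value, np.array(labels)
    return best_value, best_labels


def conjectured_ugc_threshold(k):
    """(8*pi/9) * (1 - 1/k): the conjectured UGC threshold for B = I_k."""
    return 8 * np.pi / 9 * (1 - 1 / k)
```

With $B = I_k$ and a centered linear kernel $A = X_cX_c^T$, the objective reduces to $\sum_i \lVert \sum_{p\in S_i} x_p \rVert^2$ over the centered points, so putting everything in one part scores exactly $0$ and the optimum is never negative. For $k = 3$ the conjectured formula gives $\frac{8\pi}{9}\cdot\frac{2}{3} = \frac{16\pi}{27}$, which agrees with the sharp threshold stated for the $3\times 3$ identity.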
