Recovering Trees with Convex Clustering (1806.11096v2)
Abstract: Convex clustering refers, for given $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$, to the minimization of \begin{eqnarray*} u(\gamma) & = & \underset{u_1, \dots, u_n }{\arg\min}\;\sum_{i=1}^{n}{\lVert x_i - u_i \rVert^2} + \gamma \sum_{i,j=1}^{n}{w_{ij} \lVert u_i - u_j\rVert}, \end{eqnarray*} where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that if the affinities $w_{ij}$ reflect a tree structure in the $\left\{x_1, \dots, x_n\right\}$, then the convex clustering solution path reconstructs the tree exactly. The main technical ingredient implies the following combinatorial byproduct: for every set $\left\{x_1, \dots, x_n \right\} \subset \mathbb{R}^p$ of $n \geq 2$ distinct points, there exist at least $n/6$ points with the property that for any of these points $x$ there is a unit vector $v \in \mathbb{R}^p$ such that, when viewed from $x$, `most' points lie in the direction $v$: \begin{eqnarray*} \frac{1}{n-1}\sum_{i=1 \atop x_i \neq x}^{n}{ \left\langle \frac{x_i - x}{\lVert x_i - x \rVert}, v \right\rangle} & \geq & \frac{1}{4}. \end{eqnarray*}
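
The two quantities in the abstract can be made concrete with a minimal Python sketch. It only evaluates the objective and the directional average for given inputs; it does not solve the minimization or reproduce the paper's proof. The function names `convex_clustering_objective` and `directional_score` are hypothetical, chosen here for illustration. The second function relies on a standard fact: the average of inner products with a unit vector $v$ is largest when $v$ points along the mean of the unit directions, so the best achievable value equals the norm of that mean.

```python
import math


def convex_clustering_objective(x, u, w, gamma):
    """Evaluate sum_i ||x_i - u_i||^2 + gamma * sum_{i,j} w_ij * ||u_i - u_j||.

    x, u: lists of n points (tuples of floats); w: n-by-n nested list of
    non-negative affinities; gamma: non-negative regularization weight.
    """
    n = len(x)
    fidelity = sum(math.dist(x[i], u[i]) ** 2 for i in range(n))
    fusion = sum(
        w[i][j] * math.dist(u[i], u[j]) for i in range(n) for j in range(n)
    )
    return fidelity + gamma * fusion


def directional_score(points, k):
    """Largest value, over unit vectors v, of the average inner product
    between v and the unit directions from points[k] to the other points.

    Assumes the points are distinct, as in the abstract.
    """
    x = points[k]
    total = [0.0] * len(x)
    for i, y in enumerate(points):
        if i == k:
            continue  # the sum runs over x_i != x only
        d = math.dist(x, y)
        for c in range(len(x)):
            total[c] += (y[c] - x[c]) / d
    # The maximizing v is parallel to `total`, giving ||total|| / (n - 1).
    return math.hypot(*total) / (len(points) - 1)


def count_good_points(points, threshold=0.25):
    """Count points whose directional score reaches the threshold."""
    return sum(
        1 for k in range(len(points)) if directional_score(points, k) >= threshold
    )
```

For three collinear points such as `[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]`, the two endpoints see both other points in the same direction, while the middle point sees them in opposite directions, so `count_good_points` returns 2, consistent with the stated lower bound of $n/6$.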