Papers
Topics
Authors
Recent
Search
2000 character limit reached

Combinatorial optimization of the coefficient of determination

Published 12 Oct 2024 in stat.ML, cs.DS, cs.LG, and math.CO | (2410.09316v1)

Abstract: Robust correlation analysis is among the most critical challenges in statistics. Herein, we develop an efficient algorithm for selecting the $k$- subset of $n$ points in the plane with the highest coefficient of determination $\left( R2 \right)$. Drawing from combinatorial geometry, we propose a method called the \textit{quadratic sweep} that consists of two steps: (i) projectively lifting the data points into $\mathbb R5$ and then (ii) iterating over each linearly separable $k$-subset. Its basis is that the optimal set of outliers is separable from its complement in $\mathbb R2$ by a conic section, which, in $\mathbb R5$, can be found by a topological sweep in $\Theta \left( n5 \log n \right)$ time. Although key proofs of quadratic separability remain underway, we develop strong mathematical intuitions for our conjectures, then experimentally demonstrate our method's optimality over several million trials up to $n=30$ without error. Implementations in Julia and fully seeded, reproducible experiments are available at https://github.com/marc-harary/QuadraticSweep.

Authors (1)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.