A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies (2311.17840v2)
Abstract: Multi-dimensional Scaling (MDS) is a family of methods for embedding an $n$-point metric into low-dimensional Euclidean space. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i,j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \subset \mathbb{R}^k$ that minimizes
$$\text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right].$$
Kamada-Kawai provides a more relaxed measure of the quality of a low-dimensional metric embedding than the traditional bi-Lipschitz-ness measure studied in theoretical computer science. This is advantageous because strong hardness-of-approximation results are known for the latter, whereas Kamada-Kawai admits nontrivial approximation algorithms. Despite the popularity of MDS, our theoretical understanding of it is limited. Recently, Demaine, Hesterberg, Koehler, Lynch, and Urschel (arXiv:2109.11505) gave the first approximation algorithm with provable guarantees for Kamada-Kawai in the constant-$k$ regime, with cost $\text{OPT} + \epsilon$ in $n^2 2^{\text{poly}(\Delta/\epsilon)}$ time, where $\Delta$ is the aspect ratio of the input. In this work, we give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$: we achieve a solution with cost $\tilde{O}(\log \Delta)\,\text{OPT}^{\Omega(1)}+\epsilon$ in time $n^{O(1)}2^{\text{poly}(\log(\Delta)/\epsilon)}$. Our approach is based on a novel analysis of a conditioning-based rounding scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the geometry of low-dimensional Euclidean space, allowing us to avoid an exponential dependence on the aspect ratio. We believe our geometry-aware treatment of the Sherali-Adams Hierarchy is an important step towards developing general-purpose techniques for efficient metric optimization algorithms.
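To make the objective concrete, the following is a minimal sketch that evaluates the Kamada-Kawai cost of a given embedding; it is not the paper's algorithm, which searches for a good embedding via Sherali-Adams rounding. The function names `kamada_kawai_cost` and `aspect_ratio` are hypothetical. Two assumptions are made here that the abstract does not spell out: the average skips pairs with $i = j$ (where $d_{i,i} = 0$ would divide by zero), and $\Delta$ is taken to be the ratio of the largest to the smallest off-diagonal dissimilarity.

```python
import math
from itertools import combinations


def kamada_kawai_cost(points, d):
    """Average of (1 - ||x_i - x_j|| / d[i][j])^2 over unordered pairs i < j.

    Assumption: pairs with i == j are skipped, since d[i][i] = 0 would
    divide by zero. For symmetric d this equals the average over ordered
    pairs with i != j.
    """
    pairs = list(combinations(range(len(points)), 2))
    total = 0.0
    for i, j in pairs:
        dist = math.dist(points[i], points[j])  # Euclidean norm in R^k
        total += (1.0 - dist / d[i][j]) ** 2
    return total / len(pairs)


def aspect_ratio(d):
    """Largest over smallest off-diagonal dissimilarity (assumed definition of Delta)."""
    n = len(d)
    values = [d[i][j] for i in range(n) for j in range(n) if i != j]
    return max(values) / min(values)


# Three points whose dissimilarities are realizable on a line (k = 1).
d = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
exact = [(0.0,), (1.0,), (2.0,)]      # reproduces every d[i][j]
collapsed = [(0.0,), (0.0,), (0.0,)]  # all points at the origin

print(kamada_kawai_cost(exact, d))      # 0.0
print(kamada_kawai_cost(collapsed, d))  # 1.0
print(aspect_ratio(d))                  # 2.0
```

Under this convention the collapsed embedding always has cost 1, so $\text{OPT} \le 1$ for any input, which is why an additive error of $\epsilon$ is a meaningful guarantee.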
- Minimum-distortion embedding. Foundations and Trends® in Machine Learning, 14(3):211–378, 2021.
- On the approximability of numerical taxonomy (fitting distances by tree metrics). SIAM J. Comput., 28(3):1073–1085, 1999. Announced at SODA 1996.
- Fitting tree metrics: Hierarchical clustering and phylogeny. SIAM J. Comput., 40(5):1275–1291, 2011. Announced at FOCS 2005.
- An analysis of the t-SNE algorithm for data visualization. In Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Conference On Learning Theory, COLT 2018, Stockholm, Sweden, 6-9 July 2018, volume 75 of Proceedings of Machine Learning Research, pages 1455–1462. PMLR, 2018.
- Mihai Badoiu. Approximation algorithm for embedding metrics into a two-dimensional space. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA, pages 434–443. ACM/SIAM, 2003.
- Yair Bartal. Probabilistic approximations of metric spaces and its algorithmic applications. In FOCS, pages 184–193, 1996.
- Low-distortion embeddings of general metrics into the line. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 225–233, 2005.
- Embedding ultrametrics into low-dimensional spaces. In Proceedings of the twenty-second annual symposium on Computational geometry, pages 187–196, 2006.
- Dimensionality reduction: theoretical perspective on practical measures. Advances in Neural Information Processing Systems, 32, 2019.
- Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.
- Rounding semidefinite programming hierarchies via global correlation. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 472–481. IEEE, 2011.
- The Isomap algorithm and topological stability. Science, 295(5552):7–7, 2002.
- Multidimensional scaling. CRC press, 2000.
- Improving ultrametrics embeddings through coresets. In ICML, 2021.
- Fitting distances by tree metrics minimizing the total error within a constant factor. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 468–479. IEEE, 2021.
- The spread of obesity in a large social network over 32 years. New England journal of medicine, 357(4):370–379, 2007.
- The collective dynamics of smoking in a large social network. New England journal of medicine, 358(21):2249–2258, 2008.
- Chaomei Chen. Searching for intellectual turning points: Progressive knowledge domain visualization. Proceedings of the National Academy of Sciences, 101(suppl_1):5303–5310, 2004.
- On efficient low distortion ultrametric embedding. In ICML, volume 119, pages 2078–2088, 2020.
- Correlation clustering with Sherali-Adams. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 651–661. IEEE, 2022.
- Kedar Dhamdhere. Approximating additive distortion of embeddings into line metrics. In APPROX-RANDOM, pages 96–104, 2004.
- Multidimensional scaling: Approximation and complexity. In International Conference on Machine Learning, pages 2568–2578. PMLR, 2021.
- DIMACS. Working group on algorithms for multidimensional scaling i. http://dimacs.rutgers.edu/archive/SpecialYears/2001_Data/Algorithms/program.html, 2001. Accessed: 2023-11-08.
- Scheduling with communication delays via LP hierarchies and clustering. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 822–833. IEEE, 2020.
- Theory of multidimensional scaling. Handbook of statistics, 2:285–316, 1982.
- Efficient algorithms for inverting evolution. J. ACM, 46(4):437–449, 1999. Announced at STOC 1996.
- A robust model for finding optimal evolutionary trees. Algorithmica, 13(1/2):155–179, 1995. Announced at STOC 1993.
- A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci., 69(3):485–497, 2004. Announced at STOC 2003.
- Jan W Gooch. Encyclopedic dictionary of polymers, volume 1. Springer Science & Business Media, 2010.
- Faster SDP hierarchy solvers for local rounding algorithms. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 197–206. IEEE Computer Society, 2012.
- Mapping the structural core of human cerebral cortex. PLoS biology, 6(7):e159, 2008.
- Fitting points on the real line and its application to RH mapping. In Algorithms—ESA’98: 6th Annual European Symposium Venice, Italy, August 24–26, 1998 Proceedings 6, pages 465–476. Springer, 1998.
- Approximating the best-fit tree under $l_p$ norms. In APPROX-RANDOM, pages 123–133, 2005.
- Constructing a tree from homeomorphic subtrees, with applications to computational evolutionary biology. Algorithmica, 24(1):1–13, 1999. Announced at SODA 1996.
- Stochastic neighbor embedding. Advances in neural information processing systems, 15, 2002.
- Lars Ivansson. Computational aspects of radiation hybrid mapping. PhD thesis, Numerisk analys och datalogi, 2000.
- Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1226–1236, 2019.
- An algorithm for drawing general undirected graphs. Information processing letters, 31(1):7–15, 1989.
- Integrality gaps of linear and semi-definite programming relaxations for knapsack. In Integer Programming and Combinatorial Optimization: 15th International Conference, IPCO 2011, New York, NY, USA, June 15-17, 2011. Proceedings 15, pages 301–314. Springer, 2011.
- Joseph B Kruskal. Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29(2):115–129, 1964.
- Multidimensional scaling. Sage, 1978.
- Metric structures in $L_1$: dimension, snowflakes, and average distortion. European Journal of Combinatorics, 26(8):1180–1190, 2005.
- Jiří Matoušek. Bi-Lipschitz embeddings into low-dimensional Euclidean spaces. Commentationes Mathematicae Universitatis Carolinae, 31(3):589–600, 1990.
- Jiří Matoušek. On the distortion required for embedding finite metric spaces into normed spaces. Israel Journal of Mathematics, 93(1):333–344, 1996.
- Learning nonsingular phylogenies and hidden Markov models. In STOC, pages 366–375, 2005.
- A birthday repetition theorem and complexity of approximating dense CSPs. arXiv preprint arXiv:1607.02986, 2016.
- Yet another algorithm for dense max cut: go greedy. In SODA, volume 8, pages 176–182, 2008.
- Inapproximability for metric embeddings into $\mathbb{R}^d$. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 405–413. IEEE Computer Society, 2008.
- Sherali-Adams relaxations of the matching polytope. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 293–302. ACM, 2009.
- Circular partitions with applications to visualization and embeddings. In Proceedings of the twenty-fourth annual symposium on Computational geometry, pages 28–37, 2008.
- Yuri Rabinovich. On average distortion of embedding metrics into the line. Discrete & Computational Geometry, 39(4):720–733, 2008.
- A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3(3):411–430, 1990.
- Approximation algorithms for low-distortion embeddings into low-dimensional spaces. SIAM Journal on Discrete Mathematics, 33(1):454–473, 2019.
- Metric embeddings with outliers. In SODA, pages 670–689, 2017.
- Warren S Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401–419, 1952.
- Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(11), 2008.
- Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17:395–416, 2007.
- Eric W Weisstein. Legendre duplication formula. From MathWorld. http://mathworld.wolfram.com/LegendreDuplicationFormula.html, 2019.
- Forrest W Young. Multidimensional scaling: History, theory, and applications. Psychology Press, 2013.
- Approximation schemes via Sherali-Adams hierarchy for dense constraint satisfaction problems and assignment problems. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 423–438, 2014.