Multidimensional Scaling (MDS)
- Multidimensional Scaling (MDS) is a set of techniques that embed high-dimensional dissimilarity data into low-dimensional Euclidean spaces while preserving pairwise distances.
- It encompasses classical, metric, non-metric, Sammon mapping, and Isomap variants, each using distinct optimization methods to faithfully reconstruct data geometry.
- MDS techniques scale to large datasets via landmark and interpolation methods and extend to multi-view, non-Euclidean, and infinite-dimensional frameworks.
Multidimensional Scaling (MDS) is a foundational family of techniques in geometry-aware data analysis and manifold learning. It seeks a configuration of points in a geometric space such that the pairwise distances among these points reflect a given matrix of input dissimilarities or distances. Applications encompass dimensionality reduction, visualization, feature learning, computational linguistics, shape matching, and network analysis. MDS encompasses classical (eigendecomposition-based) algorithms, iterative stress-minimization procedures, nonlinear and manifold variants such as Isomap, as well as scalability adaptations for large data and extensions to multiple, heterogeneous dissimilarity sources.
1. Mathematical Foundations and Classical MDS
Let $\{o_1, \ldots, o_n\}$ denote a finite collection of $n$ objects, each typically represented as a point in $\mathbb{R}^p$ (but not necessarily with explicit coordinates), and let $D = (d_{ij})$ be an $n \times n$ matrix of pairwise dissimilarities $d_{ij}$. The central goal is to find embedding points $y_i \in \mathbb{R}^k$ for $i = 1, \ldots, n$ (with $k$ typically much smaller than $p$) such that their Euclidean distances $\|y_i - y_j\|$ closely approximate the given dissimilarities.
Classical MDS is based on the following sequence (Ghojogh et al., 2020, Adams et al., 2019):
- Compute the squared dissimilarities $d_{ij}^2$ and collect them in $D^{(2)} = (d_{ij}^2)$.
- Center with $H = I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{\top}$.
- Form the double-centered Gram matrix $B = -\frac{1}{2}\, H D^{(2)} H$.
- Compute the eigendecomposition $B = V \Lambda V^{\top}$.
- Select the top $k$ eigenpairs $(\lambda_1, v_1), \ldots, (\lambda_k, v_k)$ to form the embedding: the $i$-th point $y_i$ has coordinates $\big(\sqrt{\lambda_1}\, v_1(i), \ldots, \sqrt{\lambda_k}\, v_k(i)\big)$.
This configuration minimizes the so-called strain (the Frobenius norm between the empirical and reconstructed Gram matrices), is unique up to orthogonal transformation, and achieves exact recovery when the input matrix $D$ is Euclidean (Adams et al., 2019).
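The steps above translate directly into a few lines of NumPy. The following is a minimal sketch (function and variable names are illustrative, not from any cited implementation) that recovers a planar point cloud from its own distance matrix:

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed an n x n dissimilarity matrix D into R^k via double centering."""
    n = D.shape[0]
    D2 = D ** 2                                  # squared dissimilarities
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix H = I - (1/n) 1 1^T
    B = -0.5 * H @ D2 @ H                        # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]          # indices of the top-k eigenvalues
    lam = np.maximum(eigvals[idx], 0.0)          # clip tiny negatives from non-Euclidean noise
    return eigvecs[:, idx] * np.sqrt(lam)        # y_i = (sqrt(lam_1) v_1(i), ..., sqrt(lam_k) v_k(i))

# Usage: recover a planar configuration (up to rotation, reflection, and translation)
# from its own Euclidean distance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
```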
2. Variants: Metric, Non-metric, Sammon, and Isomap
- Metric MDS: Seeks a configuration $Y = \{y_1, \ldots, y_n\}$ minimizing the stress
$$\mathrm{Stress}(Y) = \sum_{i<j} w_{ij}\,\big(d_{ij} - \|y_i - y_j\|\big)^2$$
for fixed positive weights $w_{ij}$ (often $w_{ij} = 1$). The objective is nonconvex and typically tackled by gradient-based or majorization algorithms such as SMACOF (see the sketch after this list) (Ghojogh et al., 2020, Boyarski et al., 2017).
- Non-metric MDS: Preserves only the rank order of the dissimilarities. Introduces a monotone, typically isotonic, transform $f$ and minimizes the normalized stress
$$\mathrm{Stress}(Y, f) = \sqrt{\frac{\sum_{i<j} \big(f(d_{ij}) - \|y_i - y_j\|\big)^2}{\sum_{i<j} \|y_i - y_j\|^2}},$$
alternating between updating $Y$ and fitting $f$ by isotonic regression (Ghojogh et al., 2020).
- Sammon Mapping: A weighted form in which
$$E(Y) = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{\big(d_{ij} - \|y_i - y_j\|\big)^2}{d_{ij}},$$
with normalization constant $c = \sum_{i<j} d_{ij}$, emphasizing fidelity for small distances (Ghojogh et al., 2020).
- Isomap: Replaces $d_{ij}$ by an estimated geodesic distance, using shortest paths on a neighborhood graph. After transforming these distances to a centered Gram matrix, Isomap applies the classical spectral embedding, capturing nonlinear manifold structures (Ghojogh et al., 2020).
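To make the stress-minimization machinery behind metric MDS concrete, the following is a minimal sketch of the unweighted SMACOF loop based on repeated Guttman transforms; the function name, iteration count, and initialization are illustrative choices rather than a reference implementation:

```python
import numpy as np

def smacof(delta, k=2, n_iter=300, seed=0):
    """Minimize (unweighted) metric stress by repeated Guttman transforms."""
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, k))                       # random initial configuration
    for _ in range(n_iter):
        dist = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
        np.fill_diagonal(dist, 1.0)                   # avoid division by zero on the diagonal
        B = -delta / dist                             # off-diagonal entries of B(Y)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))           # B_ii = -sum_{j != i} B_ij
        Y = (B @ Y) / n                               # Guttman transform: stress never increases
    return Y
```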
3. Computation and Scalability for Large Data
For large $n$, classical algorithms become impractical: the eigendecomposition costs $O(n^3)$ time and the full distance matrix requires $O(n^2)$ memory. Partition-based approaches decompose the global problem:
- Landmark MDS: Selects $q \ll n$ "landmarks", runs classical MDS on them, and triangulates the remaining points via analytic formulas (Gower's interpolation); a sketch appears at the end of this section (Delicado et al., 2020).
- Interpolation MDS, Reduced MDS, Pivot MDS, Divide-and-Conquer MDS, Fast MDS: Variants that partition, align, and stitch together local MDS results; see the table below.
| Algorithm | Comment |
|---|---|
| Landmark MDS | Efficient, low memory footprint |
| Interpolation MDS | Fastest in the benchmark |
| Pivot MDS | Works directly with inner products |
| Divide-and-Conquer MDS | More robust, slightly biased |
| Fast MDS | Recursive, higher constant factors |
Empirical evaluations on million-point EMNIST images demonstrated that all these methods closely match the quality of classical MDS but differ in time/memory footprint (Delicado et al., 2020).
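As an illustration of the landmark strategy, the sketch below runs classical MDS on $q$ landmarks and then places every point from its squared distances to those landmarks, assuming the standard Gower/Nyström-style interpolation formula; all names and the choice $q = 100$ are illustrative:

```python
import numpy as np

def landmark_mds(D_ll, D_la, k=2):
    """D_ll: q x q distances among landmarks; D_la: q x n distances from landmarks to all points."""
    q = D_ll.shape[0]
    H = np.eye(q) - np.ones((q, q)) / q
    B = -0.5 * H @ (D_ll ** 2) @ H                   # double-centered landmark Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]
    lam, V = eigvals[idx], eigvecs[:, idx]
    L_sharp = (V / np.sqrt(lam)).T                   # k x q: rows v_j^T / sqrt(lam_j)
    mu = (D_ll ** 2).mean(axis=1)                    # mean squared distance to each landmark
    Y = 0.5 * L_sharp @ (mu[:, None] - D_la ** 2)    # interpolate every point from its distances
    return Y.T                                       # n x k embedding

# Usage on a synthetic cloud with q = 100 landmarks out of n = 2000 points.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
land = rng.choice(2000, size=100, replace=False)
D_la = np.linalg.norm(X[land][:, None, :] - X[None, :, :], axis=-1)
Y = landmark_mds(D_la[:, land], D_la, k=2)
```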
4. Extensions: Multiple Views, Non-Euclidean Embeddings, and Feature Learning
- Multi-view MDS (MVMDS): For $m$ heterogeneous distance matrices (views) $D^{(1)}, \ldots, D^{(m)}$ on the same objects, minimizes a weighted sum of per-view stresses with automatic weight learning:
$$\min_{Y,\,\alpha}\ \sum_{v=1}^{m} \alpha_v^{\gamma} \sum_{i<j} \big(d_{ij}^{(v)} - \|y_i - y_j\|\big)^2 \quad \text{s.t.}\quad \sum_{v=1}^{m} \alpha_v = 1,\ \alpha_v \ge 0,$$
with controller parameter $\gamma > 1$. Alternating minimization yields both the embedding $Y$ and the optimal view weights $\alpha_v$ (a sketch follows this list) (Bai et al., 2016).
- Feature Learning via MDS: MDS can serve as a feature learning framework when applied to high-level, semantics-sensitive distances (e.g., spatial pyramid matching) between objects rather than raw pixel or Euclidean distances (Wang et al., 2013). The process learns vector codes so that Euclidean distances between codes reflect semantic similarity.
- Non-Euclidean Targets: MDS can also be adapted to hyperbolic or spherical target spaces, with corresponding definitions of distance and gradient, enabling more faithful embeddings for inherently hierarchical or non-flat data (Cvetkovski et al., 2011, Liu et al., 2022).
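To illustrate the alternating scheme behind MVMDS, the hedged sketch below combines the views into a single weighted dissimilarity for the embedding step (which then reduces to an ordinary SMACOF run) and applies the closed-form simplex-constrained weight update implied by the exponent $\gamma$; it shows the mechanics only and may differ in detail from the exact formulation of (Bai et al., 2016):

```python
import numpy as np

def _smacof(delta, k=2, n_iter=200, seed=0):
    """Unweighted SMACOF via Guttman transforms, as in the earlier sketch."""
    n = delta.shape[0]
    Y = np.random.default_rng(seed).normal(size=(n, k))
    for _ in range(n_iter):
        dist = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
        np.fill_diagonal(dist, 1.0)
        B = -delta / dist
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        Y = (B @ Y) / n
    return Y

def view_stress(Y, delta):
    dist = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    return np.sum((delta - dist) ** 2)

def multiview_mds(deltas, k=2, gamma=2.0, n_outer=10):
    """Alternate between an embedding step and a closed-form view-weight update."""
    m = len(deltas)
    alpha = np.full(m, 1.0 / m)                       # start from uniform view weights
    for _ in range(n_outer):
        w = alpha ** gamma
        delta_bar = sum(wi * d for wi, d in zip(w, deltas)) / w.sum()
        Y = _smacof(delta_bar, k=k)                   # embedding step on combined dissimilarities
        s = np.array([view_stress(Y, d) for d in deltas])
        alpha = s ** (1.0 / (1.0 - gamma))            # minimizer of sum_v alpha_v^gamma s_v on the simplex
        alpha /= alpha.sum()
    return Y, alpha
```

The embedding step uses the fact that, for fixed weights, the weighted sum of stresses equals (up to a constant) a single stress against the weight-averaged dissimilarities, so any standard metric MDS solver can be reused.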
5. Infinite, Continuous, and Operator-Theoretic Generalizations
MDS extends to infinite metric measure spaces $(\mathcal{X}, d_X, \mu)$ by defining the "double-centered" kernel
$$K(x, x') = -\frac{1}{2}\left(d_X^2(x, x') - \int_{\mathcal{X}} d_X^2(x, s)\,d\mu(s) - \int_{\mathcal{X}} d_X^2(s, x')\,d\mu(s) + \int_{\mathcal{X}}\int_{\mathcal{X}} d_X^2(s, t)\,d\mu(s)\,d\mu(t)\right)$$
and embedding via the spectrum of the associated Hilbert-Schmidt operator $T_K$ on $L^2(\mathcal{X}, \mu)$. The canonical embedding $x \mapsto \big(\sqrt{\lambda_1}\,\phi_1(x), \sqrt{\lambda_2}\,\phi_2(x), \ldots\big)$ (with $(\lambda_i, \phi_i)$ the eigenpairs of $T_K$) optimally minimizes the infinite-sample "strain" (Adams et al., 2019, Kassab, 2019).
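A small numerical illustration of this operator-theoretic view, assuming the standard example of the circle with geodesic distance and uniform measure: the empirical double-centered kernel built from $n$ samples approximates the Hilbert-Schmidt operator, and its leading eigenvectors approximate sine and cosine eigenfunctions.

```python
import numpy as np

n = 500
theta = np.sort(np.random.default_rng(2).uniform(0.0, 2.0 * np.pi, n))
d = np.abs(theta[:, None] - theta[None, :])
d = np.minimum(d, 2.0 * np.pi - d)                   # geodesic distance on the circle
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ (d ** 2) @ H                          # empirical double-centered kernel
eigvals, eigvecs = np.linalg.eigh(K)
# The two largest (positive) eigenvalues are nearly equal, and their eigenvectors
# approximate cos(theta) and sin(theta) up to scaling and a common rotation, so the
# two-dimensional MDS embedding of the geodesic circle is again close to a round circle.
```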
Continuous MDS frameworks permit the analysis of limit behavior as the sample size $n \to \infty$, establishing consistency of the sample embeddings and proposing the "Approximate Lipschitz Embedding" (ALE) to guarantee equicontinuity and uniform convergence (Trosset et al., 2024).
6. Optimization, Computational Complexity, and Algorithmic Guarantees
- Majorization and Iterative Schemes: The SMACOF algorithm and its variants leverage quadratic majorization of the stress surface, ensuring a monotone decrease of the stress at every iteration (Boyarski et al., 2017, Rajawat et al., 2016).
- Complexity and Hardness: Global minimization of metric MDS stress is NP-hard, even in one dimension and for bounded-diameter graphs, ruling out efficient exact algorithms unless P = NP (Demaine et al., 2021). On the other hand, polynomial-time approximation schemes (PTAS) are available in special cases (e.g., small-diameter graphs), and quasi-polynomial approximations have been developed for general dissimilarity matrices with controlled aspect ratios (Bakshi et al., 2023).
- Stochastic and Streaming Algorithms: Stochastic SMACOF and related incremental algorithms provide scalable and provably convergent solutions for large, dynamically evolving networks (Rajawat et al., 2016).
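As a generic illustration of the stochastic viewpoint, the sketch below minimizes metric stress by sampling random pairs and taking gradient steps; this pair-sampling scheme is illustrative only and is not the specific stochastic SMACOF update of (Rajawat et al., 2016):

```python
import numpy as np

def stochastic_stress_sgd(delta, k=2, n_steps=20000, batch=256, lr=0.01, seed=3):
    """Minimize metric stress with SGD over randomly sampled pairs of points."""
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, k))
    for _ in range(n_steps):
        i = rng.integers(0, n, size=batch)
        j = rng.integers(0, n, size=batch)
        keep = i != j
        i, j = i[keep], j[keep]
        diff = Y[i] - Y[j]
        dist = np.linalg.norm(diff, axis=1) + 1e-12
        # Gradient of (d_ij - ||y_i - y_j||)^2 w.r.t. y_i, up to a constant factor
        # absorbed into the learning rate; the gradient w.r.t. y_j is its negative.
        g = ((dist - delta[i, j]) / dist)[:, None] * diff
        np.add.at(Y, i, -lr * g)
        np.add.at(Y, j, lr * g)
    return Y
```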
7. Applications and Theoretical Impact
Applications span manifold learning, visualization, feature extraction for object recognition, cross-linguistic semantic mapping, shape analysis, and cooperative network localization (Ghojogh et al., 2020, Wang et al., 2013, Klis et al., 2020, Rajawat et al., 2016). MDS forms the foundation for related methods such as Isomap and Spectral Generalized MDS, enables kernel-based out-of-sample extensions, and admits both supervised and statistically informed extensions developed for settings such as microbiome studies (Kim et al., 2023).
Theoretical advances include:
- Operator-theoretic analysis and infinite-sample generalizations (Adams et al., 2019, Trosset et al., 2024).
- Consistency under random sampling and empirical-measure convergence (Kassab, 2019).
- Nontrivial error estimates for MDS under noise, regularization, or alternative divergence metrics (e.g., Sinkhorn divergence for shape spaces) (Yachimura et al., 2024, Peterfreund et al., 2018).
- New frameworks for embedding with guaranteed performance and statistical interpretability (Demaine et al., 2021, Kim et al., 2023).
These developments collectively underscore MDS as a central, extensible tool for distance-preserving geometric embedding, equipped with rigorous foundations, practical adaptations, and across-the-board impact in data science and computational geometry.