Classical Multidimensional Scaling (CMDS)

Updated 28 October 2025

Classical Multidimensional Scaling (CMDS) is a Euclidean embedding method that reconstructs object configurations from pairwise squared distances using double centering and spectral decomposition.
The dual basis framework represents the Gram matrix as a linear combination of rank-2 matrices, enhancing analysis of noisy or incomplete distance data and providing direct links to metric constraint optimization.
Analytic connections to metric nearness and spectral graph theory improve the stability and applicability of CMDS in handling sparse, incomplete, or constrained data scenarios.

Classical Multidimensional Scaling (CMDS) is a foundational Euclidean embedding technique that reconstructs the configuration of $n$ objects in a vector space based solely on their pairwise squared distances. The central algorithmic ingredient is the transformation of distance data into an inner product (Gram) matrix via double centering, followed by a spectral decomposition that yields coordinates for the embedded points. Recent developments have introduced a dual basis framework that represents the Gram matrix as a linear combination of rank-2 matrices corresponding to each object pair, providing both theoretical insight and practical tools for analyses involving incomplete or noisy data and explicit connections to metric constraint optimization.

1. Mathematical Foundations and the Classical CMDS Algorithm

Given $n$ objects and the matrix $D \in \mathbb{R}^{n \times n}$ of their pairwise squared Euclidean distances, CMDS forms the centered inner product (Gram) matrix via the double centering operation $X = -\frac{1}{2} J D J$, where $J = I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top$ is the centering projection. Double centering uses every entry of $D$, so this step presumes complete data; incomplete data is taken up in the later sections. The spectral decomposition of $X$ yields the principal coordinates: $X = V \Lambda V^\top$, where $V = [v_1, \ldots, v_n]$ and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 \geq \cdots \geq \lambda_n$. For a target dimension $r$, the embedding is $Y = V_r \Lambda_r^{1/2}$, where $V_r$ and $\Lambda_r$ keep the $r$ largest eigenvalues and their eigenvectors.
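The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `classical_mds` and `squared_distances`, the random test points, and the clipping of small negative eigenvalues are our own choices, and the input is assumed to be a complete, Euclidean-consistent matrix of squared distances.

```python
import numpy as np

def classical_mds(D_sq, r):
    """Embed n points in R^r from a complete matrix of squared distances."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering projection J = I - (1/n) 1 1^T
    X = -0.5 * J @ D_sq @ J                  # Gram matrix by double centering
    evals, evecs = np.linalg.eigh(X)         # eigh returns eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:r]        # indices of the r largest eigenvalues
    lam = np.clip(evals[idx], 0.0, None)     # assumption: clip tiny negative round-off
    return evecs[:, idx] * np.sqrt(lam)      # Y = V_r Lambda_r^{1/2}

def squared_distances(P):
    """Pairwise squared Euclidean distances between the rows of P."""
    sq = np.sum(P * P, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * P @ P.T

# Hypothetical test data: 8 random points in R^3
rng = np.random.default_rng(0)
P = rng.normal(size=(8, 3))
D = squared_distances(P)

Y = classical_mds(D, r=3)
D_rec = squared_distances(Y)   # distances of the recovered configuration
```

The recovered coordinates `Y` generally differ from `P` by a rotation, reflection, and translation, so the meaningful comparison is between `D` and `D_rec`, which should agree up to floating-point error.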

Standard CMDS is efficient and, for Euclidean-consistent $D$, optimal with respect to the strain criterion. It does not, however, natively accommodate sparse constraints or explicit handling of local distances, and it offers no detailed spectral connection to related optimization problems (e.g., enforcing the triangle inequality for metric nearness).

2. Dual Basis Framework: Construction and Explicit Formulas

A central construction is a basis, together with its dual, for the space of zero-centered, symmetric $n \times n$ matrices. For the index set of unordered pairs $I = \{(i, j) : 1 \leq i < j \leq n\}$, of cardinality $L = n(n-1)/2$, one defines $w_{(i,j)} = \mathbf{e}_{i,i} + \mathbf{e}_{j,j} - \mathbf{e}_{i,j} - \mathbf{e}_{j,i}$, where $\mathbf{e}_{a,b}$ is the $n \times n$ matrix with a $1$ at position $(a, b)$ and zeros elsewhere.
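A short worked step may help explain why these matrices are a natural choice. It assumes the trace inner product $\langle A, B \rangle = \text{tr}(A^\top B)$ and writes $y_i$ for the embedded point of object $i$, so that $X_{i,j} = y_i^\top y_j$; both are our notation rather than the source's. Each $w_{(i,j)}$ is an outer product, and pairing it with the Gram matrix returns the squared distance for that pair:

$w_{(i,j)} = (\mathbf{e}_i - \mathbf{e}_j)(\mathbf{e}_i - \mathbf{e}_j)^\top, \qquad \langle X, w_{(i,j)} \rangle = X_{i,i} + X_{j,j} - 2X_{i,j} = \|y_i - y_j\|^2 = D_{i,j}.$

Under these assumptions, the squared distances are exactly the coefficients obtained by testing $X$ against the $w_{(i,j)}$.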

The dual basis matrices $\{v_{(i,j)}\}$ satisfy the biorthogonality property $\langle v_\alpha, w_\beta \rangle = \delta_\alpha^\beta$ (Kronecker delta). The paper provides a unified and explicit formula for all dual basis matrices: $v_{(i,j)} = -\frac{1}{2}(a b^\top + b a^\top)$, with $a = \mathbf{e}_i - \frac{1}{n}\mathbf{1}$ and $b = \mathbf{e}_j - \frac{1}{n}\mathbf{1}$, where $\mathbf{e}_i$ is the $i$-th standard basis vector and $\mathbf{1}$ is the all-ones vector. Each $v_{(i,j)}$ is a symmetric, rank-2, zero-centered matrix, and any zero-centered Gram matrix $X$ with squared distances $D_{i,j}$ can be expanded as

$X = \sum_{(i, j) \in I} D_{i,j}\, v_{(i,j)}.$

This recovers the classical double-centering formula and enables expressing $X$ as a linear expansion in terms of observed distances.
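Both claims, biorthogonality and agreement with double centering, can be checked numerically. The sketch below uses 0-based indices and hypothetical helper names (`dual_basis_matrix`, `primal_basis_matrix`), and takes the inner product to be the entrywise (trace) inner product, which is an assumption on our part.

```python
import numpy as np
from itertools import combinations

def dual_basis_matrix(i, j, n):
    """v_(i,j) = -1/2 (a b^T + b a^T) with a = e_i - 1/n, b = e_j - 1/n."""
    a = -np.ones(n) / n
    b = -np.ones(n) / n
    a[i] += 1.0
    b[j] += 1.0
    return -0.5 * (np.outer(a, b) + np.outer(b, a))

def primal_basis_matrix(i, j, n):
    """w_(i,j) = e_ii + e_jj - e_ij - e_ji = (e_i - e_j)(e_i - e_j)^T."""
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    return np.outer(d, d)

n = 6
pairs = list(combinations(range(n), 2))   # index set I with i < j

# Hypothetical test data: squared distances of random planar points
rng = np.random.default_rng(1)
P = rng.normal(size=(n, 2))
diff = P[:, None, :] - P[None, :, :]
D = np.sum(diff ** 2, axis=2)

# Gram matrix two ways: double centering vs. dual basis expansion
J = np.eye(n) - np.ones((n, n)) / n
X_centering = -0.5 * J @ D @ J
X_expansion = sum(D[i, j] * dual_basis_matrix(i, j, n) for i, j in pairs)

# Biorthogonality table: entry (alpha, beta) holds <v_alpha, w_beta>
B = np.array([[np.sum(dual_basis_matrix(*p, n) * primal_basis_matrix(*q, n))
               for q in pairs] for p in pairs])
```

If the formulas hold, `X_expansion` matches `X_centering` and `B` is the $L \times L$ identity matrix.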

3. Spectrum of the Essential Inner Product Matrix

Define the Gramian matrix for the basis: $H_{\alpha, \beta} = \langle w_\alpha, w_\beta \rangle,$ where, for $\alpha = (i,j)$ and $\beta = (k, l)$ ,

$H_{\alpha,\beta} = \begin{cases} 4 & \alpha = \beta, \\ 1 & |\{i, j\} \cap \{k, l\}| = 1, \\ 0 & \text{otherwise}. \end{cases}$

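$H$ is shown to be $4I_L + A(T)$, where $A(T)$ is the adjacency matrix of the triangular graph $T$: its vertices are the $L$ pairs in $I$, and two pairs are adjacent exactly when they share one index (equivalently, $T$ is the line graph of the complete graph $K_n$).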

The complete spectrum of $H$ is characterized:

Eigenvalue $2$, multiplicity $L-n$
Eigenvalue $n$, multiplicity $n-1$
Eigenvalue $2n$, multiplicity $1$

This spectral structure governs the conditioning and invertibility of local expansions and clarifies the algebraic geometry underlying CMDS, as well as its links to spectral graph theory.
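The stated spectrum is easy to confirm for a small case. The following sketch builds $H$ directly from the 4/1/0 rule and tallies its eigenvalues; the function name `essential_matrix` and the choice $n = 7$ are ours, and rounding to integers is only a convenience for counting multiplicities.

```python
import numpy as np
from itertools import combinations

def essential_matrix(n):
    """H[alpha, beta] = 4, 1 or 0 when the two pairs share 2, 1 or 0 indices."""
    pairs = list(combinations(range(n), 2))
    L = len(pairs)
    H = np.zeros((L, L))
    for a, p in enumerate(pairs):
        for b, q in enumerate(pairs):
            shared = len(set(p) & set(q))
            H[a, b] = {2: 4.0, 1: 1.0, 0: 0.0}[shared]
    return H

n = 7
H = essential_matrix(n)
L = H.shape[0]                              # L = n(n-1)/2 = 21

evals = np.linalg.eigvalsh(H)               # H is symmetric
rounded = np.rint(evals).astype(int)        # eigenvalues are integers here
values, counts = np.unique(rounded, return_counts=True)
spectrum = dict(zip(values.tolist(), counts.tolist()))
# for n = 7: {2: 14, 7: 6, 14: 1}, i.e. multiplicities L-n, n-1, 1
```

Since the smallest eigenvalue is $2$ and the largest is $2n$, the condition number of $H$ in this picture grows linearly as $n$.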

Notably, each dual basis matrix $v_{(i,j)}$ has a simple spectral structure: two nonzero eigenvalues, $\frac{1}{2}$ and $-\frac{1}{2} + \frac{1}{n}$, with eigenvectors $a - b$ and $a + b$, respectively.
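As a quick sanity check of these eigenpairs, one can build a single dual basis matrix and test both eigenvector relations. The values $n = 5$ and the 0-based pair $(0, 3)$ are arbitrary choices for illustration.

```python
import numpy as np

n, i, j = 5, 0, 3                     # arbitrary small example, 0-based indices

a = -np.ones(n) / n
b = -np.ones(n) / n
a[i] += 1.0                           # a = e_i - (1/n) 1
b[j] += 1.0                           # b = e_j - (1/n) 1
v = -0.5 * (np.outer(a, b) + np.outer(b, a))

evals = np.linalg.eigvalsh(v)         # ascending order
nonzero = evals[np.abs(evals) > 1e-12]

# Residuals of the two claimed eigenvector equations
res_minus = v @ (a - b) - 0.5 * (a - b)
res_plus = v @ (a + b) - (-0.5 + 1.0 / n) * (a + b)
```

Both residuals should vanish up to round-off, and `nonzero` should contain exactly the two values $-\frac{1}{2} + \frac{1}{n}$ and $\frac{1}{2}$.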

4. Stability and Applications to Noisy and Incomplete Distance Data

The dual basis approach provides an analytic mechanism for tracking the propagation of noise in the input distance matrix $D$ to the Gram matrix $X$ , due to the explicit and linear dependence of $X$ on each $D_{i,j}$ . Because each $v_{(i,j)}$ is explicitly known, perturbation bounds can be established for $X$ under arbitrary additive noise in $D$ , with clear consequences for the stability of the reconstructed configuration.
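One elementary consequence of this linearity can be illustrated directly; it is not necessarily the bound the source has in mind. If $D$ is perturbed by $E$, then $X$ changes by $-\frac{1}{2} J E J$, and because $J$ is an orthogonal projection this gives $\|\Delta X\|_F \leq \frac{1}{2}\|E\|_F$. The noise model below (symmetric Gaussian noise with zero diagonal) and all names are assumptions made for the sketch.

```python
import numpy as np

def gram_from_sq_distances(D):
    """Double centering: X = -1/2 J D J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

# Hypothetical clean data: squared distances of 10 random points in R^3
rng = np.random.default_rng(2)
n = 10
P = rng.normal(size=(n, 3))
sq = np.sum(P * P, axis=1)
D = sq[:, None] + sq[None, :] - 2.0 * P @ P.T

# Assumed noise model: symmetric additive noise with zero diagonal
E = np.triu(rng.normal(scale=0.05, size=(n, n)), 1)
E = E + E.T

X_clean = gram_from_sq_distances(D)
X_noisy = gram_from_sq_distances(D + E)
delta_X = X_noisy - X_clean                  # equals the Gram map applied to E

lhs = np.linalg.norm(delta_X, "fro")
rhs = 0.5 * np.linalg.norm(E, "fro")         # bound from ||J||_2 = 1
```

Here `lhs` never exceeds `rhs`; how the perturbation of $X$ then moves the eigenvectors, and hence the embedding, depends on the eigenvalue gaps and is not covered by this sketch.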

Furthermore, because the expansion is a sum over index pairs, it can be restricted to any subset $S \subset I$ of observed distances. The dual basis method therefore extends naturally to problems with missing data, sparse constraints, or partial observations, a regime where the classical global eigendecomposition is not directly applicable.

5. Analytic Link to the Metric Nearness Problem

The metric nearness problem requires projecting an arbitrary dissimilarity matrix onto the convex cone of metric (distance) matrices, enforcing all triangle inequalities. The relevant constraint matrix $A$ encodes all triangle inequalities as linear inequalities.

The paper establishes that $A^\top A$ is, up to a shift by a multiple of the identity, the negative of the essential matrix $H$ from the dual basis framework: $A^\top A = (3n-2)I - H$. Consequently, the singular values of $A$ in the metric nearness literature are directly determined by the spectrum of $H$: the eigenvalues $2$, $n$, and $2n$ of $H$ correspond to eigenvalues $3n-4$, $2n-2$, and $n-2$ of $A^\top A$, with the same multiplicities. This analytic link provides not only a theoretical foundation for empirical observations in metric projection algorithms but also a direct conduit for transferring advances in spectral CMDS analysis to constrained and regularized embedding problems.
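The identity can be verified numerically under one concrete layout of $A$, which is an assumption here since the text does not fix it: one column per pair in $I$ and, for every triple $\{i, j, k\}$, three rows encoding $d_{ij} - d_{ik} - d_{jk} \leq 0$ and its two permutations. The function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def triangle_constraint_matrix(n):
    """Rows: one per triangle inequality; columns: one per pair (i, j), i < j."""
    pairs = list(combinations(range(n), 2))
    col = {p: c for c, p in enumerate(pairs)}
    rows = []
    for i, j, k in combinations(range(n), 3):
        edges = [(i, j), (i, k), (j, k)]
        for long_edge in edges:              # edge on the left-hand side
            row = np.zeros(len(pairs))
            for e in edges:
                row[col[e]] = 1.0 if e == long_edge else -1.0
            rows.append(row)
    return np.array(rows), pairs

def essential_matrix(pairs):
    """H from the 4/1/0 rule on the number of shared indices."""
    L = len(pairs)
    H = np.zeros((L, L))
    for a, p in enumerate(pairs):
        for b, q in enumerate(pairs):
            H[a, b] = {2: 4.0, 1: 1.0, 0: 0.0}[len(set(p) & set(q))]
    return H

n = 6
A, pairs = triangle_constraint_matrix(n)
H = essential_matrix(pairs)

lhs = A.T @ A
rhs = (3 * n - 2) * np.eye(len(pairs)) - H

# Squared singular values of A vs. shifted, negated eigenvalues of H
sing_sq = np.sort(np.linalg.svd(A, compute_uv=False) ** 2)
from_H = np.sort((3 * n - 2) - np.linalg.eigvalsh(H))
```

With this layout, each pair lies in $n-2$ triangles and contributes three $\pm 1$ entries per triangle, which gives the diagonal value $3(n-2)$; two pairs sharing an index meet in exactly one triangle, which gives the off-diagonal value $-1$.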

6. Comparison to Classical Approach and Advantages of the Dual Basis Formulation

The standard CMDS framework relies exclusively on the double centering and eigendecomposition of $X$ , viewing the embedding as a global operation. In contrast, the dual basis approach:

Enables explicit linear expansion of $X$ in terms of the observed $D_{i,j}$ and analytically tractable basis elements $v_{(i,j)}$ .
Facilitates local analysis, inclusion of sparsity, and natural handling of missing or incomplete data.
Supports direct computation of spectral quantities relevant for noise sensitivity and for embedding under partial constraints.
Establishes a natural bridge between CMDS, distance geometry, and convex metric approximation problems.

Aspect	Standard CMDS	Dual Basis Approach
Gram matrix representation	Double centering + eigendecomposition	Explicit sum over dual basis
Dual basis vectors	Not used	Uniform formula: $-\frac{1}{2}(ab^\top + ba^\top)$
Spectrum analysis	Focus on Gram matrix $X$	Complete spectrum of matrix $H$
Applicability to constraints	Not natural	Supports triangle inequalities, missing data
Metric nearness connection	No explicit link	Direct analytic connection via $H$ and $A^\top A$

7. Implications and Future Directions

The dual basis approach offers enhanced interpretability for CMDS, analytical tractability for both full and partial data regimes, and an explicit connection to relevant quadratic forms in metric nearness problems. This methodology generalizes naturally to the analysis of structured incomplete data, constrained distance completion, and noisy geometric inference. The completed spectral analysis provides a foundation for further work on the geometry of embedding under linear constraints, stability to noise, and efficient convex optimization in metric spaces.
