
Consistency of archetypal analysis

Published 16 Oct 2020 in math.ST, math.OC, math.PR, and stat.ML | (2010.08148v2)

Abstract: Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data. For fixed $k$, the method finds a convex polytope with $k$ vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal. In this paper, we prove a consistency result that shows if the data is independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, of which we identify and establish several properties. We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution. If the data is independently sampled from a distribution with unbounded support, we also prove a consistency result for a modified method that penalizes the dispersion of the archetype points. Our analysis is supported by detailed computational experiments of the archetype points for data sampled from the uniform distribution in a disk, the normal distribution, an annular distribution, and a Gaussian mixture model.


Summary

  • The paper establishes convergence of archetypal analysis for compact supports by proving that AA solutions converge to the continuum minimizer with quantifiable rates.
  • It introduces a penalized variant for unbounded distributions that controls archetype variance, assuring bounded and stable solutions.
  • Numerical experiments validate the theoretical findings with evidence of robust convergence across initializations and under different data distributions.

Consistency Properties of Archetypal Analysis

Archetypal Analysis: Problem Definition and Prior Work

Archetypal analysis (AA) is an unsupervised learning method that constructs a convex polytope with $k$ vertices (archetype points) to summarize multivariate data $X_N \subset \mathbb{R}^d$. The optimal polytope minimizes the root mean squared distance from the data points to the convex hull of the archetypes, constrained so that the polytope lies within the convex hull of the data. Formally, for $A \subset \mathbb{R}^d$ with $|A| = k$, the optimization is:

$$\min_{A \subset \mathbb{R}^d} F(A) \quad \text{subject to } A \subset \operatorname{co}(X_N), \qquad F(A) = \left( \frac{1}{N} \sum_{i=1}^N d^2(x_i, \operatorname{co}(A)) \right)^{1/2}.$$

AA was introduced by Cutler and Breiman (1994); its boundary-constrained nature makes it sensitive to outliers. Subsequent literature has developed robust extensions and efficient algorithms, including alternating minimization schemes and comparisons with matrix factorization methods [see references within (2010.08148)].
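To make the objective concrete, $F(A)$ can be evaluated numerically: the distance $d(x, \operatorname{co}(A))$ is a small quadratic program over the probability simplex, which a projected-gradient loop solves to good accuracy. The NumPy sketch below is our own illustration of the objective (the function names and solver are not from the paper):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def dist_to_hull(x, A, iters=500):
    """Distance from x to co(A), with A a (k, d) array of archetypes.

    Solves min_w ||w @ A - x|| over the simplex by projected gradient.
    """
    k = A.shape[0]
    w = np.full(k, 1.0 / k)
    # Step size 1/L, where L = 2 * sigma_max(A)^2 bounds the gradient Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(iters):
        grad = 2.0 * A @ (w @ A - x)
        w = project_simplex(w - step * grad)
    return np.linalg.norm(w @ A - x)

def aa_objective(X, A):
    """F(A): root mean squared distance from the data X to co(A)."""
    return np.sqrt(np.mean([dist_to_hull(x, A) ** 2 for x in X]))
```

For example, with archetypes at the vertices of the unit triangle, a point inside the triangle has (numerically) zero distance, while the point $(1,1)$ has distance $\sqrt{1/2}$ to the nearest edge.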

Consistency Results for Archetypal Analysis with Compact Support

A central question addressed is the consistency of AA: if data $x_1, x_2, \ldots$ are i.i.d. samples from a measure $\mu$, do the archetype solutions $A_N$ converge as $N \to \infty$? For measures with compact support, the paper rigorously establishes:

  • Existence and placement: For such measures, minimizers exist, and there always exists a minimizing archetype pointset on the boundary of $\operatorname{co}(\operatorname{supp}(\mu))$, but not every minimizer consists of boundary points (Figure 1).

    Figure 1: An illustration of an example where the archetype pointset is attained in the interior of $\operatorname{co}(X_N)$.

  • Consistency theorem: Let $A_N$ be the discrete AA solution for $N$ samples and $A_\star$ a continuum minimizer for $\mu$. Then, almost surely, $A_N$ has a subsequence converging (in the $d_{2,\infty}$ metric) to an element of the continuum solution set; if the continuum solution is unique, the full sequence $A_N$ converges almost surely.
  • Convergence rates: The convergence rate of the optimal objective values is established, depending on the geometry of the distribution's support (via the $\alpha$-cap condition):

$$F_\mu(A_N) - F_\mu(A_\star) \lesssim \left( \frac{\log N}{N} \right)^{1/\alpha}$$

For uniform distributions on convex sets with positive density, this rate is $\mathcal{O}((\log N / N)^{1/d})$.

Archetypal Analysis for Unbounded Support Distributions

For distributions with unbounded support, direct AA fails to be consistent, as the convex hull of sampled points grows without bound. The paper proposes a penalized variant:

$$F_{\nu,\alpha}(A) = \left( F_\nu^2(A) + \alpha V(A) \right)^{1/2}$$

where $V(A)$ is the variance of the archetype pointset and $\alpha > 0$ is a penalty parameter, limiting the spread of the archetypes and ensuring the solution does not escape to infinity.

  • Existence: For square-integrable $\nu$, minimizers exist and are bounded.
  • Consistency: The penalized AA is consistent; finite-sample solutions converge, along subsequences, to a minimizer of the continuum penalized problem.
  • Asymptotics as the penalty increases: As $\alpha \to \infty$, the archetypes collapse to the data mean, matching intuition.
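The role of the penalty can be seen directly from the variance term: shrinking the archetypes toward their centroid scales $V(A)$ by the square of the shrinkage factor, so for large $\alpha$ the penalized objective is dominated by $V$ and minimizers collapse toward the data mean. A minimal NumPy sketch of this mechanism (our own illustration; the shrinkage path below is not the paper's algorithm):

```python
import numpy as np

def archetype_variance(A):
    """V(A): mean squared deviation of the archetypes from their centroid."""
    return np.mean(np.sum((A - A.mean(axis=0)) ** 2, axis=1))

def penalized_objective(F_val, A, alpha):
    """F_{nu,alpha}(A) = (F_nu(A)^2 + alpha * V(A))^(1/2), with F_val = F_nu(A)."""
    return np.sqrt(F_val ** 2 + alpha * archetype_variance(A))

# Shrinking the archetypes toward their centroid drives V(A) to zero,
# which is why, as alpha -> infinity, minimizers collapse to the data mean.
A = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]])
centroid = A.mean(axis=0)
for t in [1.0, 0.5, 0.1, 0.0]:
    A_t = centroid + t * (A - centroid)  # t=1: original, t=0: fully collapsed
    print(t, archetype_variance(A_t))    # V scales as t^2
```

Note that the fidelity term $F_\nu$ and the penalty pull in opposite directions: shrinking reduces $V(A)$ but increases the distance from the data to $\operatorname{co}(A)$, and $\alpha$ sets the trade-off.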

Numerical Experiments: Empirical Convergence and Dependence on Distribution

Numerical experiments confirm theoretical results:

  • Convergence of the archetype polytope for the uniform disk: Archetype points converge to regular $k$-polygons inscribed in the disk, as predicted analytically (Figure 2).

    Figure 2: The function $I \colon [0,\pi] \to \mathbb{R}$ that determines the squared objective value for the regular $k$-polygon inscribed in the unit disk.

  • Effect of the penalty parameter for normal/annular distributions: The area of $\operatorname{co}(A)$ shrinks as $\alpha$ increases (Figure 3).

    Figure 3: The area of $\operatorname{co}(A)$ as $\alpha$ is varied for $N = 30{,}000$ random data points generated from a normal distribution.

  • Robustness to initialization and sample randomness: Multiple experiments with various initializations confirm that the final solutions are insensitive to starting points, provided the constraints and penalty are in place (Figure 4).

Figure 4: Snapshots of the iterates for different initializations; the final column displays the converged archetype polytope.

  • Behavior for a Gaussian mixture: In high-anisotropy scenarios, the archetypes remain consistent, and the archetype polytope contains the mean as $\alpha$ grows (Figure 5).

Figure 5: Snapshots of the iterations for different data sampled from a Gaussian mixture model; the final iterate contains the global mean.
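The uniform-disk experiment (Figure 2) can be spot-checked by Monte Carlo: for a regular $k$-gon inscribed in the unit disk, a disk point outside the polygon has its nearest polygon point on the single violated edge, so the distance reduces to a maximum of signed edge distances. The sketch below (our own estimator, not the paper's function $I$) confirms that the squared objective decreases as $k$ grows:

```python
import numpy as np

def squared_objective_kgon(k, n_samples=200_000, seed=0):
    """Monte Carlo estimate of F^2 for a regular k-gon inscribed in the
    unit disk, with data uniform on the disk.

    For a disk point outside the polygon, the nearest polygon point is
    its projection onto the violated edge, so the distance is
    max_j max(0, n_j . x - cos(pi/k)) over the outward edge normals n_j.
    """
    rng = np.random.default_rng(seed)
    # Uniform samples in the unit disk (sqrt trick for the radius).
    r = np.sqrt(rng.random(n_samples))
    phi = 2 * np.pi * rng.random(n_samples)
    X = np.column_stack([r * np.cos(phi), r * np.sin(phi)])
    # Vertices at angles 2*pi*j/k, so edge normals point at (2j+1)*pi/k.
    angles = (2 * np.arange(k) + 1) * np.pi / k
    normals = np.column_stack([np.cos(angles), np.sin(angles)])
    # Signed distance to each edge line; the k-gon's inradius is cos(pi/k).
    signed = X @ normals.T - np.cos(np.pi / k)
    dist = np.maximum(signed.max(axis=1), 0.0)
    return np.mean(dist ** 2)
```

With the default sample size, the estimate for $k = 3$ is roughly an order of magnitude larger than for $k = 6$, consistent with larger inscribed polygons leaving only thin circular caps uncovered.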

Implications, Theoretical Impact, and Prospects

The paper establishes strong theoretical foundations for AA, paralleling rigorous results for $k$-means and other unsupervised clustering methods. It precisely quantifies convergence and rates for AA under realistic sampling assumptions and extends AA to robust, regularized formulations for general distributions. These results imply that in large-scale unsupervised settings, AA provides stable, interpretable extremal summaries of data, provided either compact support or judicious regularization.

The experimental analysis demonstrates practical robustness and scalability of the proposed numerical methods, and the theoretical framework suggests possibilities for further penalty design (including $k$-means-type and volume penalties), connections to convex geometry, and alternative distance metrics (e.g., Wasserstein).

Future directions include adaptive sampling for efficient computation and extension to nonlinear AA (e.g., neural network-based formulations), as well as rigorous consistency and convergence results for emerging robust AA variants and deep archetypal models.

Conclusion

This work delivers a rigorous statistical analysis of the consistency of archetypal analysis for both compact and noncompact distributions. It provides convergence guarantees, asymptotic rates, and robust regularized approaches supported by empirical evidence. The theoretical results enable AA to serve as a stable unsupervised tool for extremal data summarization, with strong guarantees in diverse settings, and pave the way for innovative forms of data summarization and representation in AI research (2010.08148).
