Projective Clustering Overview

Updated 26 December 2025

Projective clustering is a method that partitions high-dimensional data into k clusters, each associated with a j-dimensional subspace, generalizing both k-means and PCA.
It utilizes algorithmic strategies such as EM/alternating minimization, coreset constructions, and random projection techniques to optimize non-convex objectives.
Its robust statistical guarantees and adaptability to streaming data make projective clustering pivotal for scalable and precise geometric data analysis.

Projective clustering is the task of partitioning a finite metric dataset in high-dimensional Euclidean space into $k$ clusters, each associated with a $j$ -dimensional affine or linear subspace (termed "j-flat") such that a global cost function—typically the sum (or another $\ell_p$ aggregate) of distances of each point to its nearest subspace center—is minimized. This paradigm generalizes both center-based clustering (e.g., $k$ -means/ $k$ -median for $j=0$ ) and subspace approximation (e.g., PCA, when $k=1$ ) to the more expressive regime where both $k>1$ and $j>0$ , offering adaptive adaptation to anisotropic and manifold-structured data.

1. Formal Definitions and Problem Variants

Let $P\subset \mathbb{R}^d$ be a finite set of $n$ points. For integers $k\ge1$ , $0\leq j<d$ , and $\rho\in[1,\infty]$ , a $(k, j, \rho)$ projective clustering seeks a collection $\mathcal{F} = \{F_1,\dots,F_k\}$ of affine $j$ -dimensional subspaces minimizing

$\mathrm{cost}_\rho(P, \mathcal{F}) = \left(\sum_{p\in P} \min_{i = 1}^{k} \mathrm{dist}(p, F_i)^\rho \right)^{1/\rho},\quad [\rho<\infty]$

or, for the hard version,

$\mathrm{cost}_\infty(P, \mathcal{F}) = \max_{p\in P} \min_{i=1}^k \mathrm{dist}(p, F_i).$

Here, $\mathrm{dist}(p, F) = \|p - \pi_F(p)\|_2$ denotes the Euclidean distance to the subspace $F$ (Kerber et al., 2014, Tukan et al., 2022). Standard cases include:

$(0, k, 2)$ : $k$ -means
$(1, 1, \infty)$ : minimum enclosing cylinder
$(j, 1, 2)$ : subspace approximation
General $(k, j, 2)$ : projective clustering (sum of squared errors to closest $j$ -flats)

Different objectives, such as average $L_2$ , average $L_1$ , or general $\ell_\tau$ forms, are encompassed in this framework (Ding et al., 2012).

2. Algorithmic Methodologies and Structural Results

Projective clustering is computationally intractable in general—NP-hard even for $(k=2, j=1, d=2)$ —with optimization objectives that are highly non-convex and discontinuous (Maalouf et al., 2020). Standard approaches organize around the following themes:

Expectation-Maximization/Alternating Minimization: An EM-style procedure alternates between hard assignments of each $p_i$ to the closest current subspace, and recomputation of the optimal $j$ -flat for each cluster (solved by SVD/PCA on assigned points) (Maalouf et al., 2020). This converges to a local optimum.
Coreset Construction: To enable tractable approximation in high dimension and/or streaming regimes, strong and weak coreset schemes are central. A $(1\pm\varepsilon)$ coreset is a (weighted) subset $(C, w_C)$ with the property that for any $k$ $j$ -flats $S$ ,

$| \mathrm{cost}_\rho(P, S) - \mathrm{cost}_\rho(C, S) | \leq \varepsilon\, \mathrm{cost}_\rho(P, S).$

Crucially, advanced constructions either allow a single input-dependent additive constant $\Delta$ (Feldman et al., 2018), use sensitivity sampling (Tukan et al., 2022), or leverage dimensionality reduction followed by grid-based or randomized projections (Statman et al., 2020, Pratap et al., 2016, Kerber et al., 2014), achieving coreset sizes independent of $n,d$ (up to log factors) and thus enabling streaming and distributed algorithms.

Random Projections and Sketching: Johnson–Lindenstrauss-type subspace embeddings are employed to reduce $P\subset\mathbb{R}^d$ to $P'\subset\mathbb{R}^m$ with

$m = O\left(\frac{(j+1)^2 \log((j+1)/\varepsilon)}{\varepsilon^3}\, \log n\right),$

so that distances to all $q$ -flats are $\varepsilon$ -preserved, for all $k$ and cluster assignments (Kerber et al., 2014).

Projection Pursuit and Nonparametric Modal Clustering: In high-dimensional, non-Gaussian settings, projection pursuit methods (e.g., PPGMMGA) search for low-dimensional projections maximizing a non-Gaussianity index (e.g., negentropy estimated via GMM) (Scrucca, 2021). Once a maximally informative projection is found, nonparametric modal EM is used to identify cluster modes in that subspace without assuming ellipticity.

Table: Principal Approaches to Projective Clustering

Approach	Key Feature	Approximation Type
EM/Alternating Minimization	Local SVD for each cluster	Local optimum
Strong/Weak Coreset Construction	Sampling, compressing, merging	$(1\pm\varepsilon)$ cost for all $k, j$
Projection Pursuit + Modal EM	Non-Gaussian low-dim projections	Multimodal density basin assignment
Random Projection/Sketching	Dimension-independent coresets	$\varepsilon$ -approximation in reduced space

3. Statistical Learning and Generalization Guarantees

The generalization rate of projective clustering under statistical learning is governed by the complexity of the subspace function class. For $k$ clusters of $j$ -dimensional subspaces and $n$ IID samples from an unknown distribution $\mathcal{D}$ , the excess risk of empirical minimization admits an upper bound (Bucarelli et al., 2023): $\mathcal{E}_n = O\left(\sqrt{\frac{k j^2}{n}}\, \mathrm{polylog}(n)\right)$ and a minimax lower bound

$\mathcal{E}_n = \Omega\left(\sqrt{\frac{k j}{n}}\right)$

for the squared-error objective ( $\ell_2^2$ ). These results are near-optimal up to logarithmic factors, establish that projective clustering scales statistically harder than $k$ -means (where $j=0$ ), and depend critically on $j$ and $k$ (Bucarelli et al., 2023).

The analysis utilizes techniques from empirical process theory, involving Gaussian or Rademacher complexity bounds constructed via chaining over $\varepsilon$ -nets of subspace parameterizations, and leverages combinatorial constructions to derive lower bounds.

4. Coreset Constructions: Theory and Algorithms

Over the last decade, precise coreset methods have fundamentally advanced practical projective clustering in large-scale and high-dimensional data settings:

Additive Constant Coresets: By allowing approximation of the clustering cost up to an additive $\Delta = \|A - A^{(m_0)}\|_F^2$ (the residual norm of the truncated SVD), it becomes possible to create constant-size coresets in time polynomial in $n, d$ but with final size independent of both $n, d$ . This is achieved via (i) projection to low-rank $O(kj/\varepsilon^2)$ dimensional space, (ii) integer grid snapping, and (iii) sensitivity-based sampling (Feldman et al., 2018).
Randomized Low-Rank Sketching: As an acceleration, randomized JL-based sketching replaces exact SVD by a random projection prior to SVD, maintaining the theoretical cost approximation up to $\varepsilon$ (Pratap et al., 2016).
$L_\infty$ and M-Estimator Coresets: For hard-clustering objectives and more robust statistical loss functions (e.g., Huber, Tukey, Cauchy), advanced $L_\infty$ coreset constructions use Carathéodory sets and rounding ellipsoids to obtain polynomial- $d$ size coresets, followed by sensitivity sampling to handle general M-estimators (Tukan et al., 2022).
Streaming and Merge-Reduce: Modern coreset techniques, especially those permitting additive constants, are closed under union. This enables “merge-and-reduce” protocols for distributed or streaming data, incurring only logarithmic blowup in coreset size (Feldman et al., 2018, Tukan et al., 2022).

5. Specialized Applications and Variants

Projective clustering underpins methodology and performance improvements across multiple domains:

Embedding Compression in NLP: By replacing the single SVD-compressed embedding matrix with parallel low-rank projections onto $k$ learned $j$ -subspaces, deep NLP models (e.g., DistilBERT, RoBERTa) achieve 40%–43% embedding-layer compression with minimal (<0.8%) average accuracy loss over GLUE tasks, surpassing single-rank SVD decompositions by a significant margin (Maalouf et al., 2020).
Approximate Nearest Neighbor Search: Projective Clustering Product Quantization (PCPQ, Q-PCPQ, APCPQ) enhances MIPS accuracy by combining projective clustering with quantized dictionaries, giving exponentially more codewords than standard PQ or ScaNN, thereby improving inner-product recall at only a linear overhead in computation (Krishnan et al., 2021).
High-Dimensional Spectral and Modal Clustering: Techniques such as PPGMMGA+modal EM enable detection and allocation of multimodal cluster structure in projection-pursuit subspaces, without restricting cluster shape (Scrucca, 2021).
Image Clustering and Scattering Representations: Projective operations such as projection onto the orthogonal complement of shared principal directions (POC) are shown to significantly enhance spectral clustering on scattering features, outperforming conventional shallow clustering methods on MNIST, USPS, and other image datasets (Villar-Corrales et al., 2020).

6. Complexity, Approximability, and Robustness

The (approximate) computational complexity of projective clustering is a function of $(k, j, \varepsilon)$ but can be made nearly-linear in $n, d$ when leveraging efficient coreset or projection-based reduction:

$O(n d m^*\log n)$ : size- $O(m^*\log n)$ coreset construction for general $(k, j)$ projective clustering, with $m^*$ dependent on the optimal $m$ -line cost relative to $k$ $j$ -flats (Statman et al., 2020).
$O(n d k)$ : $O(\log k)$ -approximation to $k$ -line clustering.
Random projections of dimension $O((j+1)^2\log((j+1)/\varepsilon)/\varepsilon^3\log n)$ simultaneously approximate all projective $(k,q,\rho)$ clustering objectives, independent of $k$ (Kerber et al., 2014).

Coreset-based frameworks readily incorporate $z$ -outlier robustness by appropriate modification to the cost function and sampling weights, with only minor increases in approximation error (Statman et al., 2020 Ding et al., 2012). Modal and nonparametric variations using projection pursuit and density estimation are robust to cluster anisotropy, non-Gaussianity, and can accommodate clusters beyond simple ellipsoidal structure (Scrucca, 2021).

7. Open Problems and Frontier Directions

Despite considerable advances, several research directions remain:

Tighter lower bounds for $L_\infty$ coreset size and the exponent dependence on $j$ and $k$ in polynomial-size constructions (Tukan et al., 2022).
Improved optimization and coreset generation methods for accurate and scalable projective clustering, particularly under robust or composite loss functions (Krishnan et al., 2021).
Generalization to non-Euclidean geometries, function spaces, or structured data settings.
Efficient strategies for dynamically updating projective clustering solutions in streaming or online contexts without full recomputation.
Learning-theoretic questions such as the sharpness of generalization bounds and the role of log-factors, as highlighted by recent advances in distributional upper and lower bounds (Bucarelli et al., 2023).

Projective clustering remains a central and versatile primitive in geometric data analysis, offering a spectrum of algorithmic and statistical trade-offs for scalable, robust, and actionable modeling in high-dimensional spaces.