Rank-Constrained MLE
- Rank-Constrained Maximum-Likelihood Estimator is a method that estimates parameters over matrices with a low-rank structure to enhance efficiency in various statistical models.
- It reformulates the feasible set via algebraic and geometric constraints, leading to nonconvex optimization challenges in applications like factor analysis and Markov chains.
- Practical computation leverages methods such as DC programming, homotopy continuation, and alternating optimization to overcome nonconvexity and identifiability issues.
A rank-constrained maximum-likelihood estimator (RC-MLE) is a solution to a maximum-likelihood estimation problem over the space of matrices (or higher-order tensors) whose rank does not exceed a prescribed value. This estimator arises naturally in a variety of statistical models where the parameter of interest is suspected or known to have low rank: joint probability tables, covariance matrices in factor analysis, transition matrices in Markov chains, network matrices in dynamic models, and coefficient matrices in multivariate GLMs. Imposing a rank constraint induces a nonconvex (typically determinantal or nonnegative-factorization) feasible set, challenging both optimization and statistical analysis. The RC-MLE problem therefore sits at the intersection of algebraic geometry, convex analysis, and high-dimensional statistical theory.
1. General Formulation of the Rank-Constrained MLE
Given observed data and a family of probability models indexed by a matrix (or tensor) parameter $P$, the unconstrained MLE solves
$\hat{P} = \arg\max_{P \in \mathcal{S}} \ell(P),$
where $\mathcal{S}$ is the relevant ambient parameter set (e.g., nonnegative stochastic matrices, positive-definite matrices, regression coefficients) and $\ell$ is the log-likelihood. The RC-MLE augments this with the restriction
$\rank(P) \le r$
yielding the feasible set $\mathcal{S}_r = \{P \in \mathcal{S}: \rank(P) \le r\}$. Typical choices include:
- Discrete distributions: $\mathcal{S}$ is the intersection of the probability simplex $\Delta_{mn-1}$ and a determinantal variety $\mathcal{V}_r$, given by the vanishing of all $(r+1)\times(r+1)$ minors (Hauenstein et al., 2012).
- Factor analysis: The covariance matrix is parameterized as $\Sigma = L + \Psi$, with $\Psi$ diagonal and $\rank(L) \le r$ (Khamaru et al., 2018).
- Markov chains: The transition matrix $P$ is row-stochastic and $\rank(P)\le r$ (Li et al., 2018).
- Nonnegative matrix factorization (NMF): The nonnegative rank constraint $\rank_+(A) = r$ is enforced via $A = BC^\top$, $B \ge 0$, $C \ge 0$, with inner dimension $r$ (Gourieroux et al., 2022).
- Generalized linear models: The regression coefficient matrix $B$ is constrained to have $\rank(B)\le r$ (Bura et al., 2017).
In all cases, the problem is nonconvex due to the rank constraint.
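As a concrete illustration of the discrete-distribution case, membership in $\mathcal{S}_r$ can be tested numerically by combining the simplex constraints with a numerical-rank computation (the helper name `in_feasible_set` and the tolerances are illustrative, not from the cited works):

```python
import numpy as np

def in_feasible_set(P, r, tol=1e-10):
    """Membership test for S_r in the discrete-distribution case:
    entrywise nonnegative, entries summing to one, numerical rank <= r."""
    P = np.asarray(P, dtype=float)
    on_simplex = np.all(P >= -tol) and abs(P.sum() - 1.0) <= tol
    # Numerical rank: count singular values above a scaled tolerance.
    rank = int(np.sum(np.linalg.svd(P, compute_uv=False) > tol * max(P.shape)))
    return bool(on_simplex and rank <= r)

# A rank-1 table (outer product of two marginals) is feasible for r = 1.
row, col = np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.4, 0.5])
P1 = np.outer(row, col)
print(in_feasible_set(P1, 1))

# A generic table on the simplex has full rank, so it fails r = 1.
rng = np.random.default_rng(0)
P2 = rng.random((3, 3))
P2 /= P2.sum()
print(in_feasible_set(P2, 1))
```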
2. Algebraic and Geometric Formulation
The rank constraint is encoded by determinantal equations: $\rank(P) \le r$ holds exactly when every $(r+1)\times(r+1)$ minor of $P$ vanishes. For probability tables, the full parameter space becomes the intersection $\mathcal{V}_r \cap \Delta_{mn-1}$, the determinantal variety inside the simplex (Hauenstein et al., 2012).
For nonnegative rank or structured cases (e.g., network matrices), the feasible set is not algebraic in the strict sense but comprises the image of a polynomial map with nonnegativity constraints, i.e., all $A = BC^\top$ with $B \ge 0$, $C \ge 0$, $\rank(BC^\top) = r$ (Gourieroux et al., 2022).
The feasible set $\mathcal{S}_r$ is typically a nonconvex real-algebraic set embedded in a high-dimensional ambient space, which makes classical convexity-based M-estimation theory inapplicable.
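The determinantal description can be checked directly: on a rank-$r$ matrix every $(r+1)\times(r+1)$ minor vanishes, while some $r \times r$ minor does not. A small numpy sketch (the `minors` helper is illustrative):

```python
import numpy as np
from itertools import combinations

def minors(P, k):
    """All k x k minors (determinants of k x k submatrices) of P."""
    m, n = P.shape
    return [np.linalg.det(P[np.ix_(rows, cols)])
            for rows in combinations(range(m), k)
            for cols in combinations(range(n), k)]

rng = np.random.default_rng(1)
P = rng.random((4, 2)) @ rng.random((2, 5))   # a 4 x 5 matrix of rank 2

# Every 3 x 3 minor vanishes up to floating-point error ...
print(max(abs(d) for d in minors(P, 3)))
# ... while some 2 x 2 minor does not, so rank(P) is exactly 2.
print(max(abs(d) for d in minors(P, 2)))
```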
3. Optimization and Computational Approaches
RC-MLEs are solutions to nonconvex, typically smooth or piecewise smooth, optimization problems with algebraic constraints. Several computational strategies have emerged across application domains:
- Lagrangian critical-point equations:
For joint probability tables under a multinomial model with counts $u_{ij}$, Lagrange multipliers for both the normalization and determinantal constraints lead to a system of polynomial equations for stationary points, e.g.,
$\frac{u_{ij}}{p_{ij}} = \lambda + \sum_k \mu_k \frac{\partial g_k}{\partial p_{ij}},$
with normalization $\sum_{i,j} p_{ij} = 1$ and vanishing $(r+1)\times(r+1)$ minors $g_k(P) = 0$ (Hauenstein et al., 2012).
- Local kernel and projection formulations:
Matrix equations such as $PK = 0$ and $P^\top K' = 0$, with auxiliary matrices $K$, $K'$ parameterizing the kernel and cokernel, enforce the rank constraint via auxiliary variables and transform the criticality system into a square system of bilinear/quartic equations, supporting numerical algebraic geometry techniques (Hauenstein et al., 2012).
- Difference-of-convex (DC) programming:
For Markov chains and factor analysis, a DC decomposition is achieved via spectral characterizations: the constraint $\rank(P)\le r$ is equivalent to requiring the nuclear norm minus the Ky Fan $r$-norm (the sum of the top $r$ singular values) to vanish, i.e., $\|P\|_* - \sum_{i=1}^{r}\sigma_i(P) = 0$. Penalized or constrained DC algorithms iteratively solve convexified subproblems:
$P^{(k+1)} \in \arg\min_{P\in\mathcal{S}} \; -\ell(P) + \lambda\big(\|P\|_* - \langle W^{(k)}, P\rangle\big),$
where $W^{(k)}$ is a subgradient of the Ky Fan $r$-norm at $P^{(k)}$. Each iteration involves subgradient calculation and convex minimization with respect to the DC surrogate (Li et al., 2018, Khamaru et al., 2018).
- Alternating optimization (e.g., for NMF):
Alternately maximize the likelihood over each factor ($B$, $C$) while preserving nonnegativity, with an additional identification step to select canonical representatives under non-uniqueness (Gourieroux et al., 2022).
- Numerical algebraic geometry:
Homotopy continuation, polyhedral methods, and regeneration enable computation of all complex critical points (and thus all local maxima/minima) for generic data (Hauenstein et al., 2012).
- Alternating projections or EM for specific models:
Alternating least squares and expectation-maximization are often used in applied settings, though they do not guarantee identification of all maxima due to inherent nonconvexity.
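The spectral DC decomposition used above can be sketched concretely. The code below (illustrative, not from the cited implementations) evaluates the penalty $\|P\|_* - \sum_{i\le r}\sigma_i(P)$, which vanishes exactly on matrices of rank at most $r$, together with the standard subgradient $W = U_r V_r^\top$ of its concave part; a full DC algorithm would then minimize the convex surrogate $-\ell(P) + \lambda(\|P\|_* - \langle W, P\rangle)$ at each iteration.

```python
import numpy as np

def dc_penalty_and_subgrad(P, r):
    """Spectral DC penalty ||P||_* minus the sum of the top-r singular
    values, and a subgradient W = U_r V_r^T of the Ky Fan r-norm at P."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    penalty = s.sum() - s[:r].sum()   # vanishes exactly when rank(P) <= r
    W = U[:, :r] @ Vt[:r, :]          # subgradient of the concave part
    return penalty, W

rng = np.random.default_rng(2)
low = rng.random((5, 3)) @ rng.random((3, 5))   # rank 3 by construction
full = low + 0.5 * rng.random((5, 5))           # generically full rank

pen_low, W = dc_penalty_and_subgrad(low, 3)
pen_full, _ = dc_penalty_and_subgrad(full, 3)
print(pen_low < 1e-10, pen_full > 1e-4)   # penalty separates the two cases
```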
4. Statistical and Algebraic Properties
Maximum Likelihood Degree
The RC-MLE problem is algebraically rich: the number of complex critical points (the ML degree) for generic data quantifies the fundamental algebraic complexity of the problem (Hauenstein et al., 2012). For instance, for joint probability tables:
| $r \backslash (m,n)$ | (3,3) | (3,5) | (4,4) | (4,5) |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 10 | 58 | 191 | 843 |
| 3 | 1 | 1 | 191 | 843 |
Pragmatically, this enumerates the number of stationary points; only some will be real and positive. The global maximizer among them corresponds to the RC-MLE.
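The first row of the table (ML degree 1 for every $r = 1$ case) reflects a classical fact: under the rank-one (independence) model, the multinomial likelihood has a single critical point, the outer product of the empirical marginals. A quick numerical illustration (the counts table is made up for the example):

```python
import numpy as np

U = np.array([[4., 2., 3.],
              [1., 6., 2.],
              [3., 1., 5.]])   # a 3 x 3 contingency table of counts
n = U.sum()

# Rank-1 MLE: outer product of the empirical marginal distributions.
P_hat = np.outer(U.sum(axis=1), U.sum(axis=0)) / n**2

def loglik(P):
    return float(np.sum(U * np.log(P)))

# No perturbed rank-1 table on the simplex attains a higher likelihood,
# consistent with the likelihood having a unique critical point.
rng = np.random.default_rng(3)
best = loglik(P_hat)
for _ in range(200):
    a = np.abs(U.sum(axis=1) / n + 0.01 * rng.standard_normal(3))
    b = np.abs(U.sum(axis=0) / n + 0.01 * rng.standard_normal(3))
    a, b = a / a.sum(), b / b.sum()
    assert loglik(np.outer(a, b)) <= best + 1e-12
print("product of marginals maximizes the rank-1 likelihood")
```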
Statistical Guarantees
In many contexts, the RC-MLE enjoys statistical guarantees superior to the unconstrained MLE, especially when the true matrix is low-rank and the sample size is limited. Example rates (Li et al., 2018): the rank-constrained estimator of a Markov transition matrix attains estimation error on the order of $\sqrt{dr/n}$ (up to logarithmic factors), where $d$ is the state space size, $r$ the hypothesized rank, and $n$ the sample size. This matches minimax lower bounds under the rank restriction and substantially improves upon the unconstrained MLE rate, of order $\sqrt{d^2/n}$, for small $r$.
For GLMs, restricting to rank-$r$ matrices reduces the effective number of parameters and tightens asymptotic efficiency: the asymptotic covariance of the RC-MLE is a projection of the full-model Fisher inverse onto the tangent space of rank-$r$ matrices (Bura et al., 2017). Thus, imposing the rank constraint yields nontrivial efficiency gains under correct specification.
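One way to make this precise, stated here as the standard restricted-MLE computation under a smooth local parametrization (notation illustrative rather than quoted from Bura et al., 2017): if $\mathrm{vec}(B) = \gamma(\xi)$ parametrizes the rank-$r$ manifold locally with Jacobian $\Gamma$ and $F$ denotes the full-model Fisher information, the delta method yields
$\sqrt{n}\,\big(\mathrm{vec}(\hat{B}) - \mathrm{vec}(B)\big) \xrightarrow{d} N\big(0,\ \Gamma(\Gamma^\top F \Gamma)^{-1}\Gamma^\top\big),$
and $\Gamma(\Gamma^\top F \Gamma)^{-1}\Gamma^\top = \Pi F^{-1} \Pi^\top$, where $\Pi = \Gamma(\Gamma^\top F \Gamma)^{-1}\Gamma^\top F$ is the $F$-orthogonal projection onto the tangent space; this covariance never exceeds the unrestricted $F^{-1}$ in the Loewner order.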
Maximum Likelihood Duality
For determinantal varieties, an ML duality result holds: the ML degrees for rank $r$ and the complementary rank $m - r + 1$ (for $m \times n$ tables, $m \le n$) coincide, with an explicit bijection (the Hadamard product with a data-dependent sufficient-statistics matrix) between their critical points (Hauenstein et al., 2012). This duality is useful computationally and conceptually, as it can reduce computational cost by allowing the problem to be solved for the smaller of the two ranks.
5. Algorithms and Practical Computation
The RC-MLE problem is nonconvex and, depending on the ambient space and size, can be computationally intractable for standard symbolic or direct techniques. The following algorithmic frameworks are prominent:
| Model | Formulation | Main Algorithmic Approach |
|---|---|---|
| Discrete Fits | Polynomial constraints | Numerical algebraic geometry (homotopy, etc.) |
| Factor Analysis | DC program in diagonals | Convex-concave procedure, closed-form updates |
| Markov Chains | Nuclear/Ky Fan spectral DC | Proximal-gradient DC algorithm |
| NMF, Networks | Nonnegative factorization | Alternating ML + identification (for NMF) |
| GLMs | Manifold parameterization | Alternating GLM fits, projective Newton, SVD |
For moderate problem sizes, homotopy continuation yields all stationary points; for larger settings, projection, DC, and alternating minimization techniques are combined with subgradient and eigenvalue computations. In DC methods for factor models, each iteration requires at worst a low-rank partial SVD, and empirical runtimes for high-dimensional covariance estimation indicate substantial scalability (Khamaru et al., 2018).
The alternation/identification method for NMF models ensures uniqueness within the equivalence class defined by factorization ambiguity by optimizing a secondary criterion at each iteration (Gourieroux et al., 2022).
6. Applications and Consequences
The rank-constrained MLE appears in:
- Discrete mixture modeling: Learning joint distributions of categorical variables with latent low-rank structure.
- Structure learning in dynamic networks: Inferring network (contagion) matrices with limited nonnegative rank.
- Low-rank Markov dynamics: Estimating transition kernels for large state spaces with latent structure, accelerating sample efficiency (Li et al., 2018).
- High-dimensional factor analysis: Estimating covariance decompositions in high-dimensional settings, allowing for efficient model selection and structure adaptation (Khamaru et al., 2018).
- Multivariate reduced-rank GLMs: Dimensionality reduction in regression for multivariate responses (Bura et al., 2017).
- Sparse tensor estimation: Characterization of phase transitions in signal recovery and all-or-nothing regimes for rank-1 tensors (Corinzia et al., 2021).
A direct consequence of the RC-MLE approach is the ability to find all critical points (not just a local optimum, as in EM) and to select the global maximizer. This is unattainable for non-algebraic or EM-based algorithms. ML duality can further simplify algebraic complexity and computation in determinantal settings.
7. Theoretical Challenges and Open Directions
The RC-MLE problem typifies the interplay between algebraic geometry, nonconvex optimization, and statistical theory. Several challenges persist:
- Nonconvexity: While DC and manifold parametrization enable tractable local optimization, global optimality is hard to certify except in small to moderate dimensions.
- Identifiability: Nonuniqueness of matrix factorizations (e.g., in NMF) and identifiability under constraints require secondary optimization criteria (e.g., maximizing entropy or volume in NMF) to select canonical solutions (Gourieroux et al., 2022).
- Algebraic boundaries: The constraint sets are often neither closed nor convex, especially for exact-rank constraints, complicating M-estimation theory. Nonetheless, existence, consistency, and asymptotic normality have been established for important cases using local parametrization and tangent space techniques (Bura et al., 2017).
- Statistical-computational tradeoffs: In sparse tensor and other high-dimensional inference problems, the RC-MLE exhibits sharp phase transitions in recoverability, matching fundamental information-theoretic limits (Corinzia et al., 2021).
A plausible implication is that continued advances in numerical algebraic geometry and DC optimization, coupled with tailored local-parametrization methods, will further expand the practical and theoretical reach of the rank-constrained MLE paradigm.