Real Log Canonical Threshold
- The real log canonical threshold (RLCT) is a birational and analytic invariant that quantifies the singular behavior of real-analytic functions and algebraic structures.
- It plays a pivotal role in singular learning theory by governing asymptotic behaviors of marginal likelihood and generalization error in Bayesian models.
- Computational methods such as blow-up algorithms, combinatorial approaches, and Monte Carlo estimators enable practical estimation of RLCT in complex models.
The real log canonical threshold (RLCT) is a birational and analytic invariant measuring the singularity of real-analytic and algebraic structures, and plays a central role in both real algebraic geometry and singular learning theory. In the context of statistics and machine learning, the RLCT (also called the learning coefficient) governs the leading behavior of marginal likelihood and generalization error in singular models, allowing for refined asymptotic model comparison well beyond classical regular cases.
1. Analytic and Geometric Definition
Given a real-analytic function $f \colon W \to \mathbb{R}$ on an open set $W \subseteq \mathbb{R}^d$, the RLCT at a point $x$, denoted $\mathrm{rlct}_x(f)$, is defined via the integrability of $|f|^{-c}$ near its zero locus,
$$\mathrm{rlct}_x(f) = \sup\{\, c > 0 : |f|^{-c} \text{ is integrable near } x \,\},$$
or equivalently through resolution of singularities: for a real log resolution $\rho \colon M \to W$, locally $f \circ \rho(y) = u(y) \prod_i y_i^{k_i}$ and $\det \mathrm{Jac}\,\rho(y) = v(y) \prod_i y_i^{h_i}$ for non-vanishing analytic functions $u, v$, and
$$\mathrm{rlct}(f) = \min_i \frac{h_i + 1}{k_i}$$
(Kosta et al., 20 Nov 2024).
The multiplicity $m$ is defined as the maximal number of indices $i$ achieving the minimum $\min_i (h_i+1)/k_i$ at any point. The pair $(\lambda, m)$ fully characterizes the leading singular behavior of analytic volume or zeta integrals around the singular locus:
$$\mathrm{Vol}\{\, w \in K : |f(w)| \le \varepsilon \,\} \sim C\, \varepsilon^{\lambda} \left(\log \tfrac{1}{\varepsilon}\right)^{m-1} \quad \text{as } \varepsilon \to 0^{+},$$
for compact $K$ and $f$ analytic (Kosta et al., 20 Nov 2024).
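The volume scaling above can be checked numerically on a toy polynomial. The choice $f(x, y) = x^2 y^4$, with $\mathrm{rlct} = \min(1/2, 1/4) = 1/4$ and multiplicity $1$, and the grid resolution are illustrative assumptions; the fitted slope only approximates $1/4$ at finite $\varepsilon$ because of lower-order corrections to the leading asymptotic.

```python
import numpy as np

# Check Vol{|f| < eps} ~ C * eps^lambda for f(x, y) = x^2 * y^4 on [-1, 1]^2,
# whose RLCT is min(1/2, 1/4) = 1/4: the log-log slope should be near 0.25.
xs = np.linspace(-1, 1, 2001)
X, Y = np.meshgrid(xs, xs)
F = np.abs(X**2 * Y**4)

eps = np.logspace(-6, -2, 9)
vol = np.array([4.0 * np.mean(F < e) for e in eps])  # area of [-1,1]^2 is 4
slope = np.polyfit(np.log(eps), np.log(vol), 1)[0]
print(round(slope, 2))
```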
2. RLCT in Complex and Real Settings
The RLCT is an extension of the log canonical threshold (lct) defined in the complex algebraic context, often calculated from complex resolutions of singularities. For $f \in \mathbb{C}[x_1, \dots, x_d]$, the complex lct is calculated via divisorial data from a complex log resolution $\pi \colon Y \to \mathbb{C}^d$:
$$\mathrm{lct}(f) = \min_i \frac{a_i + 1}{N_i},$$
where $a_i$ and $N_i$ are the discrepancy and multiplicity coefficients along the exceptional divisors $E_i$, i.e. $K_{Y/\mathbb{C}^d} = \sum_i a_i E_i$ and $\mathrm{div}(f \circ \pi) = \sum_i N_i E_i$ (Kosta et al., 20 Nov 2024).
When the log resolution is defined over $\mathbb{R}$ and each exceptional divisor meets real points, $\mathrm{rlct}(f) = \mathrm{lct}(f)$, with the multiplicities also coinciding under mild conditions (Kosta et al., 20 Nov 2024).
3. RLCT in Singular Learning Theory
The RLCT, also called the learning coefficient, is the key invariant dictating the large-sample asymptotics of both the Bayesian evidence and the generalization error in singular or non-identifiable statistical models. For a statistical model $p(x \mid w)$ with prior $\varphi(w)$ and true data distribution $q(x)$, define the Kullback-Leibler distance function
$$K(w) = \int q(x) \log \frac{q(x)}{p(x \mid w)}\, dx.$$
The RLCT $\lambda$ is then determined by the largest pole $z = -\lambda$ (of order $m$) of the zeta function $\zeta(z) = \int K(w)^z \varphi(w)\, dw$ (Imai, 2019, Hirose, 2023).
For data of size $n$, the free energy $F_n$ (negative log marginal likelihood) asymptotically satisfies (Watanabe's Main Formula II):
$$F_n = n S_n + \lambda \log n - (m - 1) \log \log n + O_p(1),$$
where $S_n$ is the empirical entropy. In regular cases $\lambda = d/2$, with $d$ the parameter dimension; for singular models, $\lambda$ is typically smaller, yielding weaker Bayesian Occam penalties (Imai, 2019, Hirose, 2023).
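A minimal numerical sketch of this scaling, assuming the idealized case where the free energy is exactly $-\log \int e^{-nK(w)}\varphi(w)\,dw$ with $K(w) = w^4$ and a uniform prior on $[-1, 1]$ (so $\lambda = 1/4$, $m = 1$):

```python
import numpy as np

# Toy singular model: K(w) = w^4, uniform prior on [-1, 1].
# Then F(n) = -log ∫ exp(-n K(w)) dw grows like (1/4) log n + const,
# so regressing F(n) on log n should recover lambda = 1/4.
w = np.linspace(-1.0, 1.0, 200001)
dw = w[1] - w[0]

def free_energy(n):
    return -np.log(np.sum(np.exp(-n * w**4)) * dw)

ns = np.array([1e3, 1e4, 1e5, 1e6])
slope = np.polyfit(np.log(ns), [free_energy(n) for n in ns], 1)[0]
print(round(slope, 3))   # close to 0.25
```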
4. Combinatorial Formulas: Hyperplane Arrangements
For $f = \prod_{i=1}^{n} \ell_i^{m_i}$ with $\ell_i$ linear forms (not necessarily reduced), let $\mathcal{A} = \{H_1, \dots, H_n\}$, $H_i = \{\ell_i = 0\}$, be the arrangement, and let $L(\mathcal{A})$ be its intersection poset (all nonempty intersections $W = H_{i_1} \cap \cdots \cap H_{i_r}$, including the hyperplanes themselves).
Explicitly:
$$\mathrm{lct}(f) = \min_{W \in L(\mathcal{A})} \frac{\mathrm{codim}(W)}{s(W)}, \qquad s(W) = \sum_{H_i \supseteq W} m_i.$$
If each $\ell_i$ has real coefficients, then $\mathrm{rlct}(f) = \mathrm{lct}(f)$, and the multiplicities agree (Kosta et al., 20 Nov 2024).
Examples:
- In $\mathbb{R}^2$, for $f = xy$ (normal crossing), $\mathrm{rlct}(f) = 1$ with multiplicity $2$.
- For $f = \ell_1 \cdots \ell_d$, a product of $d \ge 3$ generic lines through the origin in $\mathbb{R}^2$, the only nontrivial flat is the origin (codimension $2$, $s = d$), so $\mathrm{rlct}_0(f) = 2/d$ (Kosta et al., 20 Nov 2024).
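The poset formula can be sketched in code for small central arrangements. The function name and the brute-force flat enumeration below are illustrative assumptions, not the algorithm of the cited reference; a flat is represented by a subset of hyperplanes, its codimension by the rank of their normals, and $s(W)$ by the total multiplicity of all hyperplanes containing it.

```python
import numpy as np
from itertools import combinations

def rlct_central_arrangement(normals, mults):
    """rlct of f = prod l_i^{m_i} for a central arrangement with real
    coefficients, via min over intersection flats W of codim(W)/s(W)."""
    normals = np.asarray(normals, dtype=float)
    n = len(normals)
    best = float("inf")
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            codim = np.linalg.matrix_rank(normals[list(subset)])
            # H_i contains the flat W iff its normal lies in the span of
            # the chosen normals, i.e. adding it does not raise the rank.
            s = sum(mults[i] for i in range(n)
                    if np.linalg.matrix_rank(normals[list(subset) + [i]]) == codim)
            best = min(best, codim / s)
    return best

# Four generic lines through the origin in R^2: rlct = min(1, 2/4) = 1/2.
print(rlct_central_arrangement([[1, 0], [0, 1], [1, 1], [1, -1]], [1, 1, 1, 1]))
```

The enumeration is exponential in the number of hyperplanes, which is fine for examples but not for large arrangements.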
5. Algorithmic and Blow-up Approaches
Computation of RLCTs for general polynomial models hinges on real-analytic resolution of singularities. For sum-of-products (sop) polynomials, dedicated blow-up algorithms, which iteratively replace singular charts via coordinate blow-ups, yield normal crossing or locally normal crossing forms (Hirose, 2023). In a normal crossing chart $f = u(y) \prod_i y_i^{k_i}$ with Jacobian order $h_i$ in $y_i$, the candidate poles of the zeta function are $-(h_i + 1)/k_i$; for binomial sop polynomials, the critical poles can be computed explicitly in terms of the exponent vectors. For sop polynomials with more terms, a linear-programming (simplex) bound on the RLCT is available (Hirose, 2023).
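A minimal helper for the final normal-crossing step, assuming a chart $f = u(y)\prod_i y_i^{k_i}$ with Jacobian orders $h_i$ (the function name is illustrative):

```python
from fractions import Fraction

def monomial_rlct(k, h=None):
    """RLCT and multiplicity read off a normal crossing chart
    f = u(y) * prod y_i^{k_i}, Jacobian order h_i in y_i:
    lambda = min_i (h_i + 1) / k_i, m = #indices attaining the min."""
    if h is None:
        h = [0] * len(k)
    vals = [Fraction(hi + 1, ki) for ki, hi in zip(k, h) if ki > 0]
    lam = min(vals)
    return lam, vals.count(lam)

# f = y1^2 * y2^4, trivial Jacobian: lambda = min(1/2, 1/4) = 1/4, m = 1.
print(monomial_rlct([2, 4]))
```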
For hyperplane arrangements, the combinatorial algorithm computes all intersection flats, their codimension and , and determines through inclusion poset chains (Kosta et al., 20 Nov 2024).
6. RLCT at Non-Singular Points and Upper Bounds
At non-singular points of the true parameter set in statistical models, the RLCT is immediately computable. Under analytic assumptions and after a variable split $w = (w_a, w_b)$, conditions of feasibility, independence, and vanishing order yield
$$\lambda = \frac{r}{2} + \frac{d - r}{2k},$$
where $d$ is the effective parameter dimension, $r$ counts the directions with a nondegenerate quadratic expansion, and $2k$ is the minimal vanishing order in the remaining directions (Kurumadani, 23 Aug 2024).
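A numeric sanity check of this exponent bookkeeping, assuming a KL function that behaves like $K(w) = w_1^2 + w_2^4$ near the true parameter (one quadratic direction and one direction of minimal order $2k = 4$, giving $\lambda = 1/2 + 1/4 = 3/4$); the grid quadrature is an illustrative stand-in for the exact integral:

```python
import numpy as np

# K(w) = w1^2 + w2^4: one quadratic direction and one quartic direction,
# so lambda = 1/2 + 1/4 = 3/4; fit F(n) = -log ∫ exp(-n K) dw vs log n.
x = np.linspace(-1, 1, 2001)
X, Y = np.meshgrid(x, x)
K = X**2 + Y**4
dA = (x[1] - x[0]) ** 2
ns = np.array([1e2, 3e2, 1e3, 3e3, 1e4])
F = [-np.log(np.sum(np.exp(-n * K)) * dA) for n in ns]
slope = np.polyfit(np.log(ns), F, 1)[0]
print(round(slope, 2))   # close to 0.75
```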
For global estimation in singular models, the RLCT is the minimum over non-singular points, with the above formula providing an effective upper bound, guiding practical Bayesian model comparison (Kurumadani, 23 Aug 2024, Yoshida et al., 2023).
7. Statistical Estimation and Model Selection
In practice, analytic RLCTs are known only in restricted cases; for broader models, Monte Carlo-based estimators exploit the relationship between the RLCT and the variance of the log-likelihood under the tempered posterior at inverse temperature $\beta$:
$$\hat{\lambda} = \beta^2 \,\mathrm{Var}_w^{\beta}\!\left[\, n L_n(w) \,\right] \quad \text{for } \beta = \frac{1}{\log n},$$
and averaging over multiple simulated datasets increases accuracy. This enables the singular Bayesian information criterion (sBIC) to be widely deployed as WsBIC, using $\hat{\lambda} \log n$ as the complexity penalty, which empirically outperforms WBIC/BIC in mixtures, reduced-rank, and other singular models (Imai, 2019).
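A quadrature stand-in for this estimator (in practice the variance is computed from MCMC draws of the tempered posterior), using the toy objective $n L_n(w) = n w^4$ with $\lambda = 1/4$ and a uniform prior on $[-1, 1]$; all names are illustrative:

```python
import numpy as np

# lambda_hat = beta^2 * Var_{w ~ tempered posterior}[ n L_n(w) ],
# tempered posterior ∝ exp(-n * beta * L_n(w)) * prior, beta = 1/log n.
n = 10**6
beta = 1.0 / np.log(n)
w = np.linspace(-1, 1, 400001)
logp = -n * beta * w**4              # tempered log-posterior (up to const)
p = np.exp(logp - logp.max())
p /= p.sum()                         # normalized grid weights
nL = n * w**4
var = np.sum(p * nL**2) - np.sum(p * nL) ** 2
lam_hat = beta**2 * var
print(round(lam_hat, 2))   # close to 0.25
```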
Table: RLCT Calculation Methods and Their Domains
| Method | Domain/Applicability | Reference |
|---|---|---|
| Real log resolution | General analytic or algebraic | (Kosta et al., 20 Nov 2024) |
| Combinatorial poset (arrangements) | Hyperplane arrangements | (Kosta et al., 20 Nov 2024) |
| Blow-up algorithm (sop polynomials) | Sum-of-products, binomial polynomials | (Hirose, 2023) |
| Non-singular point expansion | Regular points in parametric models | (Kurumadani, 23 Aug 2024) |
| Monte Carlo (variance-based) | General statistical models | (Imai, 2019) |
The RLCT is thus a unifying invariant at the interface of algebraic geometry, singularity theory, and Bayesian statistics, encoding local analytic complexity and model selection criteria in both theoretical and computational practice. Calculating or estimating the RLCT, exactly or via bounds, is essential for the asymptotic characterization of evidence and generalization in nonregular models, including applications in high-dimensional arrangements, tensor decompositions, and non-identifiable latent-variable models (Kosta et al., 20 Nov 2024, Yoshida et al., 2023).