Likelihood Ratio Scoring via Block Rejection
- The paper introduces ε-regularization to stabilize the likelihood ratio statistic in sparse stochastic block models, ensuring valid hypothesis testing.
- It applies a Monte Carlo approximation to efficiently manage the sum over 2^n labelings, yielding asymptotic power-Poisson distributions with rigorous error control.
- The method effectively distinguishes Erdős–Rényi graphs from balanced community structures, outperforming traditional spectral and subgraph-count tests in high-SNR regimes.
Likelihood ratio scoring with block rejection refers to a rigorous statistical methodology for hypothesis testing in network data, specifically for distinguishing between an Erdős–Rényi (ER) random graph and a balanced two-community stochastic block model (SBM) in the bounded degree regime. The standard likelihood ratio (LR) approach degenerates in sparse settings due to the asymptotic orthogonality of the probability measures when the signal-to-noise ratio exceeds a certain threshold. To address this, an ε-regularization is introduced to stabilize the LR statistic, allowing for valid inference through block rejection rules. The resulting test yields asymptotic distributions characterized by power-Poisson laws, and achieves robust performance via Monte Carlo approximation, with strong theoretical and empirical guarantees in the high-SNR regime (Yuan et al., 2018).
1. Problem Formulation and Model Definitions
Consider the testing problem where an observed undirected graph $G$ with adjacency matrix $A$ on $n$ vertices is to be distinguished between:
- $H_0$: $G \sim \mathcal{G}\left(n, \frac{a+b}{2n}\right)$ (Erdős–Rényi, with edge probability $\frac{a+b}{2n}$)
- $H_1$: $G \sim \mathrm{SBM}\left(n, \frac{a}{n}, \frac{b}{n}\right)$ (balanced bisection SBM), where each vertex is labeled $\sigma_i \in \{+1, -1\}$ independently with probability $1/2$, and $\mathbb{P}(A_{ij} = 1) = a/n$ if $\sigma_i = \sigma_j$, $b/n$ otherwise
The “signal-to-noise ratio” (SNR) is defined as $\frac{(a-b)^2}{2(a+b)}$. When $a$ and $b$ are fixed (bounded degree regime), the classical LR statistic fails once the SNR exceeds one, due to the lack of contiguity between the distributions. The problem forms the foundation for hypothesis testing in community detection and the determination of the number of communities.
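As a concrete illustration of the two hypotheses, the sketch below samples a graph from either model and evaluates the SNR (the sampler, the function names, and the parameter values are illustrative, not from the paper):

```python
import random

def sample_graph(n, a, b, sbm, rng):
    """Sample an undirected graph as a set of edges (i, j), i < j.

    sbm=False (H0): Erdos-Renyi with edge probability (a + b) / (2n).
    sbm=True  (H1): balanced two-block SBM with intra-block probability a/n,
    inter-block probability b/n, labels sigma_i i.i.d. uniform on {+1, -1}.
    """
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if sbm:
                p = a / n if sigma[i] == sigma[j] else b / n
            else:
                p = (a + b) / (2 * n)
            if rng.random() < p:
                edges.add((i, j))
    return edges, sigma

def snr(a, b):
    """SNR = (a - b)^2 / (2(a + b)); the detection threshold sits at 1."""
    return (a - b) ** 2 / (2 * (a + b))

rng = random.Random(0)
edges, sigma = sample_graph(200, a=5.0, b=1.0, sbm=True, rng=rng)
print(f"SNR = {snr(5.0, 1.0):.3f}")  # (5-1)^2 / 12 = 4/3: above threshold
```

Both models have the same expected average degree $(a+b)/2$, so the test must rely on finer structure than degrees alone.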
2. Regularized Likelihood Ratio Statistic
An ε-regularization is introduced, modifying the intra-group probability $a/n$ and the inter-group probability $b/n$ to regularized values $a_\varepsilon/n$ and $b_\varepsilon/n$ whose separation is damped so that the regularized SNR $\frac{(a_\varepsilon - b_\varepsilon)^2}{2(a_\varepsilon + b_\varepsilon)}$ falls below the contiguity threshold.
The regularized likelihood under the SBM for a label assignment $\sigma \in \{+1,-1\}^n$ is
$$L_\varepsilon(\sigma) = \prod_{i<j} \left(p^{\varepsilon}_{ij}\right)^{A_{ij}}\left(1-p^{\varepsilon}_{ij}\right)^{1-A_{ij}},$$
where $p^{\varepsilon}_{ij} = a_\varepsilon/n$ if $\sigma_i = \sigma_j$ and $p^{\varepsilon}_{ij} = b_\varepsilon/n$ if $\sigma_i \neq \sigma_j$, with $a_\varepsilon/n$, $b_\varepsilon/n$ the regularized intra- and inter-group probabilities. The ER likelihood is $L_0 = \prod_{i<j}\bar{p}^{A_{ij}}(1-\bar{p})^{1-A_{ij}}$ with $\bar{p} = \frac{a+b}{2n}$, and the ε-regularized LR statistic is the average over all $2^n$ labelings:
$$\mathrm{LR}_\varepsilon = \frac{1}{2^n}\sum_{\sigma\in\{+1,-1\}^n}\frac{L_\varepsilon(\sigma)}{L_0}.$$
Regularization keeps the regularized SNR strictly below one, suppressing the explosive variance observed with the standard LR in the high-SNR bounded-degree setting.
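For very small graphs the average over all $2^n$ labelings can be computed exactly. The sketch below does so, taking the regularized values $a_\varepsilon, b_\varepsilon$ as given inputs (the paper's specific regularization map is not reproduced here, and all names are illustrative):

```python
import itertools
import math

def exact_lr(edges, n, a_eps, b_eps, a, b):
    """Exact epsilon-regularized LR statistic for a graph given as a set
    of edges (i, j), i < j: average over all 2^n labelings of the
    regularized SBM likelihood ratio against the ER(n, (a+b)/(2n)) null.
    Feasible only for small n, since the sum has 2^n terms."""
    p0 = (a + b) / (2 * n)          # null edge probability
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        log_ratio = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                p = a_eps / n if sigma[i] == sigma[j] else b_eps / n
                if (i, j) in edges:
                    log_ratio += math.log(p / p0)
                else:
                    log_ratio += math.log((1 - p) / (1 - p0))
        total += math.exp(log_ratio)
    return total / 2 ** n

# Sanity check: if a_eps = b_eps = (a + b) / 2, every labeling gives
# exactly the null probabilities, so the LR statistic is exactly 1.
print(exact_lr({(0, 1), (1, 2)}, 6, 3.0, 3.0, 4.0, 2.0))  # -> 1.0
```

Working in log space before exponentiating keeps the per-labeling products from underflowing.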
3. Asymptotic Power-Poisson Laws
As $n \to \infty$ with $a, b$ fixed, the asymptotic distributions of $\mathrm{LR}_\varepsilon$ are infinite “power-Poisson” products: writing $X_k$ for the number of $k$-cycles in $G$, $\lambda_k = \frac{1}{2k}\left(\frac{a+b}{2}\right)^k$, and $t_\varepsilon = \frac{a_\varepsilon - b_\varepsilon}{a_\varepsilon + b_\varepsilon}$ for the regularized signal,
$$\mathrm{LR}_\varepsilon \xrightarrow{d} \prod_{k \geq 3}\left(1 + t_\varepsilon^k\right)^{X_k} e^{-\lambda_k t_\varepsilon^k}.$$
For each cycle length $k \geq 3$:
- Under $H_0$: $X_k \xrightarrow{d} \mathrm{Poisson}(\lambda_k)$
- Under $H_1$ (with block-signal parameter $t = \frac{a-b}{a+b}$): $X_k \xrightarrow{d} \mathrm{Poisson}\left(\lambda_k\left(1 + t^k\right)\right)$
These results rely on a Janson-type contiguity criterion and sufficient control over the mixed moments of the cycle counts $X_3, X_4, \ldots$ under both hypotheses.
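A limit law of this product form can be simulated directly from Poisson cycle counts. The sketch below (illustrative, not the paper's code) draws one sample of the truncated log-limit $\sum_{k=3}^{k_{\max}}[X_k \log(1+t_\varepsilon^k) - \lambda_k t_\varepsilon^k]$; the parameters $a=5$, $b=1$, $t_\varepsilon=0.4$, $k_{\max}=7$ are assumptions chosen so the Poisson means stay moderate:

```python
import math
import random

def poisson(mean, rng):
    """Knuth's multiplicative sampler; adequate for moderate means."""
    if mean == 0.0:
        return 0
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def limit_log_lr(a, b, t_eps, k_max, under_h1, rng):
    """One draw from the truncated power-Poisson limit of log LR_eps:
        sum_{k=3}^{k_max} [ X_k log(1 + t_eps^k) - lam_k t_eps^k ],
    with lam_k = ((a + b) / 2)^k / (2k), X_k ~ Poisson(lam_k) under H0
    and X_k ~ Poisson(lam_k (1 + t^k)), t = (a - b) / (a + b), under H1."""
    t = (a - b) / (a + b)
    total = 0.0
    for k in range(3, k_max + 1):
        lam = ((a + b) / 2) ** k / (2 * k)
        mean = lam * (1 + t ** k) if under_h1 else lam
        x = poisson(mean, rng)
        total += x * math.log(1 + t_eps ** k) - lam * t_eps ** k
    return total

rng = random.Random(0)
h0 = [limit_log_lr(5.0, 1.0, 0.4, 7, False, rng) for _ in range(500)]
h1 = [limit_log_lr(5.0, 1.0, 0.4, 7, True, rng) for _ in range(500)]
# The extra Poisson mass lam_k * t^k under H1 shifts the statistic upward.
```

The upward shift of the H1 samples relative to H0 is what the rejection rule in the next section exploits.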
4. Rejection Rule and Error Rates
Given a desired significance level $\alpha \in (0, 1)$, compute the $(1-\alpha)$-quantile $c_\alpha$ of $\mathrm{LR}_\varepsilon$ under $H_0$ (from the limiting power-Poisson law). The block rejection rule:
- Reject $H_0$ if $\mathrm{LR}_\varepsilon > c_\alpha$
Power analysis reveals that for suitable regularization parameters and growing average degree, the test is asymptotically powerful whenever $\frac{(a-b)^2}{2(a+b)} > 1$.
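The rule itself is a one-liner once null samples of the statistic are available. A minimal sketch, assuming the null quantile is estimated empirically from simulated samples rather than from the analytic law (the function name and stand-in data are illustrative):

```python
import math

def block_reject(statistic, null_samples, alpha):
    """Block rejection rule: reject H0 at level alpha when the observed
    statistic exceeds the empirical (1 - alpha)-quantile c_alpha of its
    null distribution (supplied here as simulated samples)."""
    ordered = sorted(null_samples)
    idx = min(len(ordered) - 1, math.ceil((1 - alpha) * len(ordered)) - 1)
    c_alpha = ordered[idx]
    return statistic > c_alpha, c_alpha

null = [x / 10.0 for x in range(100)]   # stand-in null samples: 0.0 .. 9.9
print(block_reject(12.0, null, 0.05))   # -> (True, 9.4)
```

Because the quantile is monotone-invariant, the same rule works whether the statistic is $\mathrm{LR}_\varepsilon$ or its logarithm.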
5. Monte Carlo Approximation and Computational Considerations
Direct evaluation of $\mathrm{LR}_\varepsilon$ is infeasible since it requires summing over $2^n$ labelings. A Monte Carlo (MC) estimator using $m$ i.i.d. uniform labelings $\sigma^{(1)}, \ldots, \sigma^{(m)}$ achieves
$$\widehat{\mathrm{LR}}_\varepsilon = \frac{1}{m}\sum_{j=1}^{m}\frac{L_\varepsilon(\sigma^{(j)})}{L_0},$$
with computational cost $O(mn^2)$, where $m$ is the number of sampled labelings and must grow sufficiently fast for the MC error to be negligible. This allows practical application for moderate system sizes, underlining the method's potential in sparse network regimes.
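A minimal sketch of the MC estimator (illustrative names; the regularized values $a_\varepsilon, b_\varepsilon$ are taken as given inputs, since the paper's exact regularization map is not reproduced here):

```python
import math
import random

def mc_lr(edges, n, a_eps, b_eps, a, b, m, rng):
    """Monte Carlo estimate of the epsilon-regularized LR statistic:
    replace the exact average over all 2^n labelings by an average over
    m labelings drawn i.i.d. uniform from {-1, +1}^n.  Cost O(m n^2)."""
    p0 = (a + b) / (2 * n)
    total = 0.0
    for _ in range(m):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        log_ratio = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                p = a_eps / n if sigma[i] == sigma[j] else b_eps / n
                if (i, j) in edges:
                    log_ratio += math.log(p / p0)
                else:
                    log_ratio += math.log((1 - p) / (1 - p0))
        total += math.exp(log_ratio)
    return total / m

rng = random.Random(0)
est = mc_lr({(0, 1), (2, 3)}, 30, 4.5, 1.5, 5.0, 1.0, m=200, rng=rng)
```

The estimator is unbiased for $\mathrm{LR}_\varepsilon$ because each labeling is drawn from the same uniform distribution the exact average is taken over; its variance governs how fast $m$ must grow.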
6. Empirical Studies and Performance Benchmarking
Simulations over a range of $(a, b)$ pairs and graph sizes $n$ cover SNR values both below and above the detection threshold. With appropriately chosen $\varepsilon$, the empirical size at level $\alpha$ matches the theoretical prediction, and power increases with the SNR and $n$. The regularized LR procedure outperforms the spectral test of Bickel–Sarkar and the subgraph-count test of Gao–Lafferty in these sparse regimes.
On real data, e.g., the “political books” co-purchase network ($n = 105$), the test shows high success rates (83%–100%) in correctly rejecting the null when distinct communities are merged, again exceeding the performance of competing methods in sparse graphs.
7. Extensions, Limitations, and Future Directions
The ε-regularization is critical for restoring statistical contiguity and mitigating the erratic behavior of the standard LR statistic in SBMs with bounded average degree. No consistent test is possible when $(a-b)^2 < 2(a+b)$ (SNR below one) due to information-theoretic lower bounds. The MC approach's computational cost remains exponential in the worst case; further analysis of deterministic or mean-field approximations is needed. Extensions to multi-community SBMs, degree-corrected models, and connections with semidefinite and spectral relaxations are posited as promising avenues for future work. The likelihood ratio scoring with block rejection framework establishes a principled, theoretically robust paradigm for community hypothesis testing in sparse networks (Yuan et al., 2018).