Likelihood Ratio Scoring via Block Rejection

Updated 21 December 2025
  • The paper introduces ε-regularization to stabilize the likelihood ratio statistic in sparse stochastic block models, ensuring valid hypothesis testing.
  • It derives asymptotic power-Poisson limit distributions and applies a Monte Carlo approximation, with rigorous error control, to manage the sum over 2^n labelings.
  • The method effectively distinguishes Erdős–Rényi graphs from balanced community structures, outperforming traditional spectral and subgraph-count tests in high-SNR regimes.

Likelihood ratio scoring with block rejection refers to a rigorous statistical methodology for hypothesis testing in network data, specifically for distinguishing between an Erdős–Rényi (ER) random graph and a balanced two-community stochastic block model (SBM) in the bounded degree regime. The standard likelihood ratio (LR) approach degenerates in sparse settings due to the asymptotic orthogonality of the probability measures when the signal-to-noise ratio exceeds a certain threshold. To address this, an $\varepsilon$-regularization is introduced to stabilize the LR statistic, allowing for valid inference through block rejection rules. The resulting test yields asymptotic distributions characterized by power-Poisson laws and achieves robust performance via Monte Carlo approximation, with strong theoretical and empirical guarantees in the high-SNR regime (Yuan et al., 2018).

1. Problem Formulation and Model Definitions

Consider the problem of testing, for an observed undirected graph $G$ on $n$ vertices, between:

  • $H_0$: $G \sim G(n, p_0)$ (Erdős–Rényi, with $p_0 = (a+b)/(2n)$)
  • $H_1$: $G \sim G(n, a, b)$ (balanced bisection SBM), where each vertex $u$ is labeled $o_u \in \{\pm 1\}$ independently, and

$$\Pr(A_{uv}=1 \mid o) = \begin{cases} a/n, & o_u = o_v \\ b/n, & o_u \neq o_v \end{cases} \qquad a > b > 0$$

The “signal-to-noise ratio” (SNR) is defined as $\kappa = \frac{(a-b)^2}{2(a+b)}$. When $\kappa \ge 1$ and $a, b$ are fixed (the bounded degree regime), the classical LR statistic fails due to the lack of contiguity between the two distributions. This problem forms the foundation for hypothesis testing in community detection and for determining the number of communities.
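To make the two hypotheses concrete, here is a minimal simulation sketch, not taken from the paper: the function name sample_graph and the NumPy-based implementation are illustrative choices. It draws an adjacency matrix either from $G(n, p_0)$ or from the balanced two-block SBM, using $a = 4.6$, $b = 0.4$ (the $c = 2.10$ setting from the simulation grid in Section 6).

```python
import numpy as np

def sample_graph(n, a, b, rng, under_null=True):
    """Draw an adjacency matrix under H0 (Erdos-Renyi, edge prob p0 = (a+b)/(2n))
    or under H1 (balanced two-block SBM, edge prob a/n within and b/n across blocks)."""
    if under_null:
        P = np.full((n, n), (a + b) / (2 * n))
    else:
        o = rng.choice([-1, 1], size=n)        # independent +/-1 community labels
        same = np.equal.outer(o, o)            # True where o_u == o_v
        P = np.where(same, a / n, b / n)
    U = rng.random((n, n))
    A = np.triu((U < P).astype(int), k=1)      # keep the upper triangle, no self-loops
    return A + A.T                             # symmetric 0/1 adjacency matrix

rng = np.random.default_rng(0)
A_null = sample_graph(40, 4.6, 0.4, rng, under_null=True)   # ER sample
A_alt  = sample_graph(40, 4.6, 0.4, rng, under_null=False)  # SBM sample
```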

2. Regularized Likelihood Ratio Statistic

An $\varepsilon$-regularization is introduced, modifying the intra-group and inter-group edge probabilities:

$$a_\varepsilon = a - \varepsilon, \quad b_\varepsilon = b + \varepsilon, \quad 0 < \varepsilon < \frac{a-b}{2}, \qquad \frac{(a-b_\varepsilon)^2}{2(a+b)} < 1$$

The regularized log-likelihood under the SBM for a label assignment $o$ is

$$\ell_1^\varepsilon(A \mid o) = \sum_{u < v}\Big\{A_{uv} \log P^\varepsilon_{uv}(o) + (1 - A_{uv}) \log\big(1 - P^\varepsilon_{uv}(o)\big)\Big\}$$

where $P^\varepsilon_{uv}(o) = a_\varepsilon/n$ if $o_u = o_v$ and $b_\varepsilon/n$ if $o_u \neq o_v$. The ER log-likelihood is $\ell_0(A)$, and the $\varepsilon$-regularized LR statistic is the average over all labelings:

$$Y_n^\varepsilon = \frac{1}{2^n} \sum_{o \in \{\pm1\}^n} \exp\!\left[\ell_1^\varepsilon(A \mid o) - \ell_0(A)\right]$$

Regularization shrinks the separation ratio $(a_\varepsilon-b_\varepsilon)/(a+b)$, suppressing the explosive variance observed with the standard LR statistic in the high-SNR bounded-degree setting.
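For very small graphs, the definition above can be evaluated exactly by brute force, which is useful for checking a faster approximation. The sketch below is an illustrative helper (the name regularized_lr_exact is not from the paper) that enumerates all $2^n$ labelings.

```python
import itertools
import numpy as np

def regularized_lr_exact(A, a, b, eps):
    """Exact epsilon-regularized LR statistic Y_n^eps by enumerating all 2^n labelings.
    Only feasible for small n (roughly n <= 15)."""
    n = A.shape[0]
    p0 = (a + b) / (2 * n)
    a_e, b_e = a - eps, b + eps
    iu = np.triu_indices(n, k=1)
    edges = A[iu]
    total = 0.0
    for labels in itertools.product([-1, 1], repeat=n):
        o = np.array(labels)
        same = np.equal.outer(o, o)[iu]
        P = np.where(same, a_e / n, b_e / n)
        # log-likelihood ratio ell_1^eps(A|o) - ell_0(A) for this labeling
        llr = np.sum(edges * np.log(P / p0) + (1 - edges) * np.log((1 - P) / (1 - p0)))
        total += np.exp(llr)
    return total / 2 ** n
```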

3. Asymptotic Power-Poisson Laws

As $n \to \infty$ with $\kappa > 1$, the asymptotic distributions of $Y_n^\varepsilon$ are infinite “power-Poisson” products. For each cycle length $m \ge 3$:

$$\lambda_m^\varepsilon = \frac{1}{2m}\left(\frac{a_\varepsilon + b_\varepsilon}{2}\right)^m, \qquad \delta_m^\varepsilon = \left(\frac{a_\varepsilon-b_\varepsilon}{a+b}\right)^m$$

  • Under $H_0$:

$$Y_n^\varepsilon \xrightarrow{d} W_\varepsilon = \prod_{m=3}^\infty (1+\delta_m^\varepsilon)^{Z_m} \exp(-\lambda_m^\varepsilon \delta_m^\varepsilon), \qquad Z_m \overset{\mathrm{ind.}}{\sim} \operatorname{Poisson}(\lambda_m^\varepsilon)$$

  • Under $H_1$ (with block-signal parameter $\delta_m = \left(\frac{a-b}{a+b}\right)^m$):

$$Y_n^\varepsilon \xrightarrow{d} W_1 = \prod_{m=3}^\infty (1+\delta_m^\varepsilon)^{\widetilde Z_m} \exp(-\lambda_m^\varepsilon \delta_m^\varepsilon), \qquad \widetilde Z_m \overset{\mathrm{ind.}}{\sim} \operatorname{Poisson}\big(\lambda_m^\varepsilon(1+\delta_m)\big)$$

These results rely on a Janson-type contiguity criterion and sufficient control over mixed moments of cycle counts and $Y_n^\varepsilon$.
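The limiting laws can be simulated by truncating the infinite product at a finite cycle length, which is one way to obtain the critical value used in the next section. The sketch below is an illustrative approximation: the function name sample_power_poisson, the truncation point m_max, and the default sample size are my choices, not the paper's.

```python
import numpy as np

def sample_power_poisson(a, b, eps, under_null=True, m_max=30, size=100_000, seed=0):
    """Approximate draws of the limiting variable W_eps (under_null=True) or W_1
    (under_null=False), truncating the infinite product at cycle length m_max."""
    rng = np.random.default_rng(seed)
    a_e, b_e = a - eps, b + eps
    m = np.arange(3, m_max + 1)
    lam = ((a_e + b_e) / 2) ** m / (2 * m)              # lambda_m^eps
    dlt_e = ((a_e - b_e) / (a + b)) ** m                # delta_m^eps
    mean = lam if under_null else lam * (1 + ((a - b) / (a + b)) ** m)
    Z = rng.poisson(mean, size=(size, m.size))          # independent Poisson counts
    log_w = Z @ np.log1p(dlt_e) - np.sum(lam * dlt_e)   # log of the truncated product
    return np.exp(log_w)
```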

4. Rejection Rule and Error Rates

Given a desired significance level $0 < \alpha < 1$, compute the $(1-\alpha)$-quantile $w_\alpha$ of $W_\varepsilon$ under $H_0$, i.e., $\mathbb{P}(W_\varepsilon \le w_\alpha) = 1-\alpha$. The block rejection rule:

  • Reject $H_0$ if $Y_n^\varepsilon > w_\alpha$. This rule achieves

$$\lim_{n \to \infty} \mathbb{P}_{H_0}(Y_n^\varepsilon > w_\alpha) = \alpha$$

$$\lim_{n \to \infty} \mathbb{P}_{H_1}(Y_n^\varepsilon \le w_\alpha) = \mathbb{P}(W_1 \le w_\alpha)$$

Power analysis reveals that, for suitable regularization parameters and growing average degree, the test is asymptotically powerful whenever $\kappa \to \infty$.
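Putting the pieces together, a hedged sketch of the rejection step (relying on the illustrative sample_power_poisson helper from Section 3) estimates $w_\alpha$ from simulated null draws and compares the computed statistic against it.

```python
import numpy as np

def block_rejection_test(y_stat, a, b, eps, alpha=0.05, n_null=200_000):
    """Block rejection rule: reject H0 when the LR statistic exceeds the estimated
    (1 - alpha)-quantile of the simulated null limit W_eps.
    Uses sample_power_poisson from the Section 3 sketch."""
    w_null = sample_power_poisson(a, b, eps, under_null=True, size=n_null)
    w_alpha = np.quantile(w_null, 1 - alpha)    # estimated critical value w_alpha
    return y_stat > w_alpha, w_alpha
```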

5. Monte Carlo Approximation and Computational Considerations

Direct evaluation of $Y_n^\varepsilon$ is infeasible because it requires summing over $2^n$ labelings. A Monte Carlo (MC) estimator based on $M$ i.i.d. uniform labelings $o^{(1)}, \dots, o^{(M)}$ is

$$\widehat{Y}_n^\varepsilon = \frac{1}{M} \sum_{i=1}^M \prod_{u < v} \left(\frac{P_{uv}^\varepsilon(o^{(i)})}{p_0}\right)^{A_{uv}} \left(\frac{1-P_{uv}^\varepsilon(o^{(i)})}{1-p_0}\right)^{1 - A_{uv}}$$

with computational cost $O(M n^2)$; taking $M \gg \exp(2 \kappa_\varepsilon n)$, where $\kappa_\varepsilon = (a_\varepsilon - b_\varepsilon)^2 / (2(a+b))$, is required for negligible MC error. This allows practical application at moderate system sizes, underlining the method's potential in sparse network regimes.
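A direct translation of the estimator into NumPy might look as follows. This mirrors the exact enumeration sketch in Section 2 but averages over sampled labelings; the function name regularized_lr_mc and the default M are assumptions, and each labeling's ratio is accumulated from a log-space sum for numerical stability.

```python
import numpy as np

def regularized_lr_mc(A, a, b, eps, M=100_000, seed=0):
    """Monte Carlo approximation of Y_n^eps from M i.i.d. uniform labelings;
    cost is O(M * n^2)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p0 = (a + b) / (2 * n)
    a_e, b_e = a - eps, b + eps
    iu = np.triu_indices(n, k=1)
    edges = A[iu]
    acc = 0.0
    for _ in range(M):
        o = rng.choice([-1, 1], size=n)                 # uniform random labeling o^(i)
        same = np.equal.outer(o, o)[iu]
        P = np.where(same, a_e / n, b_e / n)
        llr = np.sum(edges * np.log(P / p0) + (1 - edges) * np.log((1 - P) / (1 - p0)))
        acc += np.exp(llr)                              # likelihood ratio for o^(i)
    return acc / M
```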

6. Empirical Studies and Performance Benchmarking

Simulations with $a = 2.5 + c$, $b = 2.5 - c$, and $c \in \{2.10, 2.15, 2.25, 2.35\}$ correspond to SNR values $\kappa \approx 1.76, 1.85, 2.03, 2.21$, with $n \in \{20, 30, 40, 45\}$. With appropriately chosen $\varepsilon$, the empirical size at level $\alpha = 0.05$ matches the theoretical prediction, and power increases with $n$ and $\kappa$. The regularized LR procedure outperforms the spectral test of Bickel–Sarkar and the subgraph-count test of Gao–Lafferty in these sparse regimes.
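As a quick arithmetic check of the quoted SNR values, note that with $a = 2.5 + c$ and $b = 2.5 - c$ the formula reduces to $\kappa = (2c)^2/10 = 0.4c^2$:

```python
# With a = 2.5 + c and b = 2.5 - c: kappa = (a - b)^2 / (2(a + b)) = (2c)^2 / 10 = 0.4 * c^2
for c in (2.10, 2.15, 2.25, 2.35):
    print(f"c = {c:.2f}: kappa = {0.4 * c**2:.3f}")     # 1.764, 1.849, 2.025, 2.209
```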

On real data, e.g., the “political books” co-purchase network ($n = 105$), the test shows high success rates (83%–100%) in correctly rejecting the null when distinct communities are merged, again exceeding the performance of competing methods in sparse graphs.

7. Extensions, Limitations, and Future Directions

The $\varepsilon$-regularization is critical for restoring statistical contiguity and mitigating the erratic behavior of the standard LR statistic in SBMs with bounded average degree. No consistent test is possible for $\kappa < 1$ due to information-theoretic lower bounds. The MC approach's computational cost remains exponential in the worst case; further analysis of deterministic or mean-field approximations is needed. Extensions to multi-community SBMs, degree-corrected models, and connections with semidefinite and spectral relaxations are posited as promising avenues for future work. The likelihood ratio scoring with block rejection framework establishes a principled, theoretically robust paradigm for community hypothesis testing in sparse networks (Yuan et al., 2018).
