
Blockwise & Grouped Patterns: Regression & Combinatorics

Updated 9 March 2026
  • Blockwise and grouped patterns are a framework combining structural sparsity in regression with symmetry in combinatorial patterns to enhance model interpretability.
  • They use hierarchical priors and smooth surrogate optimization to achieve simultaneous inter- and intra-group sparsity, reducing false positives in high-dimensional data.
  • Group actions in pattern theory enable tractable analysis of cyclic and symmetric structures, facilitating applications like wait-time analysis and nontransitive game strategies.

Blockwise and grouped patterns represent a foundational paradigm for structuring and analyzing both statistical models (notably in high-dimensional regression) and combinatorial objects (notably in the study of pattern-generating systems with symmetry). Two principal lines of research exemplify this paradigm: (1) the use of block-structured priors for grouped variable selection in regression models via nested spike-and-slab constructions, and (2) the study of patterns arising from group actions on words, particularly as applied to nontransitive games and wait-time analysis. These approaches exploit blockwise or group-level symmetries and constraints, often yielding substantial advantages in model selection, interpretability, and analytic tractability.

1. Blockwise Patterns in Grouped Variable Selection

In regression with grouped covariates, suppose $p$ features are partitioned into $m$ nonoverlapping groups $G_1, \dots, G_m$ of cardinalities $q_1, \dots, q_m$, with regression coefficients $\beta_g \in \mathbb{R}^{q_g}$. To model both between- and within-group sparsity, a nested spike-and-slab prior is introduced (Yen et al., 2011):

  • Group-level prior: For each group $g$, a Bernoulli indicator $\gamma_g \sim \mathrm{Bern}(\theta_g)$ determines whether the group is active. Marginally, $\beta_g$ follows

$$f(\beta_g) = \theta_g\, \text{(slab density)} + (1-\theta_g)\, \delta_0(\|\beta_g\|_2),$$

where, with probability $1-\theta_g$, the entire subvector $\beta_g$ is exactly zero.

  • Within-group prior: Conditional on $\gamma_g = 1$, each coordinate $j \in G_g$ receives its own spike-and-slab prior via a Bernoulli indicator $\alpha_j \mid \gamma_g \sim \mathrm{Bern}(\omega_j)$, i.e.,

$$\beta_j \mid \gamma_g, \alpha_j \sim \alpha_j\, \mathcal{N}(0, \sigma^2/\lambda) + (1-\alpha_j)\, \delta_{(-\xi,\xi)}(\beta_j),$$

with $\xi \to 0$.

This hierarchical construction can induce both exact block sparsity (entire groups zeroed out) and within-group sparsity (individual zeros within active blocks).
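To make the two levels of sparsity concrete, the following minimal sketch draws one coefficient vector from a prior of this shape. It is an illustration under stated assumptions, not the authors' code: the function name `sample_nested_spike_slab` is hypothetical, a single `theta` and a single `omega` are shared across all groups and coordinates, and the narrow spike $\delta_{(-\xi,\xi)}$ is replaced by its $\xi \to 0$ limit, an exact zero.

```python
import random

def sample_nested_spike_slab(group_sizes, theta, omega, sigma2=1.0, lam=1.0, seed=0):
    """Draw one coefficient vector, returned as a list of per-group lists.

    Assumptions (not from the source): theta and omega are shared by all
    groups/coordinates, and the spike is taken as an exact zero.
    """
    rng = random.Random(seed)
    slab_sd = (sigma2 / lam) ** 0.5          # slab is N(0, sigma^2 / lambda)
    beta = []
    for q_g in group_sizes:
        gamma_g = rng.random() < theta       # group-level indicator
        block = []
        for _ in range(q_g):
            # within-group indicator, only relevant when the group is active
            alpha_j = gamma_g and (rng.random() < omega)
            block.append(rng.gauss(0.0, slab_sd) if alpha_j else 0.0)
        beta.append(block)
    return beta

if __name__ == "__main__":
    draw = sample_nested_spike_slab([5, 5, 5, 5], theta=0.5, omega=0.6)
    for g, block in enumerate(draw):
        print(g, [round(b, 2) for b in block])
```

Inactive groups come out as all-zero blocks, while active groups typically mix zero and nonzero entries, which is the pattern the hierarchical construction is meant to induce.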

2. MAP Objective and Surrogate Optimization

The posterior mode estimation problem leads to an objective comprising both block- and coordinate-level penalties. For Gaussian regression $y \sim \mathcal{N}(X\beta, \sigma^2 I)$, the negative log-posterior (up to additive constants) is:

$$V(\beta) = \frac{1}{2\sigma^2}\|y - X\beta\|_2^2 + \lambda \sum_{g=1}^m \|\beta_g\|_2^2 + \rho_1 \sum_{j=1}^p \mathbb{I}\{\beta_j \neq 0\} + \rho_2 \sum_{g=1}^m \sqrt{q_g\, \mathbb{I}\{\|\beta_g\|_2 \neq 0\}}.$$

To render the problem tractable, each indicator is approximated by a smooth log-sum surrogate:

$$\mathbb{I}\{a \neq 0\} \approx g_\tau(a) = \frac{\ln(1 + |a|/\tau)}{\ln(1+\tau^{-1})}, \qquad \tau \to 0,$$
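A quick numerical look at the surrogate shows how it behaves as $\tau$ shrinks. The helper name `log_sum_surrogate` is an assumption introduced here for illustration.

```python
import math

def log_sum_surrogate(a, tau):
    """g_tau(a) = ln(1 + |a|/tau) / ln(1 + 1/tau)."""
    return math.log(1.0 + abs(a) / tau) / math.log(1.0 + 1.0 / tau)

if __name__ == "__main__":
    for tau in (1e-1, 1e-3, 1e-6, 1e-12):
        values = [log_sum_surrogate(a, tau) for a in (0.0, 0.1, 1.0, 10.0)]
        print(tau, [round(v, 3) for v in values])
```

By construction $g_\tau(0) = 0$ and $g_\tau(\pm 1) = 1$ for every $\tau$. For other nonzero $a$ the value approaches 1 only logarithmically slowly as $\tau \to 0$, from below when $|a| < 1$ and from above when $|a| > 1$.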

Because $g_\tau$ is concave in $|a|$, it is majorized by its tangent at the current iterate, which yields weighted $\ell_1$ and group-$\ell_2$ penalties. The resulting surrogate, at iterate $\beta^{(d)}$, is convex in $\beta$:

$$Q^{(d)}(\beta) = \frac{1}{2\sigma^2}\|y - X\beta\|_2^2 + \lambda \sum_g \|\beta_g\|_2^2 + \lambda_1 \sum_j \nu_j^{(d)} |\beta_j| + \lambda_2 \sum_g \phi_g^{(d)} \|\beta_g\|_2,$$

where $\nu_j^{(d)} = (|\beta_j^{(d)}| + \tau)^{-1}$ and $\phi_g^{(d)} = (\|\beta_g^{(d)}\|_2 + \tau)^{-1}$.
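The form of the weights can be traced to the first-order bound for the concave logarithm. The short derivation below is a sketch rather than a reproduction of the source; in particular, it assumes that the normalizing constant $1/\ln(1+\tau^{-1})$ and the factors multiplying the indicators in $V(\beta)$ are absorbed into $\lambda_1$ and $\lambda_2$, a relation the text does not spell out.

```latex
% Tangent-line bound for the concave map t -> ln(1 + t/tau), t >= 0,
% taken at t_0 = |a_0|, where a_0 is the value at the current iterate.
% The slope is d/dt ln(1 + t/tau) = 1 / (t + tau).
\ln\!\left(1 + \frac{|a|}{\tau}\right)
  \;\le\; \ln\!\left(1 + \frac{|a_0|}{\tau}\right)
        + \frac{|a| - |a_0|}{|a_0| + \tau}.
% Dropping the terms that do not depend on a leaves the weighted penalty
%   |a| / (|a_0| + tau).
% With a = \beta_j this gives the weight
\nu_j^{(d)} = \bigl(|\beta_j^{(d)}| + \tau\bigr)^{-1},
% and with a = \|\beta_g\|_2 it gives
\phi_g^{(d)} = \bigl(\|\beta_g^{(d)}\|_2 + \tau\bigr)^{-1}.
```

Coordinates or blocks that are small at the current iterate therefore receive large weights and are pushed harder toward zero at the next step.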

3. Blockwise Coordinate-Descent Algorithms

The surrogate objective facilitates minimization via blockwise coordinate descent. For each group $g$:

  1. Zero-block test: The KKT subgradient at $b = 0$ yields the criterion

$$\left\|\, \mathrm{ST}_{\lambda_1 \nu_{G_g}}\left(2X_g^T r_{-g}\right) \,\right\|_2 \leq \lambda_2\,\phi_g^{(d)},$$

where $r_{-g} = y - \sum_{h\neq g} X_h \beta_h$ and $\mathrm{ST}_{\lambda v}(z)_j = \mathrm{sign}(z_j)\max\{|z_j|-\lambda v_j, 0\}$.

If true, set $\beta_g = 0$.

  2. Nonzero-block update: Otherwise, a strictly convex quadratic problem yields

$$\beta_g^{\rm new} = \left(X_g^T X_g + w_g I \right)^{-1} \mathrm{ST}_{\lambda_1 \nu_{G_g}/2}\left( X_g^T r_{-g} \right),$$

with $w_g = \lambda + \lambda_2 \phi_g^{(d)} / (2\|\beta_g^{(d)}\|_2)$.

This two-stage procedure ensures exact block zeros and soft-thresholded updates within active blocks. Majorization-minimization is iterated until convergence.
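The two steps can be transcribed into a short NumPy sketch. It follows the formulas as stated in this section and is not the authors' implementation. Several choices are assumptions made here: the function names, the ridge starting point, the stopping rule, and the small floor `eps` that avoids division by zero in $w_g$ when the previous block iterate is exactly zero.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise ST_t(z)_j = sign(z_j) * max(|z_j| - t_j, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def blockwise_pass(X, y, beta, groups, beta_prev, lam, lam1, lam2, tau, eps=1e-12):
    """One sweep over all groups for the surrogate built at beta_prev."""
    beta = beta.copy()
    nu = 1.0 / (np.abs(beta_prev) + tau)            # coordinate weights
    for idx in groups:
        Xg = X[:, idx]
        norm_prev = np.linalg.norm(beta_prev[idx])
        phi = 1.0 / (norm_prev + tau)               # group weight
        r = y - X @ beta + Xg @ beta[idx]           # residual without group g
        corr = Xg.T @ r
        # Step 1: zero-block test
        if np.linalg.norm(soft_threshold(2.0 * corr, lam1 * nu[idx])) <= lam2 * phi:
            beta[idx] = 0.0
            continue
        # Step 2: nonzero-block update (eps floor is an assumption)
        w = lam + lam2 * phi / (2.0 * max(norm_prev, eps))
        rhs = soft_threshold(corr, lam1 * nu[idx] / 2.0)
        beta[idx] = np.linalg.solve(Xg.T @ Xg + w * np.eye(len(idx)), rhs)
    return beta

def fit(X, y, groups, lam, lam1, lam2, tau=1e-3, outer=20, inner=50, tol=1e-8):
    """Outer MM loop around inner blockwise sweeps."""
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)   # ridge start (assumption)
    for _ in range(outer):
        beta_prev = beta.copy()                     # iterate defining the weights
        for _ in range(inner):
            new = blockwise_pass(X, y, beta, groups, beta_prev, lam, lam1, lam2, tau)
            done = np.max(np.abs(new - beta)) < tol
            beta = new
            if done:
                break
        if np.max(np.abs(beta - beta_prev)) < tol:
            break
    return beta

if __name__ == "__main__":
    X = np.eye(4)
    y = np.array([3.0, 0.0, 0.0, 0.0])
    print(fit(X, y, groups=[[0, 1], [2, 3]], lam=0.1, lam1=0.1, lam2=0.1))
```

In this toy run the second group fails the zero-block test threshold and is set exactly to zero, while the first group stays active with only its first coordinate nonzero.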

4. Theoretical Guarantees and Label-Invariance

Under standard regularity on the design matrix $X$ (e.g., restricted eigenvalue conditions) and appropriate growth rates for $\lambda, \rho_1, \rho_2 = O(\sqrt{n})$, key properties can be established (Yen et al., 2011):

  • Estimation error bound: If the true support lies in $r$ blocks covering $q_R$ coordinates,

$$\| \hat{\beta} - \beta^* \|_2 = O\left( \frac{1}{\sqrt{n}} \sqrt{ q_R \ln m } \right)$$

with high probability. When $q_g \ll p$ and $r \ll m$, this can improve upon the lasso rate $O(\sqrt{s \ln p / n})$, where $s$ denotes the number of nonzero coefficients.

  • Label-invariance: Provided $\rho_2 \max_g \sqrt{q_g} = o(\ln n)$, the estimator becomes asymptotically invariant to the choice of grouping as $\tau \to 0$.
  • Sign-consistency: Under Gaussian errors and $p = o(n/\ln^2 n)$, with no irrepresentable-type condition, $\Pr[\,\mathrm{sign}(\hat{\beta}) = \mathrm{sign}(\beta^*)\,] \to 1$.

These results indicate that block-structured priors can induce simultaneous inter- and intra-group sparsity with favorable finite-sample and asymptotic guarantees.

5. Pattern Formation by Group Action: Blockwise Reductions

A parallel formalism emerges in the combinatorics of patterns under group action (Khovanova et al., 2020). Let $\mathcal{A}$ be an alphabet of size $q$, and $G \subset S_q$ a group acting on $\mathcal{A}$ by permuting letters. The action extends letterwise to words $w = w_1\cdots w_\ell \in \mathcal{A}^\ell$: $g \cdot w = (g\cdot w_1)\cdots (g\cdot w_\ell)$.

  • The orbit of $w$ is $G \cdot w = \{g \cdot w : g \in G\}$, and its stabilizer is $G_w = \{g \in G : g \cdot w = w\}$.
  • The set of patterns of length $\ell$ is identified with the set of orbits.
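These definitions can be checked by brute force on small cases. The sketch below takes $G$ to be the cyclic group of shifts on the alphabet $\{0, \dots, q-1\}$ as one concrete choice; the function names are assumptions introduced for illustration.

```python
from itertools import product

def caesar_shift(word, k, q):
    """Apply the shift by k letterwise to a word over {0, ..., q-1}."""
    return tuple((c + k) % q for c in word)

def orbit(word, q):
    """Orbit of a word under all q cyclic shifts."""
    return frozenset(caesar_shift(word, k, q) for k in range(q))

def stabilizer(word, q):
    """Shifts k that fix the word."""
    return [k for k in range(q) if caesar_shift(word, k, q) == tuple(word)]

def patterns(q, length):
    """Set of orbits, i.e. patterns, of the given length."""
    return {orbit(w, q) for w in product(range(q), repeat=length)}

if __name__ == "__main__":
    for pattern in patterns(3, 2):
        print(sorted(pattern))
```

For words of length at least one, only the identity shift fixes a word, so every orbit has exactly $q$ elements and the $q^\ell$ words split into $q^{\ell-1}$ patterns.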

When $G$ factors as a product or acts on blocks, there is often a bijection between patterns of length $\ell$ and words of reduced length. The cyclic group $C_q$ acting on $\mathbb{Z}/q\mathbb{Z}$ under Caesar shift exemplifies this principle: every pattern of length $\ell$ is determined by its adjacency signature $S(p) \in (\mathbb{Z}/q\mathbb{Z})^{\ell-1}$. Thus, analysis of avoidance, generating functions, and waiting times for blockwise group patterns can be reduced to lower-dimensional classical problems.
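The reduction can be illustrated for waiting times. The sketch below computes the adjacency signature as the tuple of consecutive differences modulo $q$ and then applies Conway's overlap formula to the signature. It assumes letters are drawn independently and uniformly, in which case the consecutive differences are themselves independent and uniform on $\mathbb{Z}/q\mathbb{Z}$; the function names are hypothetical.

```python
from itertools import product

def signature(word, q):
    """Adjacency signature: consecutive differences mod q (length l - 1)."""
    return tuple((b - a) % q for a, b in zip(word, word[1:]))

def conway_wait(word, q):
    """Expected wait for a fixed word in i.i.d. uniform letters.

    Conway's overlap formula: sum of q**k over every k for which the
    length-k prefix of the word equals its length-k suffix.
    """
    n = len(word)
    return sum(q ** k for k in range(1, n + 1) if word[:k] == word[n - k:])

def pattern_wait(word, q):
    """Expected wait for any word in the cyclic-shift orbit of `word`.

    One letter is needed before the first difference exists, after which
    the problem is a classical word wait on the difference sequence.
    """
    return 1 + conway_wait(signature(word, q), q)

if __name__ == "__main__":
    q, length = 3, 3
    sigs = {signature(w, q) for w in product(range(q), repeat=length)}
    print(len(sigs), q ** (length - 1))   # both count the patterns
    print(pattern_wait((0, 0), 2))        # two equal consecutive letters, q = 2
```

For $q = 2$ the pattern of two equal consecutive letters has signature $(0)$, so `pattern_wait((0, 0), 2)` returns 3, matching the familiar expected number of fair coin flips until two consecutive flips agree.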

6. Statistical and Combinatorial Consequences

Blockwise and grouped structures in both regression and pattern theory enforce structural constraints that shape model selection and pattern occurrence statistics:

  • In regression, simulation [(Yen et al., 2011), Table 1] shows that grouped variable selection via nested spike-and-slab (gvsnss) outperforms lasso and group lasso when the support lies within a few groups, particularly when within-group zeros must be detected. Specifically, in a scenario with $p=100$, $m=10$, $r=2$ active groups, and five within-group nonzeros per active group, gvsnss yields lower false positive rates (7.8%) and $L_2$ error (0.95), with 68% correct detection of within-group zeros, compared to higher false positive rates and lower within-group specificity for standard lasso and group lasso.
  • In combinatorial pattern matching, blockwise group actions enable explicit calculation of pattern-based Conway leading numbers, expected wait times, and non-transitive game strategies, especially under cyclic and symmetric group action (see, e.g., Sections 8 and 9 of (Khovanova et al., 2020)).
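
Simulation results from (Yen et al., 2011), Table 1:

| Method | FPR (%) | $\lVert\widehat\beta-\beta^*\rVert_2$ | Within-group zero detection rate |
|---|---|---|---|
| lasso | 24.5 | 1.04 | 0.21 |
| group lasso | 9.2 | 1.07 | 0.00 |
| gvsnss | 7.8 | 0.95 | 0.68 |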

A plausible implication is that methodologies exploiting blockwise or grouped patterns, whether via hierarchical priors or group actions, support refined inference and analytic tractability in structured high-dimensional or symmetric settings.

7. Synthesis and Broader Implications

Blockwise and grouped patterns, manifested as either hierarchical priors in regression or as group actions partitioning word spaces, provide a unifying abstraction for imposing and exploiting structural constraints. In both settings, algorithms and theoretical results leverage block structure to improve selection specificity and estimation accuracy and to enable tractable computation or exact enumeration. These principles, demonstrated respectively by the gvsnss estimator for regression and group-action pattern theory in combinatorics, suggest broad applicability for model selection, symmetry exploitation, and the design of algorithms that require discrimination at multiple hierarchical or group levels (Yen et al., 2011; Khovanova et al., 2020).

