Pattern-Based Kernel Search (PaKS)

Updated 30 January 2026
  • PaKS is a methodology that uses statistical pattern recognition and compositional kernel grammars to guide structure learning in regression and optimization.
  • It incorporates greedy beam search, variational bounds, and region-guided heuristics to efficiently select interpretable kernels and improve model accuracy.
  • The approach enhances scalability and robustness in applications like time series forecasting, nonparametric regression, and combinatorial optimization such as SSCFLP.

Pattern-Based Kernel Search (PaKS) encompasses a set of approaches employing pattern discovery and kernel-based modeling for structure learning, optimization, and efficient inference in diverse domains, including time series analysis, nonparametric regression, and large-scale combinatorial optimization.

1. Formal Definition and Conceptual Foundations

Pattern-Based Kernel Search (PaKS) refers to methodologies that leverage statistical pattern recognition or compositional kernel constructions to guide kernel selection, structure learning, or heuristic optimization. PaKS methodologies are unified by the use of “patterns”—whether in time series, feature matrices, or symbolic kernel grammars—to inform the construction or restriction of the search space for kernel functions, variables, or assignments.

In nonparametric regression and Gaussian process modeling, PaKS formalizes the kernel search space via a compositional grammar generated by repeated addition and multiplication of a set of base kernels: Squared Exponential (SE), Periodic (Per), Linear (Lin), and Rational Quadratic (RQ). Valid kernels are represented as symbolic expressions built from these primitives, allowing for automatic discovery of interpretable covariance structures (Duvenaud et al., 2013, Kim et al., 2017).

In combinatorial optimization, notably in the Single-Source Capacitated Facility Location Problem (SSCFLP), PaKS first identifies statistically coherent regions via pattern recognition (e.g., spectral biclustering of LP relaxations), then restricts kernel search heuristics to variables associated with these regions, yielding accelerated and higher-quality optimization (Bakker et al., 9 Dec 2025).

2. Compositional Kernel Search in Regression

The compositional PaKS approach defines the kernel search space as:

  • $K \rightarrow B$ (any base kernel)
  • $K \rightarrow K + K$
  • $K \rightarrow K \times K$
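As a concrete illustration, one grammar expansion step can be sketched in Python (the string/tuple representation of expressions is a hypothetical choice for this example, not the papers' data structure):

```python
# Base kernels of the compositional grammar: K -> B | K + K | K x K.
BASE = ["SE", "Per", "Lin", "RQ"]

def expand(expr):
    """Apply one grammar step: combine `expr` with every base kernel
    via addition and multiplication (8 children per expression)."""
    children = []
    for b in BASE:
        children.append(("+", expr, b))
        children.append(("*", expr, b))
    return children

def to_str(expr):
    """Render a nested (op, left, right) tuple as a readable formula."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_str(left)} {op} {to_str(right)})"

# Depth-1 candidates: 4 base kernels x 8 expansions each = 32 expressions.
candidates = [child for b in BASE for child in expand(b)]
```

Repeated expansion of the best-scoring expressions generates the nested sums and products the search explores.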

A Gaussian process prior with covariance $K_\theta(X, X)$ is fit to data $(X, y)$ by optimizing the log marginal likelihood

\ell(\theta) = -\frac{1}{2} y^\top [K_\theta + \sigma_n^2 I]^{-1} y - \frac{1}{2} \log|K_\theta + \sigma_n^2 I| - \frac{n}{2} \log 2\pi

Parameter learning is performed via conjugate-gradient optimization with random restarts. To select among kernels, the Bayesian Information Criterion (BIC) is used: $\mathrm{BIC}(K) \approx -2\,\ell(\hat{\theta}) + d \log n$, where $d$ is the number of kernel hyperparameters (Duvenaud et al., 2013).

The greedy structure search expands top-scoring kernels via sum/product operations over subexpressions, maintaining a beam of candidates with early stopping and parameter inheritance to improve tractability. The resulting composite kernels enable interpretable posterior decomposition of functions into additive and multiplicative components.
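A generic sketch of such a beam search (hypothetical: the string expressions and the toy scoring function stand in for BIC-scored GP kernels):

```python
def beam_search(score, base, expand, beam_width=3, depth=2):
    """Greedy beam search: keep the `beam_width` best expressions,
    expand each via the grammar, repeat up to `depth` rounds with
    early stopping when the best score no longer improves."""
    beam = sorted(base, key=score, reverse=True)[:beam_width]
    best = beam[0]
    for _ in range(depth):
        pool = [child for expr in beam for child in expand(expr)]
        beam = sorted(pool, key=score, reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
        else:
            break  # no improvement: stop early
    return best

# Toy demo: expressions are strings, expansion appends "+B"/"*B",
# and the score simply rewards "Per"/"SE" terms while penalizing length.
base = ["SE", "Per", "Lin", "RQ"]
expand = lambda e: [f"({e}+{b})" for b in base] + [f"({e}*{b})" for b in base]
score = lambda e: 2 * e.count("Per") + e.count("SE") - 0.1 * len(e)
best = beam_search(score, base, expand)
```

In the real method, `score` would be the (negative) BIC of a fitted GP and parameter inheritance would warm-start each child's hyperparameters from its parent.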

3. Scalable Structure Discovery via Bounds

For large datasets ($N \sim 10^4$–$10^5$), PaKS uses scalable kernel composition (SKC), replacing cubic-time GP marginal likelihood computations with sandwich bounds:

  • Variational Lower Bound ($\mathcal{L}_{VB}$), using a Nyström approximation with $m \ll N$ inducing points:

\mathcal{L}_{VB}(\theta, Z) = \log \mathcal{N}(y; 0, \hat{K} + \sigma^2 I) - \frac{1}{2\sigma^2} \mathrm{Tr}[K - \hat{K}]

  • Upper Bound ($\mathcal{U}$), via log-determinant and quadratic-form relaxation:

\mathcal{U}(\theta; Z, \alpha) = -\frac{1}{2} \log \det[\hat{K} + \sigma^2 I] + \frac{1}{2} \alpha^\top (K + \sigma^2 I)\alpha - \alpha^\top y - \frac{N}{2}\log 2\pi

These bounds define an interval

\mathcal{L}_{VB}(\theta) \leq \log p(y \mid X, \theta) \leq \mathcal{U}(\theta)

enabling beam search over kernel grammar candidates by pruning those whose upper bounds do not overlap the best running interval. This reduces per-candidate search complexity to $O(N^2)$, allowing parallel evaluation and practical model selection for high-dimensional regression tasks (Kim et al., 2017).
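The interval-based pruning step can be illustrated with a small sketch (the candidate names and bound values here are invented; in practice the intervals come from the variational lower bound and the upper bound defined above):

```python
def prune_by_sandwich(candidates, bounds):
    """Discard candidates whose upper bound falls below the best lower
    bound seen so far; `bounds[c] = (lower, upper)` per candidate."""
    best_lower = max(lo for lo, _ in bounds.values())
    return [c for c in candidates if bounds[c][1] >= best_lower]

# Toy (lower, upper) intervals on the log marginal likelihood.
bounds = {
    "SE":       (-120.0, -100.0),  # upper bound < -95 -> pruned
    "SE + Per": (-95.0,  -80.0),   # defines the best lower bound
    "Lin":      (-150.0, -110.0),  # upper bound < -95 -> pruned
    "SE * Lin": (-105.0, -90.0),   # interval overlaps -> kept
}
survivors = prune_by_sandwich(list(bounds), bounds)
```

Only surviving candidates need their bounds tightened (or exact likelihoods computed), which is where the savings over scoring every grammar expression come from.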

4. Pattern Recognition and Region-Guided KS in Combinatorial Optimization

In the context of SSCFLP, PaKS executes two key phases (Bakker et al., 9 Dec 2025):

  • Phase 1: Solve LP relaxations and extract a pattern-based feature matrix $A$, counting how frequently customers are fractionally (or integrally) served by facilities across perturbed LP solutions:

a_{ij} = \sum_{s=1}^{N+1} \lceil x_{ij}^s \rceil

Spectral biclustering of $A$ yields regions $\mathcal{R} = \{(I_r, J_r)\}$, partitioning facilities and customers to minimize inter-region spillover $\ell^{\mathrm{inter}}$.

  • Phase 2: An enhanced KS heuristic constructs a kernel $K = K_y \cup K_x$ of promising binary variables, indexed by region-based facility scores and intra-/gray-zone assignment variables selected via reduced-cost thresholds. Optimization proceeds via a sequence of restricted BIPs, learn-and-adjust iterations, and bucket expansions, focusing computational effort on core region-associated variables and significantly reducing subproblem sizes.
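The Phase 1 feature extraction can be sketched in numpy (a toy illustration: the facility/customer dimensions and the LP solutions are invented for the example, and the spectral biclustering step is omitted):

```python
import numpy as np

def pattern_matrix(solutions):
    """Build the feature matrix A with a_ij = sum_s ceil(x_ij^s):
    each entry counts, across the perturbed LP solutions, how often
    customer j is served (fractionally or integrally) by facility i."""
    return np.sum(np.ceil(np.stack(solutions)), axis=0).astype(int)

# Toy example: 2 facilities x 3 customers, three LP solutions with
# assignment values x_ij in [0, 1] (fractional or integral).
sols = [
    np.array([[1.0, 0.4, 0.0],
              [0.0, 0.6, 1.0]]),
    np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]]),
    np.array([[0.7, 0.3, 0.0],
              [0.3, 0.7, 1.0]]),
]
A = pattern_matrix(sols)
```

High counts in $A$ indicate stable facility-customer affinities; biclustering this matrix is what produces the regions that restrict the Phase 2 kernel.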

5. Computational Outcomes and Empirical Evaluation

PaKS methodologies demonstrate superior empirical performance in both regression structure discovery and optimization contexts.

GP Kernel Structure Recovery

  • Synthetic tests recover planted kernel expressions under moderate noise. Interpretable decompositions of time series yield accurate forecasts and extrapolation (Mauna Loa CO₂, airline passengers, solar irradiance) (Duvenaud et al., 2013).
  • SKC enables kernel discovery at unprecedented scale, maintaining interpretability and predictive accuracy as data size grows (Kim et al., 2017).

SSCFLP Optimization Results

Benchmark Instances ($m, n \leq 1000$)

| Method | # Best found | Avg. gap (%) | Avg. CPU time (s) |
|---|---|---|---|
| PaKS | 90 | 0.02 | 1,858 |
| KS (2014) | 84 | 0.03 | 1,834 |
| CPLEX | 74 | 0.05 | 13,340 |

Very Large Instances ($m \leq 2{,}000$, $n \leq 4{,}400$)

| Method | # Best found | Avg. UB gap (%) | Avg. LB gap (%) | Avg. CPU time (s) |
|---|---|---|---|---|
| PaKS | 254 | 1.32 | 3.29 | 5,050 |
| KS (2014) | 209 | 2.20 | 4.09 | 4,959 |
| CPLEX | 152 | 3.76 | 5.83 | 7,168 |

PaKS achieves more best solutions, lower average optimality gaps, and competitive CPU times compared to standard KS and CPLEX. The computational overhead of Phase 1 is consistently amortized by the smaller subproblems and higher-quality kernels it produces for Phase 2 (Bakker et al., 9 Dec 2025).

6. Key Properties, Tradeoffs, and Implementation Considerations

  • Interpretability: PaKS compositional grammar yields human-readable kernel structures, facilitating scientific insight into latent patterns (e.g., seasonal, local, and global trends).
  • Scalability: Sandwich bounds, beam/semi-greedy search, and region-guided kernel restriction render PaKS computationally efficient for large datasets and high-dimensional combinatorial models.
  • Parallelism: DTW-based embeddings, kernel evaluations, parameter optimization, and search expansions are parallelizable, enhancing throughput and runtime efficiency (Candelieri et al., 2019, Kim et al., 2017).
  • Parameterization: Key parameters include number of random basis series, kernel grammar depth, buffer/beam size, region thresholds, and LP/bucket initialization weights. Hyperparameters for RBF kernels and thresholds are typically tuned via cross-validation or empirical approximation to underlying distances.
  • Robustness: PaKS is robust to noise, recovers true patterns in high-SNR regimes, and exhibits gradual complexity adjustment in noisier settings.

A plausible implication is that PaKS-style region-based restriction and compositional kernel grammar can be generalized to other MIP-based combinatorial problems and time-series domains, subject to proper parameterization and feature extraction.

7. Domain-Specific Examples and Extensions

  • Time Series Subsequence Search: Embedding-based PaKS implements fixed-dimensional DTW vector mapping and RBF kernel construction for efficient Bayesian-optimized search, with substantial speedups and minimal loss in accuracy for pattern identification tasks (e.g., user identification from walking data) (Candelieri et al., 2019).
  • SSCFLP Region-Guided Optimization: LP- and bicluster-driven pattern extraction guides kernel and bucket selection, yielding higher-quality, scalable facility location solutions compared to heuristic and MIP solvers (Bakker et al., 9 Dec 2025).
  • Nonparametric Regression: PaKS enables structure recovery, long-range extrapolation, and additive decomposition of GP models, outperforming classical kernels/combinations in test MSE and likelihood on standard benchmarks (Duvenaud et al., 2013).
  • Scalable Kernel Model Selection: Sandwiching the GP marginal likelihood by variational and Nyström+CG-based bounds allows tractable, interpretable kernel search at modern data scale, preserving Automatic Statistician model semantics (Kim et al., 2017).
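The DTW-embedding idea from the first item above can be sketched as follows (a simplified illustration: the basis series, lengths, and RBF bandwidth are arbitrary choices, and the quadratic-time DTW here omits the constraints and speedups a real implementation would use):

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def embed(series, basis):
    """Fixed-dimensional embedding: DTW distances to each basis series."""
    return np.array([dtw(series, b) for b in basis])

def rbf(u, v, gamma=0.1):
    """RBF kernel between two embedded subsequences."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

rng = np.random.default_rng(0)
basis = [rng.standard_normal(16) for _ in range(5)]  # random basis series
s1 = np.sin(np.linspace(0, 2 * np.pi, 16))
s2 = s1 + 0.01                                       # near-identical pattern
s3 = rng.standard_normal(16)                         # unrelated series
k12 = rbf(embed(s1, basis), embed(s2, basis))
k13 = rbf(embed(s1, basis), embed(s3, basis))
```

Similar subsequences map to nearby embeddings and hence high kernel values, which is what lets a Bayesian optimizer search over the fixed-dimensional embedding space instead of raw warping distances.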

Applications and future investigations may extend PaKS to alternative pattern extraction (e.g., hierarchical or density clustering), dynamic/multi-period optimization, and broader classes of MIP-based models.

PaKS thus provides a unified pattern-driven framework for guided kernel search, balancing interpretability, scalability, and empirical solution quality across regression and combinatorial optimization tasks.
