
PaKS: Pattern-Based Kernel Search

Updated 11 December 2025
  • PaKS is a framework that exploits repeated patterns to enhance scalability, interpretability, and efficiency across combinatorial optimization and nonparametric regression.
  • It employs techniques like spectral biclustering and a greedy kernel grammar search to identify interdependencies and decomposable structures in data.
  • Empirical results show that PaKS shrinks the variable search space substantially (discarding on average 61% of facility-opening variables up front) and achieves near-optimal gaps while scaling to problems with thousands of variables.

Pattern-based Kernel Search (PaKS) refers to a family of approaches that systematically exploit repeated patterns—either in combinatorial optimization variables or function/data structure—to accelerate and enhance kernel-based modeling or search. Across optimization, machine learning, and symbolic regression domains, PaKS enables the automated discovery or exploitation of compositional structure, yielding improved scalability, interpretability, or computational efficiency. The concept is instantiated in multiple, field-specific algorithmic frameworks, including matheuristics for large-scale integer programming, compositional kernel learning for nonparametric regression, and sequence-embedding for time-series search.

1. Foundational Concepts

PaKS frameworks center on identifying, leveraging, or searching over structured patterns within a larger search or model space. Patterns may be:

  • Combinatorial variable dependencies: Blocks of variables in facility location or assignment problems that tend to act in concert due to spatial or cost structure.
  • Kernel compositions: Sums and products of base covariance kernels (e.g., SE, PER, LIN, RQ) encoding interpretable patterns such as locality, periodicity, and linear trends in regression.
  • Feature embeddings: Low-dimensional representations of sequence similarity, as for efficient approximate matching under dynamic time warping.

In all cases, the search is "pattern-based" in the sense that candidate solutions, model components, or search neighborhoods are generated, enhanced, or pruned based on detected pattern structure.
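
To make the kernel-composition notion concrete, the following is a minimal sketch of building a composite pattern (SE × PER + LIN) with scikit-learn's kernel classes; the particular composition and library choice are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch (illustrative assumption, not from the cited papers): composing
# base covariance functions into an interpretable pattern.
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct

SE = RBF(length_scale=1.0)                               # locality / smoothness
PER = ExpSineSquared(length_scale=1.0, periodicity=1.0)  # periodicity
LIN = DotProduct(sigma_0=1.0)                            # linear trend

# "Locally varying periodic component plus a trend": product and sum of base kernels.
composite = SE * PER + LIN

X = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
K = composite(X)   # 20 x 20 covariance matrix implied by the composite pattern
print(composite)   # prints the symbolic structure of the composition
print(K.shape)
```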

2. PaKS in Large-Scale Facility Location

In combinatorial optimization, notably the Single-Source Capacitated Facility Location Problem (SSCFLP), PaKS is instantiated as a two-phase matheuristic designed to scale to instances involving thousands of facilities and customers (Bakker et al., 9 Dec 2025):

  • Phase 1: Pattern Recognition via Spectral Biclustering. A set of perturbed LP relaxations is solved to produce a fractional assignment matrix. Spectral biclustering is applied to this matrix, producing regions—subsets of locations and customers whose assignment and opening variables are strongly interdependent.
  • Phase 2: Enhanced Kernel Search. The kernel and buckets for the kernel search are constructed to align with these regions. Initial variable scores are synthesized from the LP solutions, informing which binary variables are most promising. The iterative "learn-and-adjust" kernel search is then performed on increasingly enlarged subproblems defined by these pattern-derived blocks.

This approach yields drastically smaller subproblems: on large benchmarks, PaKS discards on average 61% of the facility-opening (y) variables up front and produces buckets of opening (y) and assignment (x) variables that are, respectively, 46% and 53% smaller than those of standard kernel search (KS) (Bakker et al., 9 Dec 2025). The result is improved solution quality and robustness, with the smallest average gaps (0.02%–1.32%) and the highest counts of best solutions among tested heuristics and solvers on instances up to 2,000 × 2,000.
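
The following is a minimal sketch of the Phase 1 idea under the assumption that the fractional assignment matrix has already been aggregated from the perturbed LP relaxations; scikit-learn's SpectralBiclustering is used as a generic stand-in for the biclustering step, and the matrix is synthetic.

```python
# Minimal sketch of Phase 1 (pattern recognition). Assumption: a fractional
# assignment matrix A (facilities x customers) has been obtained by averaging
# perturbed LP-relaxation solutions. SpectralBiclustering stands in for the
# biclustering step; the data here are synthetic.
import numpy as np
from sklearn.cluster import SpectralBiclustering

rng = np.random.default_rng(0)

# Synthetic fractional assignments with two hidden facility/customer blocks.
A = rng.uniform(0.0, 0.05, size=(40, 200))
A[:20, :100] += rng.uniform(0.5, 1.0, size=(20, 100))   # region 1
A[20:, 100:] += rng.uniform(0.5, 1.0, size=(20, 100))   # region 2

model = SpectralBiclustering(n_clusters=(2, 2), random_state=0)
model.fit(A)

facility_region = model.row_labels_      # region label per facility (y-variable block)
customer_region = model.column_labels_   # region label per customer

# The regions then shape the kernel and buckets of Phase 2: variables whose
# facility and customer fall in the same region are grouped into one subproblem.
for r in range(2):
    facilities = np.flatnonzero(facility_region == r)
    customers = np.flatnonzero(customer_region == r)
    print(f"region {r}: {len(facilities)} facilities, {len(customers)} customers")
```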

3. PaKS for Kernel Structure Learning in Nonparametric Regression

PaKS is extensively developed in structure discovery for Gaussian Process (GP) regression (Duvenaud et al., 2013). Here, "pattern-based" refers to compositional kernel search over an explicit grammar:

  • Kernel Grammar: Sums and products of one-dimensional base kernels—Squared Exponential (SE), Periodic (PER), Linear (LIN), and Rational Quadratic (RQ)—across input dimensions. The search space is defined by recursively applying addition and multiplication.
  • Greedy Search Algorithm: At each iteration, the current best kernel is expanded by local operators (addition, multiplication, or substitution of base kernels at any subexpression), spanning all input dimensions.
  • Model Selection: Each candidate kernel is scored by its maximized GP log-marginal likelihood, penalized for model complexity via the Bayesian Information Criterion (BIC).

Empirical results on time-series extrapolation and UCI regression tasks demonstrate that PaKS discovers interpretable, additive decompositions (e.g., trend + periodicity + noise) and frequently outperforms standard GP kernels or kernel combination methods. A key feature is the interpretability of learned models: each term in the composite kernel admits probabilistic functional decomposition, making latent structure explicit (Duvenaud et al., 2013).
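
The following is a minimal sketch of a single greedy expansion round under BIC scoring, using scikit-learn's GP implementation as the likelihood backend; as a simplifying assumption, new base kernels are only attached at the top level, whereas the full grammar also edits arbitrary subexpressions.

```python
# Minimal sketch of one greedy round of compositional kernel search scored by BIC.
# Simplifying assumption: base kernels are only added/multiplied at the top level;
# the full grammar also substitutes and edits arbitrary subexpressions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, ExpSineSquared, DotProduct, RationalQuadratic,
)

BASE = {"SE": RBF(), "PER": ExpSineSquared(), "LIN": DotProduct(), "RQ": RationalQuadratic()}

def bic_score(kernel, X, y):
    """Fit GP hyperparameters and return (BIC, fitted model); lower BIC is better."""
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    log_lik = gp.log_marginal_likelihood_value_
    n_params = gp.kernel_.theta.shape[0]   # number of kernel hyperparameters
    return -2.0 * log_lik + n_params * np.log(len(y)), gp

def expand(current):
    """Candidate kernels reachable by adding or multiplying one base kernel."""
    for base in BASE.values():
        yield current + base
        yield current * base

# Toy data: linear trend plus a periodic component.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 6.0, 80).reshape(-1, 1)
y = 0.5 * X.ravel() + np.sin(3.0 * X.ravel()) + 0.1 * rng.normal(size=80)

best_kernel = BASE["SE"]
best_bic, _ = bic_score(best_kernel, X, y)
for candidate in expand(best_kernel):
    bic, _ = bic_score(candidate, X, y)
    if bic < best_bic:
        best_kernel, best_bic = candidate, bic

print(best_kernel, best_bic)   # e.g. a sum like SE + LIN or SE + PER, data-dependent
```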

4. Scalability: Approximations and Accelerations

The original kernel search algorithm in the Automatic Statistician is limited by the cubic cost of exact GP inference, O(N^3) per candidate. Scalable variants, such as Scalable Kernel Composition (SKC), retain the pattern-based grammar but replace the expensive likelihood evaluation with tight variational lower bounds and novel upper bounds, each computable in O(N^2) per candidate (Kim et al., 2017):

  • Variational Inducing-Point Lower Bound (VAR–LB): Employs a Nyström approximation with a small number of inducing points for efficient bound computation and hyperparameter optimization.
  • Nyström-Conjugate Gradient Upper Bound (UB): Uses matrix inequalities and numerical linear algebra to bound the GP log-evidence from above, providing a sandwich around the true marginal likelihood.

During search, candidates whose upper bound falls strictly below the best lower bound so far are pruned, and the most promising candidates are expanded. This enables pattern-based kernel model search to scale to tens of thousands of points—orders of magnitude faster than naive approaches—while preserving interpretability and grammar completeness.
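
The following is a minimal sketch of the pruning logic only; the lower_bound and upper_bound callables are placeholders standing in for the O(N^2) variational lower bound and Nyström-CG upper bound, whose actual computation is not reproduced here, and the numeric bound values in the usage example are invented for illustration.

```python
# Minimal sketch of sandwich-bound pruning in the kernel-grammar search.
# lower_bound / upper_bound are placeholders for the O(N^2) variational lower
# bound and Nystrom-CG upper bound of SKC; only the control flow is shown.
def prune_and_rank(candidates, lower_bound, upper_bound):
    """Drop candidates whose upper bound falls below the best lower bound seen."""
    scored = [(c, lower_bound(c), upper_bound(c)) for c in candidates]
    best_lb = max(lb for _, lb, _ in scored)
    survivors = [(c, lb, ub) for c, lb, ub in scored if ub >= best_lb]
    survivors.sort(key=lambda t: t[1], reverse=True)   # most promising first
    return [c for c, _, _ in survivors]

# Usage with invented bound values (for illustration only):
bounds = {"SE": (-120.0, -92.0), "SE+PER": (-95.0, -90.0), "SE*LIN": (-130.0, -118.0)}
kept = prune_and_rank(
    bounds,
    lower_bound=lambda k: bounds[k][0],
    upper_bound=lambda k: bounds[k][1],
)
print(kept)   # "SE*LIN" is pruned: its upper bound (-118) < best lower bound (-95)
```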

5. PaKS for Time-Series Subsequence Search

PaKS is also applied to efficient time-series and sequence retrieval problems (Candelieri et al., 2019):

  • Random Feature Embedding: A set of short "basis" time series is generated; each query or candidate subsequence is embedded into a fixed-dimensional space by computing its DTW distance to each basis pattern.
  • RBF Kernel in Embedding Space: The similarity between two sequences is defined by a Gaussian kernel on their embedded feature vectors, approximating the DTW distance.

Rather than exhaustively searching all possible subintervals, Bayesian optimization is applied over the discrete domain of candidate intervals, where each evaluation requires only a kernel similarity in the embedding space rather than a direct DTW computation.
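
The following is a minimal sketch of the embedding and kernel computation, using a plain dynamic-programming DTW; the basis series, sequence lengths, and bandwidth are arbitrary illustrative choices rather than values from the cited paper.

```python
# Minimal sketch of the DTW-based random feature embedding and the RBF kernel
# defined on it. Basis series, lengths, and the bandwidth are illustrative choices.
import numpy as np

def dtw(a, b):
    """Classic dynamic-time-warping distance between 1-D sequences a and b."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)]

def embed(seq, basis):
    """Map a sequence to the vector of its DTW distances to each basis pattern."""
    return np.array([dtw(seq, b) for b in basis])   # embarrassingly parallel over basis

def rbf_similarity(u, v, length_scale=10.0):
    """Gaussian kernel on embedded feature vectors, approximating DTW similarity."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * length_scale ** 2))

rng = np.random.default_rng(0)
basis = [rng.standard_normal(20) for _ in range(8)]         # R short basis series

probe = np.sin(np.linspace(0.0, 4 * np.pi, 60))             # pattern to locate
candidate = np.sin(np.linspace(0.2, 4 * np.pi + 0.2, 55))   # one candidate subinterval

similarity = rbf_similarity(embed(probe, basis), embed(candidate, basis))
print(f"kernel similarity: {similarity:.3f}")
```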

While this leads to a small drop in accuracy (77% vs. 86% correct identifications compared to classic DTW), it results in substantial computational savings, especially when the stream length far exceeds the probe pattern length and the number of basis series is moderate. The computation is embarrassingly parallel across basis patterns (Candelieri et al., 2019).

6. Generalizations, Limitations, and Tuning Principles

Generalization Potential

The two-phase pattern-recognition plus kernel-search paradigm in combinatorial optimization is generalizable to settings where:

  • Fractional or relaxed problems reveal high-level variable interdependencies.
  • Biclustering or spectral detection can identify block structure.
  • A subproblem-based search strategy permits dynamic inclusion/exclusion of variable blocks.

Notably, extensions to multi-commodity network design, generalized assignment, and other multi-block problems are plausible (Bakker et al., 9 Dec 2025).

Tuning and Practical Considerations

Across the PaKS variants, key hyperparameters govern coverage, compositionality, and computational efficiency (a configuration sketch follows the list):

  • Facility location PaKS: Number of LP relaxations (N), cap on variable-interaction cuts, biclustering leakage threshold (θ), and kernel/bucket composition rules.
  • Kernel structure learning: Search depth, set of base kernels, and operators affect both coverage and computational load; hyperparameter optimization and warm-start heuristics improve convergence.
  • Time-series PaKS: Number/length of basis series (R, L_min, L_max), Bayesian optimization iteration count, and kernel scale (ℓ) directly influence embedding fidelity and search runtime.
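
As a minimal sketch, these hyperparameters can be grouped into per-variant configuration objects; the field names and default values below are illustrative assumptions, not values prescribed by the cited papers.

```python
# Minimal sketch of per-variant PaKS configurations. Field names and defaults
# are illustrative assumptions, not values prescribed by the cited papers.
from dataclasses import dataclass

@dataclass
class FacilityLocationPaKSConfig:
    n_lp_relaxations: int = 20        # N: number of perturbed LP relaxations
    max_interaction_cuts: int = 500   # cap on variable-interaction cuts
    leakage_threshold: float = 0.1    # theta: biclustering leakage threshold
    bucket_rule: str = "region"       # kernel/bucket composition rule

@dataclass
class KernelStructureSearchConfig:
    max_depth: int = 10                               # grammar search depth
    base_kernels: tuple = ("SE", "PER", "LIN", "RQ")  # base kernel set
    operators: tuple = ("+", "*")                     # composition operators

@dataclass
class TimeSeriesPaKSConfig:
    n_basis: int = 8            # R: number of basis series
    min_length: int = 10        # L_min
    max_length: int = 50        # L_max
    bo_iterations: int = 100    # Bayesian optimization budget
    length_scale: float = 10.0  # kernel scale (ell)

config = FacilityLocationPaKSConfig(n_lp_relaxations=30, leakage_threshold=0.05)
print(config)
```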

Common limitations include dependence on the LP relaxations' ability to reveal genuine variable interdependencies, the greedy nature of the kernel grammar search (only local modifications of the current best kernel are explored), and the accuracy loss incurred when embedding-based kernels approximate DTW similarity.

7. Impact and Comparative Empirical Performance

Empirical benchmarks consistently show that Pattern-based Kernel Search yields gains in scalability and/or interpretability over standard baselines:

  • SSCFLP: On over 100 large-scale instances, PaKS achieved average optimality gaps as low as 0.02%, outperforming standard KS as well as heuristic approaches and the exact solver CPLEX, and was the only approach to scale robustly to instances with 2,000 facilities and 2,000 customers (Bakker et al., 9 Dec 2025).
  • Compositional GP regression: PaKS systematically matches or surpasses SE-ARD, additive GPs, and prior structure learning approaches in test error and negative log-predictive density, while decomposing target functions into meaningful patterns (Duvenaud et al., 2013).
  • Subsequence search: The PaKS approach delivers nearly optimal accuracy with order-of-magnitude computational savings versus full DTW search (Candelieri et al., 2019).
  • Scalable kernel model selection: The sandwich bounds of SKC allow exploration of rich kernel grammars at least an order of magnitude faster than naive methods and capture long-range structure missed by alternatives that rely solely on lower bounds or data subsets (Kim et al., 2017).

A plausible implication is that incorporating explicit pattern-exploitation, whether via pattern recognition on variables or compositional search over structure, substantially improves the tractability and interpretability of kernel-based methods in both discrete optimization and statistical learning domains.
