Variable Length Markov Chains (VLMC)
- Variable Length Markov Chains (VLMC) are stochastic processes where the future state depends on variable-length contexts rather than fixed-order histories.
- The methodology uses context trees to assign probability distributions to minimal sufficient histories, enabling adaptive memory modeling with parsimony.
- VLMCs have practical applications in data compression, natural language processing, and biological sequence analysis by capturing complex dependency structures.
A Variable Length Markov Chain (VLMC) is a stochastic process where the dependence of the next state on the past is not fixed at a predetermined order, as in classical Markov chains, but varies depending on the observed history. The set of “contexts” (the relevant suffixes of the past for next-state prediction) is organized into a rooted tree, termed the context tree, whose leaves encode all minimal sufficient histories. This structure allows VLMCs to capture long- and variable-range dependencies while maintaining parsimony, rendering them a flexible framework for stationary discrete-time processes over finite alphabets and for applications in data compression, natural language processing, and biological sequence analysis (Zambom et al., 2019, Enter et al., 2020, Collet et al., 2012, Cénac et al., 2010).
1. Mathematical Structure of VLMCs and Context Trees
Let $A$ be a finite alphabet, and let $(X_n)_{n \in \mathbb{Z}}$ be a stationary process taking values in $A$. The central object in VLMC theory is the context tree $\mathcal{T}$ with context set $\mathcal{C}$ (the set of leaves, which may include finite words and possibly infinite branches for unbounded memory) (Cénac et al., 2010, Enter et al., 2020). Each context $c \in \mathcal{C}$ is assigned a probability distribution $q_c$ over $A$.
A context is a finite word $w = w_{-\ell} \cdots w_{-1}$ such that, conditional on the past ending with $w$, the distribution of the next symbol does not depend on further history: $\mathbb{P}(X_n = a \mid X_{n-1}, X_{n-2}, \dots) = \mathbb{P}(X_n = a \mid (X_{n-\ell}, \dots, X_{n-1}) = w)$ for all $a \in A$ and all histories ending with $w$. Minimality is enforced by requiring that no proper suffix of $w$ satisfies this property (Zambom et al., 2019, Enter et al., 2020, Collet et al., 2012).
The dynamics of the VLMC are specified by $\mathbb{P}(X_n = a \mid X_{n-1}, X_{n-2}, \dots) = q_{c}(a)$, where $c = c(X_{n-1}, X_{n-2}, \dots)$ is the unique context (in $\mathcal{C}$) that is a suffix of the current past.
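As a concrete illustration, the kernel above can be sketched in a few lines of Python. The context tree, alphabet, and probabilities below are invented for the example (they do not come from the cited papers); the context set is complete and suffix-free, so every sufficiently long past matches exactly one context.

```python
import random

# Toy VLMC on the alphabet {"0", "1"}. The context set {"1", "10", "00"}
# is complete and suffix-free: a past ending in 1 needs one symbol of
# memory, a past ending in 0 needs two. All numbers are illustrative.
CONTEXTS = {
    "1":  {"0": 0.3, "1": 0.7},   # past ends in ...1
    "10": {"0": 0.6, "1": 0.4},   # past ends in ...10
    "00": {"0": 0.5, "1": 0.5},   # past ends in ...00
}

def find_context(past):
    """Return the unique context that is a suffix of `past`."""
    for k in range(1, len(past) + 1):
        if past[-k:] in CONTEXTS:
            return past[-k:]
    raise ValueError("past too short to determine a context")

def step(past, rng=random):
    """Draw the next symbol from q_c, where c is the context of `past`."""
    dist = CONTEXTS[find_context(past)]
    u, acc = rng.random(), 0.0
    for symbol, p in dist.items():
        acc += p
        if u < acc:
            return symbol
    return symbol  # guard against floating-point rounding

# Grow a trajectory from an initial past.
past = "10"
for _ in range(20):
    past += step(past)
```

Scanning suffixes from shortest to longest in `find_context` is precisely the variable-length memory of the model: the amount of history consulted adapts to what was just observed.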
In general, the process of contexts $(C_n)_{n \ge 0}$, where $C_n$ is the context attached to the sequence up to time $n$, forms a Markov chain on the context tree if and only if the tree is shift-stable (Cénac et al., 2020, Cénac et al., 2018).
2. Existence, Uniqueness, and Characterization of Stationary Measures
The stationary law of a VLMC—if it exists—solves an intricate balance of mass over the context tree, described via the “cascade” formalism and longest internal suffix (lis) combinatorics (Cénac et al., 2020, Cénac et al., 2018). For non-null, shift-stable VLMCs, the existence and uniqueness of a stationary probability measure is characterized as follows.
For each finite context $w$, decompose $w$ into a sequence of symbols ending with its $\alpha$-lis. The “cascade” $\mathrm{casc}(w)$ is the probability (under the stationary law) of generating the word $w$ from its context. The “cascade series” is the sum of the cascades over all finite contexts sharing a given $\alpha$-lis. The context tree's $\alpha$-lis matrix indexes transitions between these classes.
Existence and uniqueness holds if and only if:
- All cascade series converge (i.e., finite total mass in each $\alpha$-lis class).
- The associated $\alpha$-lis matrix $Q$ is recurrent (irreducible, row-stochastic, or positive recurrent, depending on the particular tree).
- There is a unique normalized left fixed vector $v$ of $Q$ such that $vQ = v$ (Cénac et al., 2020, Cénac et al., 2018, Cénac et al., 2010).
In particular, for stable context trees with finitely many $\alpha$-lis, existence of a stationary measure reduces to convergence of the finitely many cascade series, and the law is unique (Cénac et al., 2020).
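When there are finitely many $\alpha$-lis classes, the fixed-vector condition is an ordinary finite linear-algebra problem. A minimal sketch (the $2 \times 2$ matrix below is invented, not derived from any particular context tree):

```python
# Sketch: with finitely many α-lis classes, the α-lis matrix Q is a
# finite row-stochastic matrix, and its normalized left fixed vector v
# (v Q = v, entries summing to 1) can be found by power iteration.
Q = [[0.2, 0.8],
     [0.5, 0.5]]

v = [0.5, 0.5]                       # any normalized starting vector
for _ in range(500):
    v = [sum(v[i] * Q[i][j] for i in range(len(Q)))
         for j in range(len(Q))]     # left multiplication v <- v Q
    s = sum(v)
    v = [x / s for x in v]           # renormalize

# v now satisfies v Q ≈ v; for this Q, v = (5/13, 8/13)
```

Power iteration converges here because the second eigenvalue of $Q$ has modulus strictly below one; in degenerate cases one would solve $v(Q - I) = 0$ directly instead.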
For general (possibly non-stable) VLMCs, weak Bernoulli (absolute regularity) and uniqueness are strictly linked to the finiteness of typical context lengths and can be established under weaker, context-based regularity conditions without requiring global continuity or uniform positivity (Ferreira et al., 2019).
3. Model Extensions: Exogenous Covariates and Multidimensional Structure
Recent advances incorporate exogenous covariates into the VLMC framework, yielding context-dependent generalized linear models (VLMCX or β-context models) (Zambom et al., 2019, Rocha et al., 2024). For the process $(X_n)$ and an exogenous covariate vector $Z_n$, the transition probabilities are modeled, for a context $w$, via multinomial or logistic regression: $\mathbb{P}(X_n = a \mid \text{context } w,\, Z_n) \propto \exp\!\bigl(\beta_{w,a}^{\top} Z_n\bigr)$. The parameters $\beta_{w,a}$ can thus encode intricate interactions between the relevant past and recent exogenous observations. The estimation is performed via a context-pruning and coefficient-pruning algorithm driven by deviance-based likelihood ratio tests, with tuning parameters governing significance thresholds and tree depth (Zambom et al., 2019, Rocha et al., 2024). The result is a parsimonious, consistent estimator that adapts both the memory structure and the exogenous influence as the sample size increases.
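A schematic implementation of such a context-dependent multinomial kernel follows. The coefficient values, the context set, and the two-dimensional covariate are all invented for illustration; this is a sketch of the model class, not the estimator of the cited papers.

```python
import math

ALPHABET = ["a", "b", "c"]

# One coefficient vector per (context, symbol) pair; symbol "b" is the
# reference category (zero vector) for identifiability. All numbers
# are illustrative.
beta = {
    "a":  {"a": [0.2, -0.1], "b": [0.0, 0.0], "c": [-0.3, 0.4]},
    "cb": {"a": [0.1, 0.5],  "b": [0.0, 0.0], "c": [0.2, -0.2]},
}

def transition_probs(context, z):
    """P(X_n = . | context w, covariates z): softmax of beta_{w,a} . z."""
    scores = {a: sum(b * x for b, x in zip(beta[context][a], z))
              for a in ALPHABET}
    m = max(scores.values())                 # stabilize the exponentials
    ex = {a: math.exp(s - m) for a, s in scores.items()}
    total = sum(ex.values())
    return {a: e / total for a, e in ex.items()}

p = transition_probs("cb", [1.0, 2.0])
```

With all-zero covariates (or all-equal coefficients) the kernel falls back to a uniform distribution, which makes the role of $Z_n$ easy to check in unit tests.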
4. Learning, Estimation, and Model Selection
Canonical estimation of the context tree proceeds by building a maximal candidate tree up to a chosen depth and then pruning using data-driven statistical criteria, such as penalized likelihoods and likelihood ratio tests. The original “Context” algorithm (Rissanen, 1983) is foundational, but contemporary approaches employ cross-validation, bootstrap-BIC, or Bayesian model selection in more complex or ill-posed settings, including in the presence of noise or exogenous influence (Duarte et al., 2017, Rocha et al., 2024, Cénac et al., 2018). Estimation procedures are consistent under mild regularity (non-nullness, sufficient sample per context) and can integrate prior knowledge or constraints, for example, through Bayesian structural priors or Dirichlet-multinomial frameworks (Freguglia et al., 2021).
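The prune-or-split decision at a single node can be sketched as follows. The counts and the chi-square-style threshold are illustrative, and a real implementation would also correct for multiple testing and tree depth; this is a sketch of the deviance test, not a full tree estimator.

```python
import math

def log_lik(counts):
    """Maximized multinomial log-likelihood of a next-symbol count vector."""
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)

def keep_children(child_counts, threshold):
    """Deviance LRT: keep the refined contexts iff
    2 * (sum of child log-liks - pooled log-lik) exceeds the threshold."""
    pooled = {}
    for counts in child_counts.values():
        for sym, c in counts.items():
            pooled[sym] = pooled.get(sym, 0) + c
    deviance = 2.0 * (sum(log_lik(c) for c in child_counts.values())
                      - log_lik(pooled))
    return deviance > threshold

# Next-symbol counts after the candidate children "01" and "11" of "1":
children = {"01": {"0": 40, "1": 10}, "11": {"0": 12, "1": 38}}
refine = keep_children(children, threshold=3.84)  # True: distributions differ
```

Pruning proceeds bottom-up: whenever `keep_children` returns `False`, the children collapse back into their parent context, which is how the estimated tree stays parsimonious.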
5. Structural and Dynamical Properties
VLMCs generalize finite-order Markov chains and g-measures, corresponding to processes where the transition kernel is piecewise constant with respect to a (possibly countable) context tree partition (Enter et al., 2020, Collet et al., 2012, Cénac et al., 2010). The equivalence between stationary VLMCs and certain piecewise expanding interval maps allows for transplantation of results from dynamical systems theory—including spectral properties, ergodicity, entropy, and limit theorems—into the analysis of VLMCs (Collet et al., 2012, Cénac et al., 2010). Conversely, VLMCs form a proper subclass of classical chains of infinite order: not all g-measures are VLMCs, only those admitting a countable context-tree representation.
A particularly important subclass is the “stable” VLMC, in which the context process itself is Markovian. In this scenario, renewal structures and semi-Markov embeddings can be exploited, and the initial letter process of the VLMC is itself semi-Markov (Cénac et al., 2019, Cénac et al., 2012, Cénac et al., 2020). This structure underlies connections to persistent random walks and enables precise renewal-theoretic analysis.
6. Limit Theorems and Asymptotic Results
For stable and certain unbounded-memory VLMCs, renewal-theoretic arguments facilitate derivation of precise limit theorems. Under mild moment conditions—such as Cramér's condition on regeneration cycles—strong laws of large numbers, central limit theorems, local limit theorems, and moderate deviations are available for statistics associated with the chain (Logachev et al., 2019). The renewal property, derived from the chain returning to specific contexts (“root” states), is central: it enables reduction to well-understood compound renewal processes or Markov renewal chains, with explicit identification of mean, variance, and rate functions.
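Schematically, writing $\tau_1 < \tau_2 < \cdots$ for the successive return times to a fixed renewal context and assuming square-integrable regeneration cycles, the renewal decomposition yields a central limit theorem of the usual form (a generic statement of the shape of these results, with notation chosen here for illustration, not a verbatim theorem from the cited papers):

\[
  \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \bigl( f(X_k) - \mu \bigr)
  \;\xrightarrow[n \to \infty]{d}\; \mathcal{N}(0, \sigma^2),
  \qquad
  \mu = \mathbb{E}\!\left[ \sum_{k=\tau_1+1}^{\tau_2} f(X_k) \right] \Big/ \; \mathbb{E}[\tau_2 - \tau_1],
\]

with $\sigma^2$ expressed through the variance of the reward accumulated over a single regeneration block, exactly as for compound renewal processes.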
7. Applications, Extensions, and Open Questions
VLMCs are deployed in universal data compression (context-tree weighting), biostatistics (biological sequence analysis), linguistics (prosody and rhythm analysis), epidemiology (dynamic models for disease outbreaks with spatial or exogenous covariates), and persistent random walks (Duarte et al., 2017, Rocha et al., 2024, Cénac et al., 2010). Key issues in practical settings include computational scaling for large alphabets or deep trees, handling of rare contexts, and principled choice of hyperparameters (e.g., tree depth, smoothing). The rigorous analysis of non-regular cases—such as g-functions with essential discontinuities—remains an active area, with recent advances enabling VLMCs to capture phenomena beyond the reach of chains with regular infinite-order kernels (Ferreira et al., 2019).
The VLMC framework continues to be extended toward random fields with variable neighborhoods, multidimensional settings, renewal structure detection, and data fusion across multiple sources (Enter et al., 2020, Rocha et al., 2024, Freguglia et al., 2021). Connections to dynamical systems and ergodic theory offer a promising avenue for deeper understanding of mixing, intermittency, and complex dependence structures in non-Markovian stochastic models.