Observable Operator Models (OOMs)

Updated 9 April 2026

Observable Operator Models (OOMs) are a linear algebraic framework for representing stochastic processes using operator-valued states and transition matrices.
OOMs generalize classical models like HMMs by accommodating real or complex state spaces without strict non-negativity constraints while ensuring valid probability assignments.
Efficient spectral and optimization-based estimation methods enable practical learning of OOM parameters even in scenarios with missing data and non-equilibrium observations.

Observable operator models (OOMs) constitute a linear algebraic, operator-valued representation of stochastic processes and dynamical systems. They generalize hidden Markov models (HMMs) by permitting more expressive state processes—including real or complex vectors and operators that are not required to be element-wise non-negative—while preserving the essential requirement that every observation sequence is assigned a valid (i.e., non-negative and normalized) probability. OOMs subsume HMMs, nth-order Markov models, and other classical dynamical system models, providing a unifying analytic and algorithmic paradigm for both discrete and continuous-valued processes, as well as for certain quantum generalizations.

1. Mathematical Formulation and Core Properties

A $d$-dimensional OOM over a finite observation alphabet $\Sigma$ is typically specified by the tuple $(\sigma, \{\tau_x\}_{x \in \Sigma}, \omega_\epsilon)$, where:

$\sigma \in (\mathbb{R}^d)^\top$ (real case) is the evaluation row vector,
$\omega_\epsilon \in \mathbb{R}^d$ is the initial state vector,
$\tau_x \in \mathbb{R}^{d \times d}$ for each $x \in \Sigma$ is the observable operator for symbol $x$.

The normalization constraints are:

$\sigma\,\omega_\epsilon = 1$,
$\sigma \sum_{x \in \Sigma} \tau_x = \sigma$, i.e., $\sigma$ is a left eigenvector of $\sum_{x \in \Sigma} \tau_x$ with eigenvalue 1.

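For any finite sequence $\bar{x} = x_1 x_2 \cdots x_n \in \Sigma^*$, the assigned probability is

$P(\bar{x}) = \sigma\, \tau_{x_n} \cdots \tau_{x_2}\, \tau_{x_1}\, \omega_\epsilon$,

with $P(\bar{x}) \ge 0$ and $\sum_{x \in \Sigma} P(\bar{x}x) = P(\bar{x})$ for all $\bar{x} \in \Sigma^*$.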
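The formula $P(x_1 \cdots x_n) = \sigma\,\tau_{x_n} \cdots \tau_{x_1}\,\omega_\epsilon$ is evaluated by applying $\tau_{x_1}$ first and the evaluation vector last. The following is a minimal Python sketch under stated assumptions: the two-symbol alphabet and the operator values are illustrative (chosen non-negative so that the OOM is valid), and the helper names `matvec` and `seq_prob` are our own. Exact `Fraction` arithmetic lets it check normalization and marginalization on all words up to length 3 without rounding error.

```python
from fractions import Fraction as F
from itertools import product

def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def seq_prob(sigma, taus, omega, word):
    """P(x_1 ... x_n) = sigma tau_{x_n} ... tau_{x_1} omega (tau_{x_1} is applied first)."""
    state = list(omega)
    for x in word:
        state = matvec(taus[x], state)
    return sum(s * w for s, w in zip(sigma, state))

# Illustrative 2-dimensional OOM over the alphabet {"a", "b"}.
sigma = [F(1), F(1)]
omega = [F(1, 2), F(1, 2)]
taus = {"a": [[F(27, 40), F(1, 10)], [F(9, 40), F(1, 10)]],
        "b": [[F(3, 40), F(2, 5)], [F(1, 40), F(2, 5)]]}

# Normalization: sigma omega = 1 and sigma (tau_a + tau_b) = sigma.
tau_sum = [[taus["a"][i][j] + taus["b"][i][j] for j in range(2)] for i in range(2)]
assert sum(s * w for s, w in zip(sigma, omega)) == 1
assert [sum(sigma[i] * tau_sum[i][j] for i in range(2)) for j in range(2)] == sigma

# Marginalization: sum_x P(w x) = P(w) for every word w of length <= 3.
for n in range(4):
    for w in map("".join, product("ab", repeat=n)):
        total = sum(seq_prob(sigma, taus, omega, w + x) for x in "ab")
        assert total == seq_prob(sigma, taus, omega, w)

print(seq_prob(sigma, taus, omega, "ab"))  # 27/160
```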

OOMs can be equivalently formulated over complex spaces by relaxing all real constraints:

$\sigma \in (\mathbb{C}^d)^\top$ and $\omega_\epsilon \in \mathbb{C}^d$, with $\sigma\,\omega_\epsilon = 1$,
parameter set $\{\tau_x\}_{x \in \Sigma}$ of complex $d \times d$ operators,
for all prefixes, marginalization: $\sum_{x \in \Sigma} P(\bar{x}x) = P(\bar{x})$,
non-negative probabilities: $P(\bar{x}) = \sigma\, \tau_{x_n} \cdots \tau_{x_1}\, \omega_\epsilon \in \mathbb{R}_{\ge 0}$ for all $\bar{x} \in \Sigma^*$ (Adhikary et al., 2019, Liu, 2018).
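One simple way to obtain a complex-valued OOM is a change of basis: for any invertible complex matrix $S$, the triple $(\sigma S^{-1}, \{S \tau_x S^{-1}\}_{x \in \Sigma}, S \omega_\epsilon)$ assigns exactly the same sequence probabilities, because every inner factor $S^{-1} S$ cancels. The sketch below assumes an illustrative real OOM and a hypothetical $S$; it yields operators with genuinely complex entries whose sequence probabilities are still real and non-negative, up to floating-point error.

```python
from itertools import product

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    """Inverse of a 2 x 2 matrix (assumes a non-zero determinant)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def seq_prob(sigma, taus, omega, word):
    """sigma tau_{x_n} ... tau_{x_1} omega, with the state kept as a column matrix."""
    state = [[w] for w in omega]
    for x in word:
        state = matmul(taus[x], state)
    return matmul([sigma], state)[0][0]

# Real OOM with non-negative entries (illustrative values).
sigma = [1.0, 1.0]
omega = [0.5, 0.5]
taus = {"a": [[0.675, 0.1], [0.225, 0.1]],
        "b": [[0.075, 0.4], [0.025, 0.4]]}

# Hypothetical complex change of basis: sigma' = sigma S^-1, tau' = S tau S^-1, omega' = S omega.
S = [[1, 1j], [1j, 2]]
S_inv = inv2(S)
sigma_c = matmul([sigma], S_inv)[0]
omega_c = [row[0] for row in matmul(S, [[w] for w in omega])]
taus_c = {x: matmul(matmul(S, t), S_inv) for x, t in taus.items()}

for n in range(4):
    for w in map("".join, product("ab", repeat=n)):
        p_real = seq_prob(sigma, taus, omega, w)
        p_cplx = seq_prob(sigma_c, taus_c, omega_c, w)
        assert abs(p_cplx - p_real) < 1e-12  # same real, non-negative probability
```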

2. Expressiveness and Structural Generality

OOMs strictly generalize HMMs and nth-order Markov models:

Any HMM with $m$ hidden states can be cast as an OOM of dimension at most $m$.
The system-dynamics matrix $D$, whose rows are indexed by histories $h_i$, whose columns are indexed by tests $t_j$, and whose entries are $D_{ij} = P(t_j \mid h_i)$, has finite rank $d$ if and only if the process can be represented exactly by a $d$-dimensional OOM (and by a Predictive State Representation (PSR) with $d$ core tests); such a representation is obtained by choosing $d$ linearly independent “core tests” or union-tests (Singh et al., 2012).
OOMs precisely characterize all stochastic processes whose system-dynamics matrix has finite rank, unlike HMMs or POMDPs, which induce only particular low-rank structures.
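A minimal sketch of the HMM-to-OOM construction, assuming the conventions stated in the docstring (column-stochastic transition matrix, symbol emitted before the transition): each operator is $\tau_x = A\,\mathrm{diag}(B_{x,\cdot})$, $\sigma$ is the all-ones row vector, and $\omega_\epsilon$ is the initial state distribution. The resulting OOM has dimension $m$, the number of hidden states; a minimal equivalent OOM may be smaller. The check compares it against the standard forward algorithm with exact arithmetic; the model values and helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def hmm_to_oom(A, B, pi):
    """Cast an HMM as an OOM with tau_x = A diag(B[x]), sigma = all-ones, omega = pi.

    Assumed conventions: A[i][j] = P(next state i | current state j),
    B[x][j] = P(observe x | state j), and the symbol is emitted before the transition.
    """
    m = len(pi)
    taus = {x: [[A[i][j] * bx[j] for j in range(m)] for i in range(m)]
            for x, bx in B.items()}
    return [F(1)] * m, taus, list(pi)

def seq_prob(sigma, taus, omega, word):
    state = list(omega)
    for x in word:                      # tau_{x_1} is applied first
        state = [sum(t * s for t, s in zip(row, state)) for row in taus[x]]
    return sum(a * b for a, b in zip(sigma, state))

def forward_prob(A, B, pi, word):
    """Standard HMM forward algorithm for a non-empty word."""
    m = len(pi)
    alpha = [pi[j] * B[word[0]][j] for j in range(m)]
    for x in word[1:]:
        alpha = [B[x][i] * sum(A[i][j] * alpha[j] for j in range(m)) for i in range(m)]
    return sum(alpha)

A = [[F(3, 4), F(1, 2)], [F(1, 4), F(1, 2)]]            # column-stochastic
B = {"a": [F(9, 10), F(1, 5)], "b": [F(1, 10), F(4, 5)]}
pi = [F(1, 2), F(1, 2)]

sigma, taus, omega = hmm_to_oom(A, B, pi)
for n in range(1, 5):
    for w in map("".join, product("ab", repeat=n)):
        assert seq_prob(sigma, taus, omega, w) == forward_prob(A, B, pi, w)
```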

In the controlled (input/output) setting, Interpretable IO-OOMs are a strict subclass of PSRs, as the constraint that all union-tests share the same fixed action block limits the class of representable systems. PSRs remove this restriction and therefore strictly generalize IO-OOMs (Singh et al., 2012).

3. The Negative Probability Problem and Quantum Extensions

A central challenge in OOM theory is the negative probability problem (NPP): since OOM state vectors and operators can have negative entries, it is nontrivial to guarantee that all sequence probabilities are non-negative. Indeed, it has been shown that verifying whether a given OOM ever assigns a negative probability is undecidable (Adhikary et al., 2019).
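Because validity is undecidable in general, a computer search can only refute it: finding a word with negative probability proves an OOM invalid, while finding none up to some length proves nothing. The sketch below makes this concrete with a hypothetical two-dimensional OOM that satisfies $\sigma\,\omega_\epsilon = 1$ and $\sigma \sum_x \tau_x = \sigma$, assigns non-negative values to every word of length at most 4, and first goes negative at length 5. The operator values and helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def seq_prob(sigma, taus, omega, word):
    state = list(omega)
    for x in word:                      # tau_{x_1} is applied first
        state = [sum(t * s for t, s in zip(row, state)) for row in taus[x]]
    return sum(a * b for a, b in zip(sigma, state))

def find_negative(sigma, taus, omega, max_len):
    """Return the first word (by length, then lexicographically) with a negative
    value and that value, or None if none exists up to max_len.
    This can refute validity but never certify it."""
    for n in range(max_len + 1):
        for w in map("".join, product(sorted(taus), repeat=n)):
            p = seq_prob(sigma, taus, omega, w)
            if p < 0:
                return w, p
    return None

# Hypothetical toy OOM: normalized (sigma omega = 1, sigma (tau_a + tau_b) = sigma),
# but omega has a negative entry.
sigma = [F(1), F(1)]
omega = [F(-1, 20), F(21, 20)]
taus = {"a": [[F(1, 2), F(0)], [F(0), F(1, 4)]],
        "b": [[F(1, 2), F(0)], [F(0), F(3, 4)]]}

print(find_negative(sigma, taus, omega, 4))  # None
print(find_negative(sigma, taus, omega, 5))  # ('aaaaa', Fraction(-11, 20480))
```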

A non-constructive cone characterization provides necessary and sufficient conditions for validity: an OOM is valid if and only if there exists a pointed convex cone $\mathcal{K} \subseteq \mathbb{R}^d$ such that $\omega_\epsilon \in \mathcal{K}$, $\tau_x \mathcal{K} \subseteq \mathcal{K}$ for all $x \in \Sigma$, and $\sigma v \ge 0$ for all $v \in \mathcal{K}$ (Adhikary et al., 2019). However, this characterization is not constructive, as $\mathcal{K}$ must be guessed.
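Once a candidate cone is supplied, its conditions are easy to check. A minimal sketch, assuming for simplicity that the cone is simplicial, $\mathcal{K} = \{S u : u \ge 0\}$ for an invertible matrix $S$ (general cones need more work): then $\omega_\epsilon \in \mathcal{K}$ iff $S^{-1}\omega_\epsilon \ge 0$, $\tau_x \mathcal{K} \subseteq \mathcal{K}$ iff $S^{-1}\tau_x S \ge 0$ entrywise, and $\sigma \ge 0$ on $\mathcal{K}$ iff $\sigma S \ge 0$. The example OOM is a non-negative model disguised by a hypothetical change of basis $T$ with negative entries, so the non-negative orthant fails while the cone spanned by the columns of $T$ succeeds. Supplying the right $S$ is precisely the guessing step; the helper names are our own.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(M):
    """Gauss-Jordan inverse over the rationals (assumes M is invertible)."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def is_nonneg(M):
    return all(v >= 0 for row in M for v in row)

def simplicial_cone_certificate(sigma, taus, omega, S):
    """Check the cone conditions for K = {S u : u >= 0}.

    True certifies validity; False only means this particular cone fails.
    """
    S_inv = inverse(S)
    return (is_nonneg(matmul(S_inv, [[w] for w in omega]))                       # omega in K
            and all(is_nonneg(matmul(matmul(S_inv, t), S)) for t in taus.values())  # tau_x K in K
            and is_nonneg(matmul([sigma], S)))                                     # sigma >= 0 on K

base_taus = {"a": [[F(27, 40), F(1, 10)], [F(9, 40), F(1, 10)]],
             "b": [[F(3, 40), F(2, 5)], [F(1, 40), F(2, 5)]]}
T = [[F(2), F(-1)], [F(-1), F(1)]]                                   # hypothetical basis change
T_inv = inverse(T)
sigma = matmul([[F(1), F(1)]], T_inv)[0]                            # sigma' = sigma T^-1
omega = [row[0] for row in matmul(T, [[F(1, 2)], [F(1, 2)]])]       # omega' = T omega
taus = {x: matmul(matmul(T, t), T_inv) for x, t in base_taus.items()}  # tau' = T tau T^-1

identity = [[F(1), F(0)], [F(0), F(1)]]
print(simplicial_cone_certificate(sigma, taus, omega, identity))  # False: orthant fails
print(simplicial_cone_certificate(sigma, taus, omega, T))         # True: cone spanned by T
```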

Hidden Quantum Markov Models (HQMMs) form a key subclass of OOMs in which the negative probability problem is circumvented by design. Here, each observable operator is a completely positive (CP) map acting on positive-semidefinite (PSD), unit-trace density matrices, and the sum of these maps over all observations is trace-preserving (TP). In Liouville (vectorized) form, such HQMMs are OOMs for which the non-negativity of sequence probabilities is guaranteed (Adhikary et al., 2019). The inclusions are HMMs $\subseteq$ NOOMs $\subseteq$ HQMMs $\subseteq$ OOMs, where NOOMs are norm-observable operator models, but it is still unresolved whether HQMMs achieve the full expressiveness of general OOMs.
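A small numerical illustration of why HQMMs cannot produce negative probabilities, assuming a hypothetical one-qubit model with one Kraus operator $K_x$ per observation: $P(x_1 \cdots x_n) = \operatorname{tr}(K_{x_n} \cdots K_{x_1}\,\rho\,K_{x_1}^\dagger \cdots K_{x_n}^\dagger)$ is the trace of a PSD matrix and hence non-negative, while $\sum_x K_x^\dagger K_x = I$ makes the probabilities of all words of a given length sum to 1. The sketch stays in density-matrix form; vectorizing $\rho$ column by column would turn each update into the $d^2 \times d^2$ matrix $\overline{K_x} \otimes K_x$, i.e., the Liouville-form observable operator, with $\sigma = \mathrm{vec}(I)^\top$. The rotation angle and helper names are illustrative.

```python
import math
from itertools import product

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def dagger(M):
    """Conjugate transpose."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def hqmm_prob(kraus, rho, word):
    """P(x_1 ... x_n) = tr(K_{x_n} ... K_{x_1} rho K_{x_1}^dag ... K_{x_n}^dag)."""
    for x in word:
        rho = matmul(matmul(kraus[x], rho), dagger(kraus[x]))  # unnormalized CP update
    return sum(rho[i][i] for i in range(len(rho))).real

# Hypothetical qubit model: rotate by the unitary U, then measure in the computational basis.
theta = 0.3
c, s = math.cos(theta), math.sin(theta)
U = [[c, -1j * s], [-1j * s, c]]
kraus = {"0": [U[0], [0, 0]],     # K_0 = |0><0| U
         "1": [[0, 0], U[1]]}     # K_1 = |1><1| U
rho0 = [[1, 0], [0, 0]]           # initial density matrix |0><0|

print(hqmm_prob(kraus, rho0, "0"), hqmm_prob(kraus, rho0, "1"))  # cos^2, sin^2 of theta
```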

4. Learning and Estimation Algorithms

OOM parameters can be efficiently estimated using spectral algorithms. The standard approach constructs empirical Hankel matrices by counting frequencies of strings in observed data, performs low-rank SVD for dimension reduction, and computes the OOM parameters using projections onto leading singular spaces (Liu, 2018, Wu et al., 2016). For missing-value scenarios, wildcards are introduced within the counting process, yielding consistent estimates when missingness is “Always Missing Sequentially At Random” (AMSAR) (Liu, 2018).
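A minimal Python sketch of Hankel-based OOM estimation under simplifying assumptions. Instead of an SVD, it uses a basis-selection variant: the prefix set $U$ and suffix set $V$ are chosen so that the square Hankel block $H_{v,u} = P(uv)$ is invertible, and the estimates are $\omega = (P(v))_{v \in V}$, $\sigma = (P(u))_{u \in U}\,H^{-1}$ and $\tau_x = H_x H^{-1}$ with $(H_x)_{v,u} = P(uxv)$. With exact probabilities this recovers the process up to a change of basis; with noisy empirical frequencies, the SVD-based version is the more robust choice. Exact probabilities of a known model stand in for infinite-data frequencies, all helper names are hypothetical, and `empirical_prob` shows the simple prefix-counting estimate that would replace the oracle in practice.

```python
from fractions import Fraction as F
from itertools import product

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(M):
    """Gauss-Jordan inverse over the rationals (assumes M is invertible)."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def seq_prob(sigma, taus, omega, word):
    state = list(omega)
    for x in word:                      # tau_{x_1} is applied first
        state = [sum(t * s for t, s in zip(row, state)) for row in taus[x]]
    return sum(a * b for a, b in zip(sigma, state))

def learn_oom(prob, alphabet, prefixes, suffixes):
    """Hankel estimate: rows of H indexed by suffixes v, columns by prefixes u."""
    H_inv = inverse([[prob(u + v) for u in prefixes] for v in suffixes])
    sigma = matmul([[prob(u) for u in prefixes]], H_inv)[0]
    omega = [prob(v) for v in suffixes]
    taus = {x: matmul([[prob(u + x + v) for u in prefixes] for v in suffixes], H_inv)
            for x in alphabet}
    return sigma, taus, omega

def empirical_prob(word, sequences):
    """Frequency estimate of P(word): share of training sequences starting with it."""
    return F(sum(s.startswith(word) for s in sequences), len(sequences))

# Exact probabilities of a known 2-dimensional OOM stand in for infinite-data frequencies.
true_oom = ([F(1), F(1)],
            {"a": [[F(27, 40), F(1, 10)], [F(9, 40), F(1, 10)]],
             "b": [[F(3, 40), F(2, 5)], [F(1, 40), F(2, 5)]]},
            [F(1, 2), F(1, 2)])

def oracle(word):
    return seq_prob(*true_oom, word)

sigma, taus, omega = learn_oom(oracle, "ab", prefixes=["", "a"], suffixes=["", "a"])
# The learned OOM reproduces the oracle on every word up to length 4.
assert all(seq_prob(sigma, taus, omega, "".join(w)) == oracle("".join(w))
           for n in range(5) for w in product("ab", repeat=n))
print(sigma)  # [Fraction(1, 1), Fraction(0, 1)]
```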

Non-equilibrium data—i.e., datasets not sampled from the stationary distribution—can still yield consistent OOM estimation via an “equilibrium constraint,” solved as a quadratic program to enforce stationarity. This allows exact recovery of equilibrium dynamics even from transient trajectories (Wu et al., 2016).
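The quadratic-programming estimator of Wu et al. is not reproduced here. As a smaller illustration of the stationarity condition it enforces, the sketch below computes, for given operators, the state $\omega$ satisfying $\left(\sum_x \tau_x\right)\omega = \omega$ and $\sigma\,\omega = 1$; using it as the initial state makes the OOM stationary, i.e., $\sum_x P(x\bar{w}) = P(\bar{w})$ for every word $\bar{w}$. It assumes that eigenvalue 1 of $\sum_x \tau_x$ is simple; the operator values and helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def inverse(M):
    """Gauss-Jordan inverse over the rationals (assumes M is invertible)."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def seq_prob(sigma, taus, omega, word):
    state = list(omega)
    for x in word:                      # tau_{x_1} is applied first
        state = [sum(t * s for t, s in zip(row, state)) for row in taus[x]]
    return sum(a * b for a, b in zip(sigma, state))

def stationary_state(sigma, taus):
    """Solve (sum_x tau_x) omega = omega with sigma omega = 1.

    Since sigma (sum_x tau_x - I) = 0, any row r with sigma_r != 0 is redundant
    and is replaced by the normalization sigma omega = 1.
    """
    d = len(sigma)
    M = [[sum(t[i][j] for t in taus.values()) - (1 if i == j else 0) for j in range(d)]
         for i in range(d)]
    r = max(range(d), key=lambda i: abs(sigma[i]))
    M[r] = list(sigma)
    M_inv = inverse(M)
    return [M_inv[i][r] for i in range(d)]           # omega = M^-1 e_r

sigma = [F(1), F(1)]
omega = [F(1, 2), F(1, 2)]                           # a non-stationary starting state
taus = {"a": [[F(27, 40), F(1, 10)], [F(9, 40), F(1, 10)]],
        "b": [[F(3, 40), F(2, 5)], [F(1, 40), F(2, 5)]]}

omega_eq = stationary_state(sigma, taus)
# Starting from omega_eq, the process is shift-invariant: sum_x P(x w) = P(w).
for n in range(4):
    for w in map("".join, product("ab", repeat=n)):
        shifted = sum(seq_prob(sigma, taus, omega_eq, x + w) for x in "ab")
        assert shifted == seq_prob(sigma, taus, omega_eq, w)
print(omega_eq)  # [Fraction(2, 3), Fraction(1, 3)]
```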

For continuous observation spaces, the binless spectral OOM algorithm sidesteps discretization entirely: operator-valued moments are constructed directly from observed data via rank-one updates, yielding consistent estimates of continuous equilibrium dynamics at a computational cost that is linear in the sample size (Wu et al., 2016).
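The binless estimator of Wu et al. is not reproduced here. The sketch below only illustrates the computational pattern the text describes: time-lagged moment matrices are accumulated directly from continuous-valued samples, one rank-one (outer-product) update per time step, so the cost grows linearly with the number of samples. The Gaussian feature map, its centers and width, and the trajectory are hypothetical choices, not part of the cited method.

```python
import math

def gaussian_features(y, centers=(-1.0, 0.0, 1.0), width=0.5):
    """Hypothetical feature map phi(y): Gaussian bumps at fixed centers."""
    return [math.exp(-0.5 * ((y - c) / width) ** 2) for c in centers]

def lagged_moments(ys, phi=gaussian_features):
    """Single pass: C0 = mean phi(y_t) phi(y_t)^T and C1 = mean phi(y_t) phi(y_{t+1})^T.

    Each time step adds one rank-one outer product, so the cost is O(N m^2)
    for N samples and m features, i.e. linear in N.
    """
    prev = phi(ys[0])
    m = len(prev)
    C0 = [[0.0] * m for _ in range(m)]
    C1 = [[0.0] * m for _ in range(m)]
    for y in ys[1:]:
        cur = phi(y)
        for i in range(m):
            for j in range(m):
                C0[i][j] += prev[i] * prev[j]   # rank-one update of the instantaneous moment
                C1[i][j] += prev[i] * cur[j]    # rank-one update of the lagged moment
        prev = cur
    n = len(ys) - 1
    return [[v / n for v in row] for row in C0], [[v / n for v in row] for row in C1]

ys = [0.1, -0.4, 0.9, 0.3, -1.2, 0.0]   # a short, made-up continuous trajectory
C0, C1 = lagged_moments(ys)
```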

HQMM parameter learning involves optimization on the Stiefel manifold to maintain the trace-preserving constraint on Kraus operators. Constrained Optimization on the Stiefel Manifold (COSM) employs retraction-based gradient steps that guarantee feasibility at every update. Empirically, this approach converges to higher log-likelihoods, and 10–100$\times$ faster, than earlier methods, and it scales to models previously inaccessible to rotation-based algorithms (Adhikary et al., 2019).
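The feasibility argument can be made concrete: stacking the Kraus operators $K_1, \dots, K_k$ (each $d \times d$) into one $kd \times d$ matrix $X$ turns the trace-preservation condition $\sum_i K_i^\dagger K_i = I$ into $X^\dagger X = I$, so $X$ lies on a complex Stiefel manifold. The sketch below projects a Euclidean gradient onto the tangent space and retracts with a QR (Gram-Schmidt) step; this retraction is one standard choice and is not necessarily the one used in COSM. The random gradients are hypothetical stand-ins for the gradient of a negative log-likelihood, and the helper names are our own.

```python
import math
import random

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def dagger(M):
    """Conjugate transpose."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]

def qr_retract(Y):
    """Orthonormalize the columns of a full-rank n x p matrix by Gram-Schmidt
    (the Q factor of a QR decomposition with positive diagonal R)."""
    n, p = len(Y), len(Y[0])
    q = []
    for j in range(p):
        v = [Y[i][j] for i in range(n)]
        for u in q:                      # remove components along earlier columns
            proj = sum(a.conjugate() * b for a, b in zip(u, v))
            v = [b - proj * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(abs(b) ** 2 for b in v))
        q.append([b / norm for b in v])
    return [[q[j][i] for j in range(p)] for i in range(n)]

def stiefel_step(X, G, lr):
    """One descent step on {X : X^dag X = I}: project the Euclidean gradient G onto
    the tangent space via G - X sym(X^dag G), step, then retract."""
    p = len(X[0])
    A = matmul(dagger(X), G)
    sym = [[(A[i][j] + A[j][i].conjugate()) / 2 for j in range(p)] for i in range(p)]
    XS = matmul(X, sym)
    xi = [[g - s for g, s in zip(g_row, s_row)] for g_row, s_row in zip(G, XS)]
    Y = [[x - lr * z for x, z in zip(x_row, z_row)] for x_row, z_row in zip(X, xi)]
    return qr_retract(Y)

rng = random.Random(0)

def rand_complex():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

# Two hypothetical 2 x 2 Kraus operators stacked into a 4 x 2 matrix X = [K_0; K_1].
X = qr_retract([[rand_complex() for _ in range(2)] for _ in range(4)])
for _ in range(5):
    G = [[rand_complex() for _ in range(2)] for _ in range(4)]  # stand-in gradient
    X = stiefel_step(X, G, lr=0.1)

K0, K1 = X[:2], X[2:]
tp = [[a + b for a, b in zip(r0, r1)]
      for r0, r1 in zip(matmul(dagger(K0), K0), matmul(dagger(K1), K1))]
print(tp)  # approximately the 2 x 2 identity
```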

5. Process Dimension and Complexity

Given a stationary stochastic process, the process dimension is defined as the minimal dimension $d$ such that there exists an OOM of dimension $d$ generating the process. Equivalently, it is the rank of the Hankel matrix $H = [P(\bar{u}\bar{v})]_{\bar{u},\bar{v} \in \Sigma^*}$, or the dimension of the linear span of all state vectors reachable after finite observation sequences in a minimal OOM (Löhr et al., 2011). In the non-commutative (quantum) setting, process dimension is the size of the associated finite matrix algebra for finitely correlated states (MPS representations).
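A minimal sketch of computing the Hankel rank exactly, assuming illustrative two-state HMMs and rational arithmetic. A finite Hankel block only gives a lower bound on the process dimension, but here the bound is attained: the generic HMM has process dimension 2, whereas an HMM whose states share the same emission probabilities generates an i.i.d. process of dimension 1 despite having two hidden states. The helper names are our own.

```python
from fractions import Fraction as F
from itertools import product

def seq_prob(sigma, taus, omega, word):
    state = list(omega)
    for x in word:                      # tau_{x_1} is applied first
        state = [sum(t * s for t, s in zip(row, state)) for row in taus[x]]
    return sum(a * b for a, b in zip(sigma, state))

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [list(row) for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def hankel_rank(prob, alphabet, max_len):
    """Rank of the finite Hankel block H[u][v] = P(uv) over words of length <= max_len."""
    words = ["".join(w) for n in range(max_len + 1) for w in product(alphabet, repeat=n)]
    return rank([[prob(u + v) for v in words] for u in words])

def hmm_oom(A, B, pi):
    """OOM of an HMM: tau_x = A diag(B[x]), sigma = ones, omega = pi (A column-stochastic)."""
    m = len(pi)
    taus = {x: [[A[i][j] * b[j] for j in range(m)] for i in range(m)] for x, b in B.items()}
    return [F(1)] * m, taus, list(pi)

A = [[F(3, 4), F(1, 2)], [F(1, 4), F(1, 2)]]
pi = [F(1, 2), F(1, 2)]
B_generic = {"a": [F(9, 10), F(1, 5)], "b": [F(1, 10), F(4, 5)]}
B_same = {"a": [F(1, 3), F(1, 3)], "b": [F(2, 3), F(2, 3)]}   # identical emissions

for name, B in (("generic", B_generic), ("same emissions", B_same)):
    oom = hmm_oom(A, B, pi)
    print(name, hankel_rank(lambda w: seq_prob(*oom, w), "ab", 2))
# generic 2
# same emissions 1
```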

Key properties:

Lower semi-continuity: if processes $P_n$ converge weak-$*$ to $P$, then $\dim P \le \liminf_{n \to \infty} \dim P_n$, so the process dimension cannot increase in the limit.
Ergodic decomposition: the process dimension of a convex mixture is the sum of process dimensions of the ergodic components.
Relation to causal states: the logarithm of the process dimension is bounded above by the topological statistical complexity (the log-cardinality of the causal state partition) (Löhr et al., 2011).
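The ergodic-decomposition property can be checked on a simple case: two i.i.d. (hence ergodic) coin processes each have process dimension 1, and their equal-weight mixture has dimension 2. The coin biases, the mixing weight, and the helper names are illustrative; the rank is computed exactly over the rationals on a finite Hankel block, which here already attains the full dimension.

```python
from fractions import Fraction as F
from itertools import product

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [list(row) for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def hankel_rank(prob, alphabet, max_len):
    """Rank of the finite Hankel block H[u][v] = P(uv) over words of length <= max_len."""
    words = ["".join(w) for n in range(max_len + 1) for w in product(alphabet, repeat=n)]
    return rank([[prob(u + v) for v in words] for u in words])

def bernoulli(p):
    """P(w) for an i.i.d. (hence ergodic) process over {"a", "b"} with P(a) = p."""
    def prob(w):
        return p ** w.count("a") * (1 - p) ** w.count("b")
    return prob

coin1, coin2 = bernoulli(F(1, 4)), bernoulli(F(3, 4))

def mixture(w):
    """Equal-weight convex mixture of the two coin processes."""
    return F(1, 2) * coin1(w) + F(1, 2) * coin2(w)

print(hankel_rank(coin1, "ab", 2), hankel_rank(coin2, "ab", 2), hankel_rank(mixture, "ab", 2))
# 1 1 2
```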

6. Infinite-Dimensional OOMs and Approximation Theory

Many real-world stochastic processes are infinite-dimensional, i.e., they have infinite process dimension, rendering the direct finite-rank OOM construction inapplicable. Measure-theoretic arguments allow the canonical future-distribution space $V = \operatorname{span}\{P(\cdot \mid \bar{x}) : \bar{x} \in \Sigma^*\}$ to be embedded in a normed space $\mathcal{X}$, where observable operators act as bounded linear maps.

However, when $V$ is infinite-dimensional, no norm (and in particular no inner-product norm) makes it complete, i.e., a Banach or Hilbert space: $V$ is spanned by countably many vectors and therefore has a countable algebraic basis, whereas by the Baire category theorem every infinite-dimensional Banach space has uncountable algebraic dimension (Anyszka, 2024). The remedy is to pass to the norm closure $\overline{V}$ of $V$ in $\mathcal{X}$, to which the operators extend uniquely and continuously. Approximation theory in this setting leverages compactness of the extended operators, or their approximability by finite-rank operators on $\overline{V}$, to construct finite-dimensional OOM surrogates with provable error control, provided suitable mixing or spectral-decay conditions hold (Anyszka, 2024).

The practical implementation requires:

Identification of compactness or approximability for the extended observable operators,
Error estimates for finite-rank truncations,
Algorithmic realization through finite prefix sets and empirical averages (e.g., the efficiency sharpening (ES) algorithm).

7. Applications, Empirical Performance, and Limitations

OOMs have demonstrated empirical advantages over HMMs and EM-based models:

In settings with non-ignorable or non-i.i.d. missingness, refined spectral OOMs outperform marginalization-based HMMs in prediction tasks, as shown on ring-HMM data and on real-world time series with severe missingness patterns (Liu, 2018).
In nonequilibrium or transient settings, equilibrium OOMs adjust for bias inherent in short, transient trajectories, achieving more accurate long-term forecasts (Wu et al., 2016).
Binless OOMs for continuous data avoid both discretization bias and high computational complexity, enabling efficient and statistically consistent modeling of molecular dynamics and multi-dimensional physical processes (Wu et al., 2016).
COSM-trained HQMMs consistently outperform HMMs of comparable size and previous HQMM learners in tasks such as DNA sequence modeling, with dramatically better scalability (Adhikary et al., 2019).

Nevertheless, theoretical challenges remain:

The negative probability problem is undecidable for general OOMs; only subclasses with constructive parameter sets, such as HQMMs, guarantee validity.
Spectral estimation in the presence of non-AMSAR missingness or for non-time-ordered data requires further methodological innovation (Liu, 2018).
For infinite-dimensional systems, the identification of effective compactness criteria for observable operators, and the quantification of convergence rates for finite-rank approximations, are open technical problems (Anyszka, 2024).

In summary, OOMs provide a mathematically principled and algorithmically practical extension of HMMs and related models, with broad applicability across discrete, continuous, and quantum stochastic processes, and serve as a foundational tool in the modern theory and practice of predictive state modeling (Adhikary et al., 2019, Liu, 2018, Singh et al., 2012, Anyszka, 2024, Wu et al., 2016, Löhr et al., 2011).