
Predictive State Representations (PSRs)

Updated 5 February 2026
  • Predictive State Representations (PSRs) are models that define a system’s state as a vector of conditional probabilities of future observable tests, eliminating latent variables.
  • PSRs generalize traditional models by enabling applications in filtering, planning, and control through techniques like kernelized methods and linear-Gaussian formulations.
  • Learning PSRs relies on spectral estimation techniques, such as Hankel matrix factorization and two-stage regression, to update observable operator matrices consistently.

Predictive State Representations (PSRs) are a class of models for dynamical systems that eschew the latent variables of Hidden Markov Models (HMMs) and Partially Observable Markov Decision Processes (POMDPs), instead representing the system’s state as a function of observable quantities: predictions about future events (“tests”) conditioned on observable histories. This predictive state can be updated recursively and used for filtering, prediction, planning, and control in both discrete and continuous settings. PSRs provide a statistically consistent alternative to latent-state models, with far-reaching implications for machine learning, control theory, and reinforcement learning.

1. Fundamental Definition and Theoretical Foundations

A Predictive State Representation (PSR) models a dynamical system by defining its state at time $t$ as a vector of conditional probabilities of outcomes of a pre-selected set of future "tests" (length-$k$ sequences of future action-observation pairs), given the past history. Formally, the system-dynamics matrix $\mathcal{D}$ is an infinite matrix whose entry $(i,j)$ is the probability $P(t_j \mid h_i)$ that test $t_j$ succeeds (i.e., a specified sequence of observations occurs given forced actions) after history $h_i$ (Singh et al., 2012). If $\mathcal{D}$ has finite rank $k$, there exists a set $Q = \{q_1, \ldots, q_k\}$ of $k$ linearly independent core tests such that for any history $h$, the vector

$$s(h) = \left[ P(q_1 \mid h),\ \ldots,\ P(q_k \mid h) \right]^T$$

is a linearly sufficient statistic of the observable past. Any other test prediction $P(t \mid h)$ can be computed as $s(h)^T m_t$ for some weight vector $m_t$ (Singh et al., 2012). PSRs thus define a “minimal” observable state sufficient for future prediction and are strictly more general than $n$th-order Markov models or finite-state HMMs/POMDPs: PSRs of rank $k$ can represent any system whose system-dynamics matrix has rank $k$, a class that strictly contains the hidden-state models (Singh et al., 2012).

The PSR state update is driven entirely by observable operators. Given action $a_t$ and observation $o_t$, the PSR state is updated as:

$$s_{t+1} = \frac{M_{a_t o_t} s_t}{1^T M_{a_t o_t} s_t}$$

where $M_{ao}$ is a $k \times k$ observable-operator matrix whose $j$th column corresponds to the test formed by appending $(a,o)$ to core test $q_j$ (Singh et al., 2012). The normalization ensures the new state remains a valid conditional-probability vector. This update allows simulation, filtering, and prediction purely from observable quantities.
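As a concrete illustration, the filtering update above can be sketched in a few lines of NumPy. The two-test PSR and its operator matrices here are made up for the example, not learned from data:

```python
import numpy as np

# Hypothetical PSR with k = 2 core tests. Each M[(a, o)] is a k x k
# observable-operator matrix; the numbers are illustrative, not learned.
M = {
    ("a1", "o1"): np.array([[0.6, 0.1],
                            [0.2, 0.5]]),
    ("a1", "o2"): np.array([[0.2, 0.3],
                            [0.1, 0.2]]),
}

def psr_update(s, a, o):
    """One filtering step: s_{t+1} = M_{ao} s_t / (1^T M_{ao} s_t)."""
    v = M[(a, o)] @ s
    return v / v.sum()

s = np.array([0.7, 0.3])       # initial predictive state over the core tests
s = psr_update(s, "a1", "o1")  # condition on taking a1 and observing o1
```

The normalization in `psr_update` is exactly the $1^T M_{a_t o_t} s_t$ denominator: it rescales the propagated vector so the state remains a valid vector of conditional test probabilities.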

2. Core Model Classes and Extensions

PSRs naturally generalize to various settings, including controlled/interactive systems and systems with continuous observations:

  • Hilbert Space Embeddings: By mapping histories and tests into Reproducing Kernel Hilbert Spaces (RKHSs), one obtains kernelized, nonparametric PSRs ("HSE-PSRs") capable of handling infinite or continuous action/observation spaces. The predictive state becomes a conditional embedding operator $C_{Y|X} = C_{YX} C_{XX}^{-1}$ in an RKHS (Boots et al., 2013). All learning, updates, and predictions then reduce to operations on Gram matrices and kernel products, enabling statistically consistent nonparametric learning.
  • Linear-Gaussian PSRs (PLGs): In purely continuous settings, the Predictive Linear-Gaussian (PLG) model defines the state as the distribution (mean and covariance) of the next $n$ observations, conditioned on the past, with update equations matching those of the Kalman filter but parameterized only by observable moments (Rudary et al., 2012). PLGs subsume classical linear dynamical systems while requiring fewer parameters and provide a consistent estimation procedure in the method-of-moments style.
  • Nonparametric and Compressed Representations: Feature-based PSRs using random Fourier features (RFF-PSRs), principal components, or compressed sensing (CPSRs) allow expressive modeling and scalable learning in high-dimensional domains. Two-stage regression (2SR) and other spectral algorithms provide globally consistent initializations in both discrete and continuous settings (Hefny et al., 2017, Hamilton et al., 2013).
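The feature-based route (e.g., RFF-PSRs) rests on random Fourier features: observations are lifted into a finite-dimensional space whose inner products approximate an RBF kernel, after which the linear PSR machinery applies. A minimal sketch of that feature map, with illustrative dimensions and bandwidth:

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, n_feat, gamma = 3, 500, 0.5   # illustrative sizes and RBF bandwidth

# Fixed random projection: for w ~ N(0, 2*gamma*I), E[cos(w.(x-y))]
# equals the RBF kernel exp(-gamma * ||x - y||^2) (Rahimi & Recht).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d_obs, n_feat))
b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)

def phi(x):
    """Feature map with phi(x) . phi(y) ~= exp(-gamma * ||x - y||^2)."""
    return np.sqrt(2.0 / n_feat) * np.cos(x @ W + b)

x, y = rng.normal(size=d_obs), rng.normal(size=d_obs)
approx = phi(x) @ phi(y)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
```

With a few hundred features the inner product tracks the exact kernel to within a few percent, which is what makes the downstream regression steps both linear and statistically consistent.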

3. Learning Algorithms and Spectral Estimation

The central innovation for learning PSRs is the use of spectral/moment-based estimation, in contrast to the expectation-maximization procedures of latent-state models:

  • Spectral Learning via Hankel Matrix Factorization: Given observable data, one selects finite sets of histories and tests, estimates the empirical Hankel matrix $H_{i,j} = P(\text{test}_i, \text{history}_j)$, and computes its rank-$k$ SVD $H = U \Sigma V^\top$ (Liu et al., 2016). PSR parameters are then extracted as:

$$b_* = U^\top P_H; \quad b_\infty^\top = (P_{T,H}^\top)^\dagger P_H; \quad B_{ao} = U^\top P_{T,ao,H} \left(U^\top P_{T,H}\right)^\dagger$$

The recursive update uses $B_{ao}$ as observable operators, with the state renormalized at each filtering step (Liu et al., 2016).
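A runnable sketch of this recipe, using synthetic stand-ins for the empirical probability matrices (real estimates would come from trajectory counts). The initial-state vector here stacks test probabilities at the empty history, one common convention; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for empirical estimates. P_TH is tests x histories and
# rank-k by construction; P_ToH[o] is the same matrix with observation o
# interposed; p_T0 stacks test probabilities at the empty history.
n_tests, n_hist, k = 6, 5, 2
P_TH = rng.random((n_tests, k)) @ rng.random((k, n_hist))
P_ToH = {o: rng.random((n_tests, n_hist)) for o in ("o1", "o2")}
p_T0 = rng.random(n_tests)

# Rank-k truncated SVD of the Hankel-style matrix.
U = np.linalg.svd(P_TH, full_matrices=False)[0][:, :k]

b = U.T @ p_T0                                   # initial predictive state
B = {o: U.T @ P_ToH[o] @ np.linalg.pinv(U.T @ P_TH)
     for o in P_ToH}                             # observable operators

# One filtering step; renormalize (abs() guards this synthetic data).
b = B["o1"] @ b
b = b / np.abs(b).sum()
```

On real count-based estimates the same three lines of linear algebra (truncated SVD, a projection, and a pseudoinverse) replace the entire EM loop of a latent-state learner.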

  • Consistent Nonconvex-Free Initialization: For high-dimensional or kernelized representations, a consistent two-stage regression (2SR) or instrumental regression is used: first, regress future features on past/present features, then fit a system dynamics operator minimizing the difference between actual and predicted (extended) features (Hefny et al., 2017).
  • Model Entropy and Basis Selection: In practice, only finite subsets of test and history indices can be used for spectral learning. Entropy-based selection quantifies the informativeness of a test basis by clustering prediction-vectors and building a Markov process on the resulting states; lower entropy corresponds to more deterministic, sufficient representations and more accurate learned models (Liu et al., 2016).
  • Incremental/Online Learning: Efficient updates to the empirical Hankel matrix and its SVD enable online model adaptation. Rank-one update schemes (e.g., Brand’s algorithm) allow for sample-by-sample or batch incremental learning with consistent parameter estimation (Liu et al., 2019).
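The two-stage regression idea can be sketched with ordinary ridge regressions on toy feature matrices. All data, dimensions, and the ridge parameter below are illustrative assumptions, not part of any cited implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: phi_h (history features), phi_f (future features), and phi_e
# (extended-future features) at each of T time steps. A real 2SR run builds
# these from sliding windows over observed trajectories.
T, dh, df, de = 2000, 4, 3, 5
phi_h = rng.normal(size=(T, dh))
phi_f = phi_h @ rng.normal(size=(dh, df)) + 0.1 * rng.normal(size=(T, df))
phi_e = phi_f @ rng.normal(size=(df, de)) + 0.1 * rng.normal(size=(T, de))

def ridge(X, Y, lam=1e-3):
    """Ridge-regression coefficients mapping rows of X to rows of Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Stage 1: denoise by regressing future (and extended-future) features
# on history features, yielding conditional-expectation estimates.
Ef = phi_h @ ridge(phi_h, phi_f)   # E[phi_f | history]
Ee = phi_h @ ridge(phi_h, phi_e)   # E[phi_e | history]

# Stage 2: linear system operator mapping predictive state to extended state.
W_sys = ridge(Ef, Ee)
```

Because both stages are least-squares problems, the estimate is obtained in closed form, which is what gives 2SR its consistent, nonconvexity-free initialization.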

4. PSRs in Planning, Control, and Reinforcement Learning

PSRs enable planning and control by making the predictive state an explicit, sufficient statistic for future reward and observation computation. This eliminates the need for inference over latent belief states:

  • Model-Based Planning: A learned PSR can be used as a generative model within sample-based planners such as Monte Carlo Tree Search (MCTS), in which the PSR state is updated recursively during rollouts (Liu et al., 2019). Compact and efficient state updates support planning in large, partially observable domains.
  • Model-Free/Reactive Policies: By integrating a PSR-style filter as part of a recurrent architecture (e.g., PSRNNs or RPSP Networks), one can couple predictive state with a direct policy head, yielding architectures that are differentiable end-to-end and can be optimized via policy gradient, inference loss, or combinations thereof (Hefny et al., 2018, Downey et al., 2017).
  • PAC and Regret Guarantees: Recent results deliver polynomial sample-complexity algorithms for PSRs with refined structural conditions, such as B-stability, and introduce tractable exploration strategies (e.g., UCB-style optimism and posterior-sampling) specifically adapted to the predictive state setting (Chen et al., 2022, Huang et al., 2023, Zhan et al., 2022). Explicit bounds in terms of PSR rank, horizon, and action-space dimension advance the theory of sample-efficient reinforcement learning in partially observable settings.
  • Multi-Task and Transfer Learning: PSRs admit a unified theory of sample-complexity in multi-task RL based on the $\eta$-bracketing number of the joint model class. When the complexity of the union of tasks is significantly lower than the sum of single-task complexities (i.e., tasks share structure), multi-task PSR learning yields quantifiable sample-efficiency gains (Huang et al., 2023).
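For the model-based use, the essential primitive is sampling rollouts from the learned PSR, which is what an MCTS-style planner needs from a generative model. A sketch with a hypothetical discrete PSR (operators invented for the example; the renormalization guards against estimation error in learned operators):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical learned operators for a single action "a" and two observations.
M = {("a", "o1"): np.array([[0.5, 0.1], [0.1, 0.3]]),
     ("a", "o2"): np.array([[0.3, 0.2], [0.1, 0.4]])}
OBS = ("o1", "o2")

def rollout(s, actions):
    """Sample an observation sequence from the PSR, updating the state."""
    obs = []
    for a in actions:
        # P(o | do a) = 1^T M_{a,o} s; renormalize against numerical error.
        p = np.array([(M[(a, o)] @ s).sum() for o in OBS])
        o = OBS[rng.choice(len(OBS), p=p / p.sum())]
        s = M[(a, o)] @ s
        s = s / s.sum()                    # filtering-step normalization
        obs.append(o)
    return obs, s

obs, s = rollout(np.array([0.6, 0.4]), ["a", "a", "a"])
```

Each simulated step costs one small matrix-vector product, which is why PSR rollouts stay cheap inside tree search even in large partially observable domains.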

5. Practical Architectures and Deep Reinforcement Learning

Extensions of PSRs inform novel neural network architectures and self-supervised learning strategies:

  • Predictive State Recurrent Neural Networks (PSRNNs): PSRNNs embed the Bayes-filter update of the PSR in a network architecture using bilinear or multiplicative transfer functions. The PSR update step (mapping current state and new observation to next state) becomes a tensor contraction, which can be efficiently factorized via CP decomposition to lower parameter counts (Downey et al., 2017). PSRNNs can be initialized via two-stage regression and refined by backpropagation through time (BPTT), offering statistically consistent starting points and rich gating mechanisms for learning dynamical systems.
  • Predictive-State Decoders for RNNs: Supervised penalties that force an RNN’s hidden state to be linearly predictive of future-observation statistics (the PSR sufficient statistic) act as powerful regularizers and accelerate learning in filtering, imitation learning, and RL tasks (Venkatraman et al., 2017). Empirically, such augmentations yield reduced data requirements and improved final performance relative to standard RNNs, GRUs, or LSTMs.
  • Variational and Latent-Space Generalizations: Modern predictive representation learning architectures, such as VJEPA and its Bayesian extension BJEPA, train predictive latent embeddings using variational objectives, ensuring PSR-style predictive sufficiency and enabling recursive Bayesian filtering in representation space. These methods unify PSRs with probabilistic world models and amortized filtering, producing sufficient information states for control and principled uncertainty estimation in high-dimensional regimes (Huang, 20 Jan 2026).
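The bilinear PSRNN update can be written as a single tensor contraction, and its CP-factorized form shows where the parameter savings come from. Sizes and weights below are random placeholders rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

k, d = 5, 3                            # state size, observation-feature size
W = rng.normal(size=(k, k, d))         # next-state x state x obs-feature tensor

def psrnn_step(s, phi_o):
    """Bilinear update: contract state and observation features, normalize."""
    v = np.einsum('ijk,j,k->i', W, s, phi_o)
    return v / np.linalg.norm(v)       # divisive normalization

# CP-factorized form: W[i,j,k] ~= sum_r A[i,r] B[j,r] C[k,r], cutting
# parameters from k*k*d down to r*(2k + d).
r = 4
A, B, C = (rng.normal(size=(k, r)), rng.normal(size=(k, r)),
           rng.normal(size=(d, r)))

def psrnn_step_cp(s, phi_o):
    v = A @ ((B.T @ s) * (C.T @ phi_o))
    return v / np.linalg.norm(v)

phi_o = rng.normal(size=d)
s = psrnn_step(rng.normal(size=k), phi_o)
s2 = psrnn_step_cp(s, phi_o)
```

The elementwise product in the factored update acts as the multiplicative gating mechanism; both variants are differentiable, so either can be refined by BPTT after a two-stage-regression initialization.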

6. Expressiveness, Limitations, and Rewards

PSRs are strictly more expressive than $n$th-order Markov models or finite HMMs/POMDPs, capturing all systems of finite linear dimension, and are strictly more general in the controlled setting than observable operator models (OOMs) (Singh et al., 2012). In continuous domains, PLGs and kernelized PSRs are strictly more general than AR, ARMA, or Kalman filter models of the same dimension (Rudary et al., 2012, Boots et al., 2013). However:

  • Rewards: Standard PSRs can only model those reward functions that are linearly reconstructible from the column space of the test-outcome matrix. If this fails, as is often the case, no function of the PSR state can, in general, capture the true POMDP reward. Reward-PSRs (R-PSRs) augment the test set with "intents" combining tests and reward-token actions, fully restoring representational fidelity for both observations and rewards and enabling value iteration that exactly matches POMDP-optimal policies and values (Baisero et al., 2021).
  • Identifiability and Partitioning: Spectral algorithms for PSRs, when invoked on partially observable models with insufficient rank or indistinguishable observation distributions across states, can recover parameters only up to a partition of states; transitions and emissions are learned modulo these aggregations. Only when full-rank transition and observation conditions are satisfied can the underlying POMDP parameters be retrieved up to similarity (Shaw et al., 26 Jan 2026).

7. Computational and Algorithmic Aspects; Basis Selection

Scaling PSRs to high-dimensional or long-memory domains is addressed via:

  • Compressed PSRs (CPSRs): Random projections, incremental SVD, and basis compression reduce computational and storage cost without sacrificing statistical consistency. CPSRs offer a practical balance of bias, variance, and tractability and enable model-based planning in domains that are otherwise infeasible (Hamilton et al., 2013).
  • Basis and Test Selection: The informativeness and sufficiency of the test-set basis are critical. Entropy-based selection procedures quantitatively identify minimal bases for predictive accuracy and regularization (Liu et al., 2016). Optimization of basis selection remains an important empirical topic, as the choice of tests and histories can greatly influence learning outcomes and computational scaling.
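A compressed-PSR-style sketch of the random-projection step: a large (here synthetic and exactly rank-$k$) Hankel-like matrix is projected through a random Gaussian map before the SVD, shrinking the factorization cost while approximately preserving its row space. All dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic rank-k stand-in for a large tests x histories Hankel matrix.
n_tests, n_hist, k, d_proj = 500, 400, 3, 25
H = rng.random((n_tests, k)) @ rng.random((k, n_hist))

# Random Gaussian projection over the test dimension (Johnson-Lindenstrauss
# style): the compressed matrix is d_proj x n_hist instead of 500 x 400.
Phi = rng.normal(size=(d_proj, n_tests)) / np.sqrt(d_proj)
H_c = Phi @ H

# The SVD now runs on the much smaller compressed matrix.
U, S, Vt = np.linalg.svd(H_c, full_matrices=False)
```

Because the projection is applied to an (exactly) rank-$k$ matrix, the compressed matrix retains numerical rank $k$: its fourth singular value collapses to machine precision while the first three stay well separated, so the subsequent rank-$k$ factorization loses essentially nothing.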

Predictive State Representations occupy a central position in the modeling of dynamical systems, unifying perspectives from control theory, statistical learning, and reinforcement learning via their focus on observable, predictive statistics. State-of-the-art research continues to refine their theoretical guarantees, computational efficiency, and applicability across both classical and deep learning contexts. Their capacity for consistent, fully observable, and minimal modeling of both discrete and continuous, partially observable processes makes them foundational for principled data-driven modeling and planning (Singh et al., 2012, Boots et al., 2013, Downey et al., 2017, Chen et al., 2022, Huang et al., 2023, Baisero et al., 2021).
