
Efficient Learning and Planning with Compressed Predictive States

Published 1 Dec 2013 in cs.LG and stat.ML | arXiv:1312.0286v2

Abstract: Predictive state representations (PSRs) offer an expressive framework for modelling partially observable systems. By compactly representing systems as functions of observable quantities, the PSR learning approach avoids using local-minima prone expectation-maximization and instead employs a globally optimal moment-based algorithm. Moreover, since PSRs do not require a predetermined latent state structure as an input, they offer an attractive framework for model-based reinforcement learning when agents must plan without a priori access to a system model. Unfortunately, the expressiveness of PSRs comes with significant computational cost, and this cost is a major factor inhibiting the use of PSRs in applications. In order to alleviate this shortcoming, we introduce the notion of compressed PSRs (CPSRs). The CPSR learning approach combines recent advancements in dimensionality reduction, incremental matrix decomposition, and compressed sensing. We show how this approach provides a principled avenue for learning accurate approximations of PSRs, drastically reducing the computational costs associated with learning while also providing effective regularization. Going further, we propose a planning framework which exploits these learned models. And we show that this approach facilitates model-learning and planning in large complex partially observable domains, a task that is infeasible without the principled use of compression.

Citations (41)

Summary

  • The paper introduces CPSRs, a method leveraging random projections to compress state representations and reduce computational costs in model-based reinforcement learning for POMDPs.
  • It demonstrates that CPSRs achieve comparable or superior prediction accuracy with orders of magnitude faster model build times than traditional TPSRs.
  • It integrates CPSRs with fitted-Q function approximation to enable effective planning in challenging domains with large observation spaces and complex dynamics.

Efficient Learning and Planning with Compressed Predictive States: An Expert Review

Introduction

"Efficient Learning and Planning with Compressed Predictive States" (1312.0286) introduces Compressed Predictive State Representations (CPSRs), an extension of PSRs leveraging compressed sensing and random projection techniques to enable tractable and regularized learning of models for partially observable systems. The work addresses a key limitation of prior PSR approaches: the super-linear computational and storage costs of model estimation in complex, partially observable environments. The authors further integrate CPSR model learning with reinforcement learning strategies by combining their approach with fitted-QQ function approximation for planning, and validate their framework empirically on synthetic and real-world domains.

Background: Predictive State Representations and Learning Paradigms

PSRs represent the state of a partially observable dynamical system as predictions about the future, specifically as the conditional probabilities of a core set of tests (action-observation sequences) given the observed history. In contrast to latent-state models such as POMDPs or HMMs, PSRs operate only on observable quantities, bypassing combinatorial latent-state inference and EM optimization. PSRs can be learned either by combinatorial (discovery-based) identification of a minimal core set or by subspace/spectral techniques (transformed PSRs, or TPSRs), which estimate a subspace spanning the core tests. For large domains, subspace-based methods offer lower sample complexity but still require the computation of large observable matrices whose size scales with the number of possible histories and tests.
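Concretely, once such a model has been learned, the state it maintains is a small prediction vector that is updated linearly after each action-observation pair. The following minimal sketch shows the standard update form with placeholder parameters; the names are illustrative and not the paper's code.

```python
import numpy as np

def psr_update(b, B_ao, b_inf):
    """One predictive-state update after taking action a and observing o.

    b     : current predictive state (predictions for the core tests / learned basis)
    B_ao  : learned linear operator associated with the action-observation pair (a, o)
    b_inf : normalization vector of the learned model
    """
    unnormalized = B_ao @ b                 # propagate the predictive state
    prob_o = float(b_inf @ unnormalized)    # predicted probability of this observation
    return unnormalized / prob_o            # renormalize to obtain the next state


# Illustrative usage with random placeholders standing in for learned parameters.
rng = np.random.default_rng(0)
dim = 5
b = np.full(dim, 1.0 / dim)
B_ao = np.abs(rng.standard_normal((dim, dim)))
b_inf = np.ones(dim)
b_next = psr_update(b, B_ao, b_inf)
```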

CPSRs: Compressed Sensing for Model Learning

Computational Bottleneck and Motivation

The principal computational bottleneck in subspace-based PSR learning arises from the necessity to compute, store, and invert large matrices whose dimensions are proportional to the number of unique tests and histories observed in training. This is infeasible for systems with large observation spaces, long trajectories, or complex dynamics.

Compression Methodology

The CPSR framework introduces random projections (matrices satisfying Johnson-Lindenstrauss-style distance-preservation guarantees) to map observables into lower-dimensional compressed feature spaces. Both the test and history vectors are independently projected onto smaller subspaces, and model estimation proceeds wholly within this compressed domain.
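For reference, the distance-preservation guarantee these projections rely on is the standard Johnson-Lindenstrauss bound, stated here in its generic form rather than with the paper's specific constants:

```latex
% Johnson-Lindenstrauss guarantee (standard form; not the paper's specific bound).
% For any 0 < \epsilon < 1 and any n points in \mathbb{R}^{D}, a scaled random
% Gaussian matrix \Phi \in \mathbb{R}^{k \times D} with k \ge C \epsilon^{-2} \log n
% (C an absolute constant) satisfies, with high probability, for every pair u, v of
% the n points:
\[
  (1-\epsilon)\,\lVert u - v \rVert^{2}
  \;\le\;
  \lVert \Phi u - \Phi v \rVert^{2}
  \;\le\;
  (1+\epsilon)\,\lVert u - v \rVert^{2}.
\]
```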

  • Batch Learning: Matrix computations are performed directly in compressed space. The algorithm constructs compressed probability matrices by iterating over sampled trajectories and applying the random projections.
  • Incremental Learning: CPSR models can be updated incrementally by updating SVDs and compressed counts with new data, supporting online or mini-batch regimes.

A notable advantage is that neither the set of all tests nor the set of all histories must be enumerated or stored, and the random projections are instantiated on-the-fly for encountered elements.
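A minimal sketch of this idea in Python (illustrative names and simplified statistics; not the authors' implementation): each test or history key receives a lazily drawn Gaussian projection column the first time it appears, and the co-occurrence statistics used by the spectral regression step are accumulated directly in the compressed space.

```python
import numpy as np

class CompressedStats:
    """Accumulate test-history co-occurrence statistics directly in compressed space.

    Hypothetical sketch: tests and histories are hashable keys (e.g., tuples of
    action-observation pairs); each receives a lazily drawn Gaussian projection
    column, so the full sets of tests and histories are never enumerated.
    """

    def __init__(self, d_t=50, d_h=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.d_t, self.d_h = d_t, d_h          # compressed dimensions (hyperparameters)
        self.proj_t, self.proj_h = {}, {}      # on-the-fly projection columns
        self.sigma_th = np.zeros((d_t, d_h))   # compressed test-history statistics
        self.sigma_h = np.zeros(d_h)           # compressed history statistics

    def _phi(self, table, key, dim):
        # Draw (and cache) a projection column the first time a key is encountered.
        if key not in table:
            table[key] = self.rng.standard_normal(dim) / np.sqrt(dim)
        return table[key]

    def add(self, test, history, weight=1.0):
        # Fold one observed (test, history) co-occurrence into the compressed statistics.
        t = self._phi(self.proj_t, test, self.d_t)
        h = self._phi(self.proj_h, history, self.d_h)
        self.sigma_th += weight * np.outer(t, h)
        self.sigma_h += weight * h


# Illustrative usage: tests and histories encoded as tuples of (action, observation) pairs.
stats = CompressedStats()
stats.add(test=(("a1", "o3"),), history=(("a0", "o1"), ("a1", "o2")))
```

The model parameters are then obtained from a truncated SVD of these compressed statistics, in the same manner as spectral TPSR learning but at the compressed dimensions.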

Theoretical Analysis

A rigorous theoretical section establishes that CPSRs are consistent estimators when the projection dimension exceeds the rank of the underlying system, matching prior PSR learning guarantees. For lower-dimensional compressions, the analysis connects the induced bias and variance of the learned model to compressed regression risk bounds, leveraging observed sparsity in real domains and covariance decay of predictive features. The error introduced by random projections is formally quantified, demonstrating controlled bias and reduced variance in the compressed regime.

Planning with CPSRs

CPSRs are integrated into a planning framework via the fitted-Q paradigm. Instead of observable states, CPSR predictive vectors are used as the input to non-linear function approximators (e.g., Extra-Trees) to estimate the action-value function. This enables end-to-end model-based RL in partially observable domains using only observable trajectories, without hand-specified state features.
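As a sketch of how this can be wired up, the following uses scikit-learn's ExtraTreesRegressor; the transition format, function names, and hyperparameters are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fitted_q(transitions, n_actions, n_iterations=20, gamma=0.99):
    """Fitted-Q iteration over compressed predictive states.

    transitions: list of (b, a, r, b_next) tuples, where b and b_next are the CPSR
    predictive state vectors before and after a step, a is a discrete action index,
    and r is the observed reward. Returns a regressor approximating Q(b, a).
    """
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions])
    rewards = np.array([t[2] for t in transitions], dtype=float)
    next_states = np.array([t[3] for t in transitions])

    def featurize(b_vecs, a_idx):
        # Concatenate the predictive state with a one-hot encoding of the action.
        return np.hstack([b_vecs, np.eye(n_actions)[a_idx]])

    inputs = featurize(states, actions)
    targets = rewards.copy()
    q = None
    for _ in range(n_iterations):
        q = ExtraTreesRegressor(n_estimators=50).fit(inputs, targets)
        # Bellman backup: r + gamma * max_a' Q(b', a'), using the current regressor.
        next_q = np.column_stack([
            q.predict(featurize(next_states, np.full(len(next_states), a)))
            for a in range(n_actions)
        ])
        targets = rewards + gamma * next_q.max(axis=1)
    return q
```

A greedy policy then evaluates the learned regressor at the current CPSR state for every action and takes the argmax; exploration during data collection (e.g., epsilon-greedy) is handled separately.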

Empirical Evaluation

Model Quality

Experiments on the ColoredGridWorld domain demonstrate that CPSRs can achieve or even exceed the prediction accuracy of uncompressed TPSRs at long-term prediction horizons, which is attributed to regularization effects of random projection. Strong runtime reductions are shown: model build times for CPSRs are orders of magnitude smaller than those for the uncompressed baseline.

Figure 1: The log-likelihood of test data as a function of prediction horizon for various models; compressed approaches maintain or improve accuracy.

Figure 2: Model construction times (log-scale) indicate drastic efficiency gains for compressed models.

Figure 3 compares the effect of compressing both histories and tests versus only compressing tests, with results showing negligible impact on model likelihood.

Figure 3: History compression introduces minimal degradation to model likelihood.

Planning Performance

Planning experiments are performed on ColoredGridWorld, standard and sparsified (S-PocMan) PacMan, and an adaptive migratory management (AMM) domain:

  • On ColoredGridWorld, CPSR-based agents consistently outperform both memoryless (model-free) and random baselines, as would be expected for a strongly partially observable task.

    Figure 4: Average returns in ColoredGridWorld; memory-based model learning is essential for task proficiency.

  • In PocMan, where immediate observations nearly suffice to maximize reward, memoryless controllers approach the performance of the best CPSR agents, highlighting that model-based approaches are not strictly necessary in weakly partially observable environments.

    Figure 5: Visualization of the S-PocMan domain structure.

  • In S-PocMan, increased partial observability (by removing directional cues and sparsifying reward) restores the supremacy of model-based (CPSR) control.
  • In the AMM domain, only CPSR-based agents achieve a measurable performance gain over random, as the history dependence, large state space, and non-stationarity preclude effective model-free policies.

    Figure 6: Discounted return per episode in AMM; performance is monotonic with projection dimension for the best models.

    Figure 7: Total undiscounted return per episode in AMM; model-based approaches again dominate.

Across domains, the choice of compression dimension and projection type is shown to affect performance. Planning is more sensitive to these meta-parameters than one-step prediction, presumably due to error propagation in rollouts.

The study also covers implementation practicalities: the importance of hyperparameter selection, LRU caching for projection lookups, and maintaining numerical stability via normalization and SVD truncation. The method is compared and contrasted with feature-based and kernel PSRs, memory-augmented PSRs, U-Tree, and AIXI-style adaptive history models. Critically, while feature- or kernel-based PSR approaches focus on continuous/structured observation spaces, CPSR emphasizes computational scalability and regularization via compression, making principled trade-offs between model bias and variance.
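For instance, projection columns can be regenerated deterministically on demand from a stable hash of the test or history key and memoized with an LRU cache, so only recently used columns stay in memory. The following is a hedged sketch of that pattern, not the paper's implementation; all names and the key encoding are assumptions.

```python
import hashlib
from functools import lru_cache

import numpy as np

PROJ_DIM = 50  # compressed dimension (hyperparameter)


@lru_cache(maxsize=100_000)
def projection_column(key, dim=PROJ_DIM):
    """Regenerate the random projection column for a test/history key on demand.

    Seeding the generator with a stable hash of the key makes the column
    reproducible, so entries evicted from the LRU cache can simply be recomputed
    rather than stored for the lifetime of learning.
    """
    seed = int.from_bytes(hashlib.sha256(repr(key).encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim) / np.sqrt(dim)  # treat the returned array as read-only


# Example lookup for a length-two test encoded as (action, observation) pairs.
col = projection_column((("north", "red"), ("east", "blank")))
```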

Implications and Future Directions

The work demonstrates that large, partially observable RL problems can be tractably addressed by compressing the predictive state space via random projections without explicit feature engineering. It is empirically shown that in highly aliased domains, memory-based approaches are essential, while in 'nearly observable' domains, simple model-free RL can suffice.

For future research, salient directions include:

  • The use of non-least-squares objectives in the implicit regression step, e.g., regularized or robust loss functions, or direct minimization of planning loss.
  • Compositional and modular architectures (e.g., hierarchical CPSRs or hybrid approaches integrating with memory-networks).
  • Distributed or hardware-optimized implementations for large action/observation spaces, and adaptation to continuous observation domains, potentially by combining with domain-specific feature learning or kernel approximations.
  • Deeper theoretical investigation of the interaction between compression, planning horizon, and statistical efficiency, especially for non-i.i.d. sampling regimes.

Conclusion

This work establishes compressed predictive state representation as a tractable and theoretically sound mechanism for model-based reinforcement learning in complex POMDPs with large observation and action spaces. CPSRs provide a flexible alternative to classic PSR learning and kernel-based approaches, support both batch and online learning, and deliver effective policies in environments where even domain-specific models are unavailable or infeasible to specify. The empirical results and comprehensive theoretical treatment validate compressed sensing as a key enabling technology for scalable sequential decision-making in partially observable domains.
