Robust Locally-Linear Controllable Embedding
- The RCE framework integrates probabilistic latent embeddings with locally-linear latent dynamics to enable robust, efficient control from high-dimensional observations.
- RCE employs a novel inference scheme that conditions on both the current and the next observation, reducing variational approximation error and improving robustness to process noise.
- The predictive coding extension (PC3) removes the need for high-dimensional reconstruction, achieving faster training and superior performance on benchmark control tasks.
Robust Locally-Linear Controllable Embedding (RCE) is a generative modeling framework for optimal control from high-dimensional observations, such as images, that enforces controllable and locally-linear latent dynamics. By introducing probabilistic latent embeddings and model structures facilitating robust local linearization, RCE supports direct closed-loop control using classical linear methods, while maintaining robustness to unmodeled process noise and addressing the variational inference challenges that arise in high-dimensional sequential data. The RCE approach was first formalized in "Robust Locally-Linear Controllable Embedding" (Banijamali et al., 2017) and substantially advanced in "Predictive Coding for Locally-Linear Control" (Shu et al., 2020), the latter removing the requirement for high-dimensional reconstruction via decoder networks through an information-theoretic predictive coding bottleneck.
1. Latent Representation and Generative Modeling
RCE models employ an observation-conditional latent embedding $p(z_t \mid x_t)$, where $x_t \in \mathbb{R}^{n_x}$ is a high-dimensional observation and $z_t \in \mathbb{R}^{n_z}$, with $n_z \ll n_x$, is a tractable, low-dimensional representation. The key innovation is enforcing locally-linear transition dynamics in the latent space: for each transition $(x_t, u_t, x_{t+1})$, the next-step latent variable is realized by
$$\hat z_{t+1} = A(\bar z_t)\, z_t + B(\bar z_t)\, u_t + c(\bar z_t),$$
where $A(\bar z_t) \in \mathbb{R}^{n_z \times n_z}$, $B(\bar z_t) \in \mathbb{R}^{n_z \times n_u}$, and $c(\bar z_t) \in \mathbb{R}^{n_z}$ are determined as functions of a stochastic linearization point $\bar z_t$, itself sampled from $p(\bar z_t \mid x_t)$, with $z_t \sim p(z_t \mid x_t)$. The observation $x_{t+1}$ is subsequently generated from $\hat z_{t+1}$ via a decoder $p(x_{t+1} \mid \hat z_{t+1})$ (Banijamali et al., 2017).
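As a concrete illustration, the following is a minimal PyTorch sketch of a locally-linear transition module that emits $A$, $B$, and $c$ from a linearization point; the module name and layer sizes are illustrative assumptions, not the papers' exact architectures.

```python
# Minimal sketch of a locally-linear latent transition (hypothetical
# module and layer sizes; both papers use task-specific networks).
import torch
import torch.nn as nn

class LocallyLinearDynamics(nn.Module):
    """Emits A(z_bar), B(z_bar), c(z_bar) and applies
    z_next = A z + B u + c for a batch of latent states."""
    def __init__(self, z_dim: int, u_dim: int, hidden: int = 64):
        super().__init__()
        self.z_dim, self.u_dim = z_dim, u_dim
        out = z_dim * z_dim + z_dim * u_dim + z_dim
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, z_bar, z, u):
        p = self.net(z_bar)
        nA, nB = self.z_dim * self.z_dim, self.z_dim * self.u_dim
        A = p[:, :nA].reshape(-1, self.z_dim, self.z_dim)
        B = p[:, nA:nA + nB].reshape(-1, self.z_dim, self.u_dim)
        c = p[:, nA + nB:]
        z_next = (A @ z.unsqueeze(-1) + B @ u.unsqueeze(-1)).squeeze(-1) + c
        return z_next, (A, B, c)
```

In use, `z_bar` would be sampled from the encoder at $x_t$, so the affine coefficients vary stochastically with the current observation.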
In the predictive coding extension (PC3), the decoder can be dispensed with entirely; instead, the model is regularized to maximize the mutual information $I(\hat z_{t+1};\, z_t, u_t)$ between the next-step latent code and the current latent-action pair, thus enforcing that the latent transition is maximally predictive for control purposes (Shu et al., 2020).
2. Local Linearity and Robust Control
Local linearity is enforced not globally but in a distributional sense, through randomization of linearization points in the latent space. At each step, $A$, $B$, and $c$ are parameterized as functions evaluated at the sampled $\bar z_t$, yielding a locally-varying but affine mapping $(z_t, u_t) \mapsto A(\bar z_t)\, z_t + B(\bar z_t)\, u_t + c(\bar z_t)$.
Robustness is achieved by treating the linearization points as random variables, ensuring the controller sees an average model over plausible localizations in latent space. In (Banijamali et al., 2017), invertibility of $A(\bar z_t)$ is ensured by directly parameterizing its inverse $M(\bar z_t) = A(\bar z_t)^{-1}$ in a rank-one "identity-plus-perturbation" form, $M(\bar z_t) = I + v(\bar z_t)\, r(\bar z_t)^\top$, further regularizing the model's local geometry.
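A hedged sketch of this rank-one parameterization, with $v$ and $r$ produced by a small network (network shape and names are assumptions):

```python
# Sketch of the rank-one "identity-plus-perturbation" inverse matrix
# M(z_bar) = A(z_bar)^{-1} = I + v r^T (network shape is an assumption).
import torch
import torch.nn as nn

class RankOneInverse(nn.Module):
    def __init__(self, z_dim: int, hidden: int = 64):
        super().__init__()
        self.z_dim = z_dim
        self.vr = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * z_dim))

    def forward(self, z_bar):
        v, r = self.vr(z_bar).chunk(2, dim=-1)
        eye = torch.eye(self.z_dim, device=z_bar.device)
        # det(I + v r^T) = 1 + r^T v (matrix determinant lemma), so
        # invertibility is cheap to monitor or regularize.
        return eye + v.unsqueeze(-1) @ r.unsqueeze(-2)
```

The determinant lemma is what makes this form attractive: invertibility of the full matrix reduces to a scalar condition on $1 + r^\top v$.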
The predictive coding variant adds a low-curvature penalty: for random perturbations $(\epsilon_z, \epsilon_u) \sim \mathcal{N}(0, \delta^2 I)$ drawn from an isotropic Gaussian, the deviation of the latent dynamics from its linear (first-order Taylor) approximation around $(z_t, u_t)$ is penalized, ensuring the latent dynamics remain close to linear for use with LQR/iLQR methods (Shu et al., 2020).
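A minimal sketch of such a curvature penalty, assuming `dynamics` maps `(z, u)` to the predicted next latent state; the perturbation scale `delta` is an illustrative choice:

```python
# Curvature penalty sketch: squared gap between the dynamics at a
# perturbed point and its first-order Taylor expansion around (z, u).
import torch

def curvature_loss(dynamics, z, u, delta: float = 0.1):
    eps_z, eps_u = delta * torch.randn_like(z), delta * torch.randn_like(u)
    # The Jacobian-vector product gives the linear term of the Taylor
    # expansion without materializing full Jacobians.
    f0, lin = torch.autograd.functional.jvp(
        dynamics, (z, u), (eps_z, eps_u), create_graph=True)
    f_pert = dynamics(z + eps_z, u + eps_u)
    return ((f_pert - f0 - lin) ** 2).sum(-1).mean()
```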
3. Variational Inference and Posterior Approximation
Crucially, the RCE posterior factorizes in a manner enabling tractable amortized inference, conditioning not only on $(x_t, u_t)$ but, importantly, on the next observation $x_{t+1}$. For each transition,
$$q(z_t, \bar z_t, \hat z_{t+1} \mid x_t, u_t, x_{t+1}) = q(\hat z_{t+1} \mid x_{t+1})\, q(\bar z_t \mid x_t)\, q(z_t \mid \hat z_{t+1}, \bar z_t, u_t),$$
where $q(\hat z_{t+1} \mid x_{t+1})$ and $q(\bar z_t \mid x_t)$ are amortized neural encoders, and $q(z_t \mid \hat z_{t+1}, \bar z_t, u_t)$ is the deterministic pushforward $z_t = M(\bar z_t)\big(\hat z_{t+1} - B(\bar z_t)\, u_t - c(\bar z_t)\big)$ through the inverse of $A(\bar z_t)$ as parameterized above (Banijamali et al., 2017). Conditioning inference on $x_{t+1}$ ("backward encoding") significantly reduces variational approximation error, especially under process noise, compared to prior approaches that condition only on $x_t$.
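The backward-encoding step can be sketched as follows, assuming the encoders return reparameterizable `torch.distributions` objects and `inv_dyn` returns $(M, B, c)$ at the sampled linearization point (all interfaces are illustrative):

```python
# Sketch of RCE-style backward inference: encode x_{t+1}, sample a
# linearization point from x_t, then recover z_t through M = A^{-1}.
import torch

def infer_z_t(enc_next, enc_bar, inv_dyn, x_t, u_t, x_next):
    z_hat_next = enc_next(x_next).rsample()   # q(z_hat_{t+1} | x_{t+1})
    z_bar = enc_bar(x_t).rsample()            # q(z_bar_t | x_t)
    M, B, c = inv_dyn(z_bar)                  # M = A^{-1}, B, c at z_bar
    resid = z_hat_next - (B @ u_t.unsqueeze(-1)).squeeze(-1) - c
    z_t = (M @ resid.unsqueeze(-1)).squeeze(-1)   # deterministic pushforward
    return z_t, z_bar, z_hat_next
```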
The predictive coding model (Shu et al., 2020) dispenses with high-dimensional reconstruction and trains using a contrastive predictive coding (CPC) mutual information lower bound. For a batch of $K$ transitions,
$$\mathcal{L}_{\mathrm{CPC}} = \frac{1}{K} \sum_{i=1}^{K} \log \frac{f\big(\tilde z_{t+1}^{(i)}, z_t^{(i)}, u_t^{(i)}\big)}{\frac{1}{K} \sum_{j=1}^{K} f\big(\tilde z_{t+1}^{(i)}, z_t^{(j)}, u_t^{(j)}\big)},$$
where $\tilde z_{t+1}$ is the noise-perturbed latent encoding of $x_{t+1}$ and the critic $f$ is the density of the learned latent dynamics model.
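A sketch of this in-batch CPC estimator, using the other $K-1$ transitions in the batch as negatives and a Gaussian dynamics density as the critic (the fixed variance `sigma` is an assumption):

```python
# In-batch CPC lower bound sketch; F maps (z_t, u_t) to the predicted
# next latent mean, and the critic is the induced Gaussian density.
import math
import torch

def cpc_lower_bound(F, z_t, u_t, z_next_tilde, sigma: float = 1.0):
    K = z_t.shape[0]
    mu = F(z_t, u_t)                               # (K, z_dim)
    # log f(z'_i | z_j, u_j) for every pair (i, j): a (K, K) matrix.
    diff = z_next_tilde.unsqueeze(1) - mu.unsqueeze(0)
    log_f = -(diff ** 2).sum(-1) / (2 * sigma ** 2)
    pos = log_f.diagonal()                         # matched pairs
    # log [ f_ii / ((1/K) sum_j f_ij) ], averaged over the batch.
    return (pos - log_f.logsumexp(dim=1) + math.log(K)).mean()
```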
4. Optimization Objectives
The original RCE (Banijamali et al., 2017) maximizes an evidence lower bound (ELBO) on the predictive log-likelihood of the next observation, of the form
$$\log p(x_{t+1} \mid x_t, u_t) \;\ge\; \mathbb{E}_{q}\big[\log p(x_{t+1} \mid \hat z_{t+1}) + \log p(z_t \mid x_t)\big] + \mathbb{H}\big(q(\hat z_{t+1} \mid x_{t+1})\big) - \mathrm{KL}\big(q(\bar z_t \mid x_t)\,\Vert\, p(\bar z_t \mid x_t)\big),$$
with $z_t$ recovered through the inverse dynamics. This objective jointly regularizes reconstruction, KL divergence, posterior entropy, and transition matching, ensuring that the learned embedding supports locally-linear, robust control.
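A hedged sketch assembling a bound of this shape, reusing the `infer_z_t` sketch above; the distribution interfaces are assumptions, and the KL term is omitted under a shared $\bar z_t$ encoder:

```python
# Hedged assembly of an RCE-style ELBO with the shape shown above;
# encoders/decoder return torch.distributions objects (an assumption).
def rce_elbo(dec, enc_fwd, enc_next, enc_bar, inv_dyn, x_t, u_t, x_next):
    z_t, z_bar, z_hat_next = infer_z_t(enc_next, enc_bar, inv_dyn,
                                       x_t, u_t, x_next)
    recon = dec(z_hat_next).log_prob(x_next).sum(-1).mean()  # log p(x'|z_hat)
    trans = enc_fwd(x_t).log_prob(z_t).sum(-1).mean()        # log p(z_t|x_t)
    entropy = enc_next(x_next).entropy().sum(-1).mean()      # H(q(z_hat|x'))
    # KL(q(z_bar|x_t) || p(z_bar|x_t)) omitted: zero with a shared encoder.
    return recon + trans + entropy
```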
In decoder-free PC3 (Shu et al., 2020), the composite loss is
$$\mathcal{L}_{\mathrm{PC3}} = -\lambda_1\, \mathcal{L}_{\mathrm{CPC}} + \lambda_2\, \mathcal{L}_{\mathrm{cons}} + \lambda_3\, \mathcal{L}_{\mathrm{curv}},$$
combining the CPC bound with consistency and curvature terms, augmented by additional regularization and a centering penalty.
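Combining the earlier sketches, a PC3-style composite loss might be assembled as below; `sigma` and the unit loss weights are placeholders, and the consistency term follows the paper's description of matching the predicted next latent to its noisy encoding:

```python
# Sketch of a PC3-style composite objective, reusing cpc_lower_bound
# and curvature_loss from the sketches above (weights are placeholders).
import torch

def pc3_loss(F, enc, x_t, u_t, x_next, sigma=0.1, lam=(1.0, 1.0, 1.0)):
    z_t, z_next = enc(x_t), enc(x_next)
    z_next_tilde = z_next + sigma * torch.randn_like(z_next)  # perturbation
    cpc = cpc_lower_bound(F, z_t, u_t, z_next_tilde, sigma)
    cons = ((F(z_t, u_t) - z_next_tilde) ** 2).sum(-1).mean() # consistency
    curv = curvature_loss(F, z_t, u_t)
    return -lam[0] * cpc + lam[1] * cons + lam[2] * curv
```

Note the deterministic encoder here: PC3 encodes to a point and injects noise, whereas RCE samples from a full variational posterior.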
5. Algorithmic Workflow
Both RCE (Banijamali et al., 2017) and PC3 (Shu et al., 2020) utilize an end-to-end stochastic gradient pipeline integrating inference, transition modeling, and regularization:
- Sample mini-batches of observation-action-observation transitions.
- Encode $x_t$ and $x_{t+1}$ via the forward and backward encoders.
- For PC3, noise-perturb the encoded $z_{t+1}$; for RCE, reconstruct $x_{t+1}$ from $\hat z_{t+1}$.
- Compute relevant losses (CPC, ELBO, consistency, curvature).
- Backpropagate gradients and update all parameters using Adam.
Standard hyperparameters for PC3 include a batch size of 256, with the latent noise scale, loss weights $(\lambda_1, \lambda_2, \lambda_3)$, and Adam learning rate tuned per task as reported by Shu et al. (2020); a minimal loop is sketched below.
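A training-loop sketch matching the workflow above, with `loss_fn` standing in for either the RCE ELBO or the PC3 composite loss (data loader, parameter list, and the default settings are placeholders):

```python
# Shared SGD pipeline sketch: mini-batches of (x_t, u_t, x_{t+1})
# transitions, a pluggable loss, and Adam updates.
import torch

def train(params, loss_fn, loader, lr: float = 1e-3, epochs: int = 50):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x_t, u_t, x_next in loader:        # transition mini-batches
            loss = loss_fn(x_t, u_t, x_next)   # ELBO or CPC + regularizers
            opt.zero_grad()
            loss.backward()
            opt.step()
```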
6. Comparative Robustness and Benchmarks
RCE models eliminate key failure modes of earlier embed-to-control (E2C) approaches, including the lack of a true likelihood-based training objective and large variational errors under noise. Conditioning the posterior on $x_{t+1}$ enables robust encoding even under stochastic dynamics.
PC3 achieves further efficiency and robustness by removing the high-dimensional decoder, thus avoiding overfitting and drastically reducing model parameterization. The mutual information bottleneck ensures retention only of those factors necessary for predictive control, while explicit consistency and curvature regularization prevent representational collapse and enable the use of iLQR and LQR planners.
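Once $A$ and $B$ are extracted from the latent dynamics around a nominal trajectory, standard finite-horizon LQR applies directly; the sketch below shows the backward Riccati recursion, with the cost matrices $Q$, $R$ as illustrative choices not taken from either paper:

```python
# Finite-horizon LQR via backward Riccati recursion for latent dynamics
# z' = A z + B u with stage cost z^T Q z + u^T R u (Q, R illustrative).
import numpy as np

def lqr_gains(A, B, Q, R, horizon: int):
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]   # time-ordered feedback gains: u_t = -K_t z_t
```

iLQR follows the same pattern, re-linearizing around the current nominal latent trajectory at each iteration.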
In empirical evaluations across planar navigation, inverted pendulum swing-up, cartpole, and three-link arm visual domains, RCE outperforms E2C with lower reconstruction and planning costs and higher goal-reaching rates, especially as process noise increases. For example, at high process noise ($\sigma_n = 5$) in planar navigation, RCE achieves approximately 27% lower reconstruction loss and almost twice the success rate of E2C. PC3 further outperforms both PCC and SOLAR, achieving 58.4% time-in-goal on the swing-up pendulum task (versus 26.4% for PCC and 35.4% for SOLAR) and 96.3% on cartpole (compared to 94.4% for PCC and 91.2% for SOLAR). Model training time is significantly reduced: PC3 is approximately 1.9× faster than PCC and 53× faster than SOLAR (Banijamali et al., 2017; Shu et al., 2020).
7. Significance, Extensions, and Related Work
RCE establishes a generative-modeling pathway for learning control-ready, robustly linear subsets of the observational space, supporting the tractable application of LQR/iLQR. The use of posterior regularization, local random linearization, and predictive coding mutual information objectives represents a consistent progression toward models firmly grounded in the data likelihood, improving sample efficiency, noise robustness, and computational tractability.
A plausible implication is that further advances may generalize the RCE approach, combining efficient decoder-free predictive objectives with advanced variational structures, or extend to broader classes of non-linear control systems with tractable local approximations. The use of contrastive predictive coding to facilitate controllability in latent space suggests links with recent developments in self-supervised representation learning for control under uncertainty.
Key references: "Robust Locally-Linear Controllable Embedding" (Banijamali et al., 2017); "Predictive Coding for Locally-Linear Control" (Shu et al., 2020).