
Robust Locally-Linear Controllable Embedding

Updated 14 March 2026
  • The paper introduces a framework that integrates probabilistic latent embeddings with locally-linear dynamics to enable robust and efficient control from high-dimensional data.
  • RCE employs a novel inference scheme that conditions on both current and next observations, thereby reducing variational errors and enhancing noise robustness.
  • The predictive coding extension removes the need for high-dimensional reconstruction, achieving faster training and superior performance in benchmark tasks.

Robust Locally-Linear Controllable Embedding (RCE) is a generative modeling framework for optimal control from high-dimensional observations, such as images, that enforces controllable and locally-linear latent dynamics. By introducing probabilistic latent embeddings and model structures facilitating robust local linearization, RCE supports direct closed-loop control using classical linear methods, while maintaining robustness to unmodeled process noise and addressing the variational inference challenges that arise in high-dimensional sequential data. The RCE approach was first formalized in "Robust Locally-Linear Controllable Embedding" (Banijamali et al., 2017) and substantially advanced in "Predictive Coding for Locally-Linear Control" (Shu et al., 2020), the latter removing the requirement for high-dimensional reconstruction via decoder networks through an information-theoretic predictive coding bottleneck.

1. Latent Representation and Generative Modeling

RCE models employ an observation-conditional latent embedding $z_t \sim p(z_t \mid x_t)$, where $x_t$ is a high-dimensional observation and $z_t$ is a tractable, low-dimensional representation. The key innovation is enforcing locally-linear transition dynamics in the latent space: for each $(z_t, u_t)$, the next-step latent variable $\hat z_{t+1}$ is realized by

$$\hat z_{t+1} = A_t z_t + B_t u_t + c_t,$$

where $A_t$, $B_t$, and $c_t$ are determined as functions of a stochastic linearization point $(\bar z_t, \bar u_t)$, itself sampled from $p(\bar z_t \mid x_t)$ and $p(\bar u_t \mid u_t)$. The observation $x_{t+1}$ is subsequently generated from $\hat z_{t+1}$ via a decoder $p(x_{t+1} \mid \hat z_{t+1})$ (Banijamali et al., 2017).
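The locally-linear transition can be sketched as follows. This is a minimal illustration, not the paper's architecture: the latent and control dimensions, the random linear "parameter network" `W`, and the near-identity initialization of $A_t$ are all illustrative assumptions standing in for learned neural components.

```python
import numpy as np

rng = np.random.default_rng(0)
dz, du = 2, 1  # illustrative latent and control dimensions

# Hypothetical "parameter network": maps a sampled linearization point
# (z_bar, u_bar) to the affine transition parameters A_t, B_t, c_t.
W = rng.normal(scale=0.1, size=(dz * dz + dz * du + dz, dz + du))

def transition_params(z_bar, u_bar):
    out = W @ np.concatenate([z_bar, u_bar])
    A = np.eye(dz) + out[: dz * dz].reshape(dz, dz)       # A_t, near identity
    B = out[dz * dz : dz * dz + dz * du].reshape(dz, du)  # B_t
    c = out[dz * dz + dz * du :]                          # c_t
    return A, B, c

def next_latent(z, u, z_bar, u_bar):
    """z_hat_{t+1} = A_t z_t + B_t u_t + c_t, with (A_t, B_t, c_t)
    evaluated at the stochastically sampled linearization point."""
    A, B, c = transition_params(z_bar, u_bar)
    return A @ z + B @ u + c

z, u = rng.normal(size=dz), rng.normal(size=du)
# Sampling the linearization point near (z, u) stands in for
# drawing from p(z_bar | x_t) and p(u_bar | u_t).
z_bar = z + 0.1 * rng.normal(size=dz)
u_bar = u + 0.1 * rng.normal(size=du)
z_next = next_latent(z, u, z_bar, u_bar)
```

For a fixed linearization point the map $z_t \mapsto \hat z_{t+1}$ is exactly affine, which is what LQR-style planners exploit.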

In the predictive coding extension (PC3), the decoder can be dispensed with entirely; instead, the model is regularized to maximize the mutual information between next-step and current-step latent codes given the control action, thus enforcing that the latent transition is maximally predictive for control purposes (Shu et al., 2020).

2. Local Linearity and Robust Control

Local linearity is enforced not globally, but in a distributional sense, through randomization of linearization points in the latent space. At each step, $A_t$, $B_t$, and $c_t$ are parameterized as functions evaluated at sampled $(\bar z_t, \bar u_t)$, leading to a locally-varying but affine mapping $z_t \mapsto \hat z_{t+1}$.

Robustness is achieved by treating the linearization points as random variables, ensuring the controller sees an average model over plausible localizations in latent space. In (Banijamali et al., 2017), invertibility of $A_t$ is ensured by parameterizing its inverse $M_t$ in a rank-one "identity-plus-perturbation" form, further regularizing the model's local geometry.
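The algebra behind the rank-one parameterization can be illustrated with the matrix determinant lemma and the Sherman–Morrison formula. This sketch only shows the invertibility argument; the vectors `r` and `s` would in practice be network outputs, which is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(1)
dz = 3

def rank_one_inverse_pair(r, s):
    """M = I + r s^T (the rank-one 'identity-plus-perturbation' form).
    det(M) = 1 + s^T r, so M is invertible whenever 1 + s^T r != 0,
    and Sherman-Morrison gives A = M^{-1} = I - r s^T / (1 + s^T r)."""
    denom = 1.0 + s @ r
    assert abs(denom) > 1e-8, "degenerate perturbation"
    M = np.eye(len(r)) + np.outer(r, s)
    A = np.eye(len(r)) - np.outer(r, s) / denom
    return M, A

r, s = rng.normal(size=dz), rng.normal(size=dz)
M, A = rank_one_inverse_pair(r, s)
```

Because the inverse is available in closed form, no matrix inversion is needed at training time.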

The predictive coding variant adds a low-curvature penalty: for random perturbations $(\eta_z, \eta_u)$ drawn from an isotropic Gaussian, the deviation of the dynamics $f_\theta(z+\eta_z, u+\eta_u)$ from its linear approximation around $(z, u)$ is penalized, ensuring the latent dynamics remain close to linear for use with LQR/iLQR methods (Shu et al., 2020).
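The curvature penalty can be sketched with Monte Carlo samples of the perturbation. As an assumption for illustration, the Jacobians standing in for $A$ and $B$ are estimated by finite differences rather than produced by the model, and the toy dynamics below are not the learned $f_\theta$.

```python
import numpy as np

rng = np.random.default_rng(2)
dz, du, delta = 2, 1, 0.01  # delta: perturbation scale (matches PC3's hyperparameter)

def jacobians(f, z, u, eps=1e-6):
    """Finite-difference Jacobians of f at (z, u), stand-ins for A, B."""
    fz = f(z, u)
    A = np.stack([(f(z + eps * e, u) - fz) / eps for e in np.eye(dz)], axis=1)
    B = np.stack([(f(z, u + eps * e) - fz) / eps for e in np.eye(du)], axis=1)
    return A, B

def curvature_penalty(f, z, u, n_samples=64):
    """E_eta || f(z+eta_z, u+eta_u) - f(z, u) - A eta_z - B eta_u ||^2
    with eta drawn from an isotropic Gaussian of scale delta."""
    A, B = jacobians(f, z, u)
    fz = f(z, u)
    total = 0.0
    for _ in range(n_samples):
        ez = delta * rng.normal(size=dz)
        eu = delta * rng.normal(size=du)
        resid = f(z + ez, u + eu) - fz - A @ ez - B @ eu
        total += resid @ resid
    return total / n_samples

linear_f = lambda z, u: z + 0.5 * np.concatenate([u, u])  # exactly affine
cubic_f = lambda z, u: z**3 + np.concatenate([u, u])      # curved dynamics
z, u = rng.normal(size=dz), rng.normal(size=du)
```

An exactly affine $f$ incurs (numerically) zero penalty, while curved dynamics are penalized, which is what pushes the learned latent dynamics toward local linearity.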

3. Variational Inference and Posterior Approximation

Crucially, the RCE posterior factorizes in a manner enabling tractable amortized inference, conditioning not only on $(x_t, u_t)$ but, importantly, on $x_{t+1}$. For each transition,

$$q(z_t, \bar z_t, \hat z_{t+1} \mid x_t, x_{t+1}, u_t, \bar u_t) = q_\phi(\hat z_{t+1} \mid x_{t+1})\, q_\varphi(\bar z_t \mid x_t, \hat z_{t+1})\, \delta\!\left(z_t - M_t[\hat z_{t+1} - B_t u_t - c_t]\right),$$

where $q_\phi$ and $q_\varphi$ are amortized neural encoders, and $M_t$ is the inverse of $A_t$ as parameterized above (Banijamali et al., 2017). Conditioning inference on $x_{t+1}$ ("backward encoding") significantly reduces variational approximation error, especially under process noise, compared to prior approaches that condition only on $(x_t, u_t)$.
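The delta term of this posterior makes $z_t$ a deterministic function of $\hat z_{t+1}$: inverting the affine transition recovers the current latent from the backward-encoded next latent. A minimal sketch, with randomly generated $A_t$, $B_t$, $c_t$ as illustrative stand-ins for model outputs (the paper parameterizes $M_t$ directly; it is obtained by inversion here only for clarity):

```python
import numpy as np

rng = np.random.default_rng(4)
dz, du = 3, 2
A = np.eye(dz) + 0.1 * rng.normal(size=(dz, dz))  # transition matrix A_t
M = np.linalg.inv(A)                              # M_t = A_t^{-1}
B = rng.normal(size=(dz, du))                      # B_t
c = rng.normal(size=dz)                            # c_t

z_hat_next = rng.normal(size=dz)  # stand-in for a sample from q_phi(. | x_{t+1})
u = rng.normal(size=du)

# Delta term: z_t = M_t [z_hat_{t+1} - B_t u_t - c_t]
z_t = M @ (z_hat_next - B @ u - c)
```

Pushing $z_t$ back through the transition, $A_t z_t + B_t u_t + c_t$, reproduces $\hat z_{t+1}$ exactly, so the posterior is automatically consistent with the locally-linear dynamics.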

The predictive coding model (Shu et al., 2020) dispenses with high-dimensional reconstruction and trains using a contrastive predictive coding (CPC) mutual information lower bound. For $K$ transitions,

$$\ell_{\mathrm{CPC}} = \frac{1}{K}\sum_{i=1}^K \log \frac{p_\theta(\tilde z_{t+1}^i \mid z_t^i, u_t^i)}{\frac{1}{K}\sum_{j=1}^K p_\theta(\tilde z_{t+1}^i \mid z_t^j, u_t^j)},$$

where $\tilde z_{t+1}^i$ is the noise-perturbed latent encoding of $x_{t+1}^i$.
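The bound can be computed directly from the formula above. In this sketch the transition density $p_\theta$ is assumed to be a unit-variance Gaussian centered at a toy dynamics function $f(z, u) = z + u$; both are illustrative stand-ins for the learned model.

```python
import numpy as np

rng = np.random.default_rng(3)
K, d = 8, 2  # batch of K transitions, latent dimension d (illustrative)

def gaussian_logpdf(x, mean):
    diff = x - mean
    return -0.5 * diff @ diff - 0.5 * len(x) * np.log(2 * np.pi)

def cpc_bound(z_next, z, u, f):
    """ell_CPC: each positive pair i is contrasted against the whole
    mini-batch of (z_t^j, u_t^j) pairs in the denominator."""
    K = len(z)
    # log p(z~_{t+1}^i | z_t^j, u_t^j) for all pairs (i, j)
    logp = np.array([[gaussian_logpdf(z_next[i], f(z[j], u[j]))
                      for j in range(K)] for i in range(K)])
    # log of the row-wise mean, computed via log-sum-exp for stability
    m = logp.max(axis=1)
    log_mean = m + np.log(np.exp(logp - m[:, None]).mean(axis=1))
    return float(np.mean(np.diag(logp) - log_mean))

f = lambda z, u: z + u
z = rng.normal(size=(K, d))
u = rng.normal(size=(K, d))
z_next = z + u + 0.1 * rng.normal(size=(K, d))  # noise-perturbed next latents
ell = cpc_bound(z_next, z, u, f)
```

Note that $\ell_{\mathrm{CPC}} \le \log K$ by construction, so larger batches admit tighter mutual information estimates.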

4. Optimization Objectives

The original RCE (Banijamali et al., 2017) maximizes an evidence lower bound (ELBO) on the predictive log-likelihood of the next observation:

$$\mathcal{L}_t^{\mathrm{RCE}} = \mathbb{E}_{q_\phi(\hat z_{t+1}\mid x_{t+1})}\bigl[\log p(x_{t+1}\mid \hat z_{t+1})\bigr] - \mathbb{E}_{q_\phi(\hat z_{t+1}\mid x_{t+1})}\,\mathrm{KL}\bigl[q_\varphi(\bar z_t\mid x_t,\hat z_{t+1}) \,\|\, p(\bar z_t\mid x_t)\bigr] + H\bigl(q_\phi(\hat z_{t+1}\mid x_{t+1})\bigr) + \mathbb{E}_{q_\phi, q_\varphi}\bigl[\log p(z_t\mid x_t)\bigr].$$

This objective jointly regularizes reconstruction, KL divergence, posterior entropy, and transition matching, ensuring that the learned embedding supports locally linear, robust control.

In decoder-free PC3 (Shu et al., 2020), the composite loss is

$$\max_{\phi,\theta}\;\lambda_1\,\ell_{\mathrm{CPC}} + \lambda_2\,\mathbb{E}\bigl[\ln p_\theta(z_{t+1}\mid z_t, u_t)\bigr] - \lambda_3\,\mathbb{E}_\eta\,\bigl\|f_\theta(z+\eta_z, u+\eta_u) - f_\theta(z, u) - A\eta_z - B\eta_u\bigr\|^2,$$

with additional $\ell_2$ regularization and a centering penalty.

5. Algorithmic Workflow

Both RCE (Banijamali et al., 2017) and PC3 (Shu et al., 2020) utilize an end-to-end stochastic gradient pipeline integrating inference, transition modeling, and regularization:

  • Sample mini-batches of observation-action-observation transitions.
  • Encode $z_t, \bar z_t, \hat z_{t+1}$ via forward and backward encoders.
  • For PC3, noise-perturb the encoded $\mu_\phi(o_{t+1})$; for RCE, reconstruct $x_{t+1}$ from $\hat z_{t+1}$.
  • Compute relevant losses (CPC, ELBO, consistency, curvature).
  • Backpropagate gradients and update all parameters using Adam.

Standard hyperparameters for PC3 include batch size 256, $\sigma = 0.1$, $\delta = 0.01$, loss weights $\{\lambda_1, \lambda_2, \lambda_3\} = \{1, 1, 7\}$, and learning rate $5 \times 10^{-4}$.
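The final step of the pipeline is a standard Adam update. A minimal sketch of one Adam step, applied to a toy quadratic loss with the PC3 learning rate; the toy objective and iteration count are purely illustrative, not from the papers.

```python
import numpy as np

def adam_step(param, grad, state, lr=5e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Toy check: drive theta toward the minimum of (theta - 3)^2
theta = np.array([0.0])
state = (np.zeros(1), np.zeros(1), 0)
for _ in range(20000):
    grad = 2 * (theta - 3.0)
    theta, state = adam_step(theta, grad, state)
```

In the full pipeline the gradient would come from backpropagating the composite loss through the encoders and transition model rather than from an analytic toy objective.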

6. Comparative Robustness and Benchmarks

RCE models eliminate key failure modes of earlier embed-to-control (E2C) approaches, including the lack of a likelihood-based training objective and variational errors under noise. Conditioning the posterior on $x_{t+1}$ enables robust encoding even under stochastic dynamics.

PC3 achieves further efficiency and robustness by removing the high-dimensional decoder, thus avoiding overfitting and drastically reducing model parameterization. The mutual information bottleneck ensures retention only of those factors necessary for predictive control, while explicit consistency and curvature regularization prevent representational collapse and enable the use of iLQR and LQR planners.

In empirical evaluations across planar navigation, inverted pendulum swing-up, cartpole, and three-link arm visual domains, RCE outperforms E2C with lower reconstruction and planning costs and higher goal-reaching rates, especially as process noise increases. For example, at high process noise ($\sigma_n = 5$) in planar navigation, RCE achieves approximately 27% lower reconstruction loss and almost twice the success rate of E2C. PC3 further outperforms both PCC and SOLAR, achieving 58.4% time-in-goal on the swing-up pendulum task (versus 26.4% for PCC and 35.4% for SOLAR) and 96.3% on cartpole (compared to 94.4% for PCC and 91.2% for SOLAR). Model training time is also significantly reduced: PC3 is approximately 1.9× faster than PCC and 53× faster than SOLAR (Banijamali et al., 2017; Shu et al., 2020).

RCE establishes a generative-modeling pathway for learning control-ready, robustly linear subsets of the observational space, supporting the tractable application of LQR/iLQR. The use of posterior regularization, local random linearization, and predictive coding mutual information objectives represents a consistent progression toward models firmly grounded in the data likelihood, improving sample efficiency, noise robustness, and computational tractability.

A plausible implication is that further advances may generalize the RCE approach, combining efficient decoder-free predictive objectives with advanced variational structures, or extend it to broader classes of non-linear control systems with tractable local approximations. The use of contrastive predictive coding to promote controllability in latent space suggests links with recent developments in self-supervised representation learning for control under uncertainty.

Key references: "Robust Locally-Linear Controllable Embedding" (Banijamali et al., 2017); "Predictive Coding for Locally-Linear Control" (Shu et al., 2020).
