
Identifying Latent Stochastic Differential Equations (2007.06075v5)

Published 12 Jul 2020 in stat.ML and cs.LG

Abstract: We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data. Given a high-dimensional time series generated from a lower dimensional latent unknown Itô process, the proposed method learns the mapping from ambient to latent space, and the underlying SDE coefficients, through a self-supervised learning approach. Using the framework of variational autoencoders, we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of latent variable models to show that the proposed model can recover not only the underlying SDE coefficients, but also the original latent variables, up to an isometry, in the limit of infinite data. We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real world datasets.

Authors (4)
  1. Ali Hasan (19 papers)
  2. João M. Pereira (16 papers)
  3. Sina Farsiu (18 papers)
  4. Vahid Tarokh (144 papers)
Citations (17)

Summary

  • The paper presents a VAE framework that identifies latent SDE parameters governing continuous-time dynamics in observed sequential data.
  • It derives identifiability conditions proving that the true SDE parameters and decoder are unique up to an orthogonal transformation and translation under no-curl conditions.
  • The work suggests using AIC and eigenvalue analysis for accurate latent dimension estimation, offering practical insights for model training.

This paper, "Identifying Latent Stochastic Differential Equations" (Hasan et al., 2020 ), introduces a framework for discovering the underlying continuous-time dynamics of observed sequential data by modeling them as Stochastic Differential Equations (SDEs) in a learned latent space. The core idea is to combine a Variational Autoencoder (VAE) structure with an SDE model governing the evolution of latent variables over time. The framework aims to recover not only a meaningful latent representation but also the parameters of the SDE (drift μ\mu and diffusion σ\sigma) and the non-linear mapping (decoder ff) from the latent space to the high-dimensional observation space.

The model consists of a generative process and an inference process. The generative model assumes latent states $Z_t$ follow an SDE $dZ_t = \mu(Z_t, t)\,dt + \sigma(Z_t, t)\,dW_t$. Observations $X_t$ are generated from $Z_t$ via a function $f(Z_t)$ corrupted by additive noise $\epsilon$, such that $X_t = f(Z_t) + \epsilon_t$, where $\epsilon_t$ is typically modeled as Gaussian noise $p_\epsilon(X_t \mid Z_t)$. The transition probability $p_\mu(Z_{t+\Delta t} \mid Z_t)$ is approximated using the Euler-Maruyama discretization for a small time step $\Delta t$, yielding a Gaussian distribution. The overall generative model for a pair of consecutive observations $(X_t, X_{t+\Delta t})$ factors as $p_\theta(X_{t+\Delta t}, X_t, Z_{t+\Delta t}, Z_t) = p_\epsilon(X_{t+\Delta t} \mid Z_{t+\Delta t})\, p_\epsilon(X_t \mid Z_t)\, p_\mu(Z_{t+\Delta t} \mid Z_t)\, p_\gamma(Z_t)$, where $p_\gamma(Z_t)$ is a prior distribution on the initial latent state $Z_t$.
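
To make the generative side concrete, here is a minimal sketch of sampling under the Euler-Maruyama approximation. The callables `drift`, `diffusion`, and `decoder` are hypothetical stand-ins for $\mu$, $\sigma$, and $f$ (the paper's actual architectures are in its Table A.1), and a diagonal diffusion is assumed for simplicity.

```python
import torch

def sample_latent_path(z0, drift, diffusion, dt, n_steps):
    """Simulate Z forward: Z_{t+dt} ~ N(Z_t + mu(Z_t) dt, sigma(Z_t)^2 dt)."""
    z, path = z0, [z0]
    for _ in range(n_steps):
        noise = torch.randn_like(z)
        # Euler-Maruyama step with elementwise (diagonal) diffusion
        z = z + drift(z) * dt + diffusion(z) * (dt ** 0.5) * noise
        path.append(z)
    return torch.stack(path)  # (n_steps + 1, batch, latent_dim)

def observe(z, decoder, noise_std=0.1):
    """X_t = f(Z_t) + eps_t with isotropic Gaussian observation noise."""
    x_mean = decoder(z)
    return x_mean + noise_std * torch.randn_like(x_mean)
```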

The inference model uses an encoder $\tilde q_\psi(Z_t \mid X_t)$ to approximate the true posterior distribution of the latent state given the observation. This encoder is parameterized by a neural network that outputs the mean $\mu_{\tilde q}(X_t)$ and covariance matrix $\Sigma_{\tilde q}(X_t)$ (specifically, its Cholesky decomposition $L_{\tilde q}(X_t)$) of a Gaussian distribution over $Z_t$.
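
A minimal sketch of such an encoder, assuming a fully connected network (layer sizes are illustrative, not the paper's Table A.1 architecture): one head emits the mean, the other the entries of a lower-triangular factor $L$, with a softplus on the diagonal so that $LL^\top$ is a valid covariance.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Sketch of q_psi(Z_t | X_t): emits the posterior mean and a
    Cholesky factor L of the covariance (so Sigma = L L^T)."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, z_dim)
        # One output per entry of the lower-triangular factor L
        self.tril_head = nn.Linear(hidden, z_dim * (z_dim + 1) // 2)

    def forward(self, x):
        h = self.net(x)
        mean = self.mean_head(h)
        vals = self.tril_head(h)
        d = self.z_dim
        L = x.new_zeros(x.shape[0], d, d)
        off = torch.tril_indices(d, d, offset=-1)  # strictly lower entries
        n_off = off.shape[1]
        L[:, off[0], off[1]] = vals[:, :n_off]
        diag = torch.arange(d)
        # Softplus keeps the diagonal positive, so L is a valid Cholesky factor
        L[:, diag, diag] = nn.functional.softplus(vals[:, n_off:])
        return torch.distributions.MultivariateNormal(mean, scale_tril=L)
```

Calling `.rsample()` on the returned distribution then gives the reparameterized draw of $Z_t$ used in the reconstruction terms below.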

The model is trained by maximizing the evidence lower bound (ELBO), which for sequential data involves summing over time steps and approximating expectations. The paper provides a detailed breakdown of the VAE loss $\mathcal{L}(\phi, \psi)$ (Equation 1), showing it decomposes into terms related to the encoder's entropy, the KL divergence between the latent dynamics model and the approximate posterior's transitions, the KL divergence between the approximate posterior and the prior, and the reconstruction likelihood.

Practical calculation of the loss involves specific techniques:

  • Terms involving $\log \tilde q_\psi(Z_t \mid X_t)$ are computed exactly due to the Gaussian assumption.
  • The prior $p_\gamma(Z_t)$ is set to a fixed isotropic Gaussian, found to be effective empirically, allowing its term to be calculated exactly.
  • The latent dynamics term $-\log p_\mu(Z_{t+\Delta t} \mid Z_t)$ uses the Euler-Maruyama approximation. Expectations involving the drift $\mu(Z_t)$ are approximated using a first-order Taylor expansion around the mean of the approximate posterior, $\mu(Z_t) \approx \mu(\mu_{\tilde q}(X_t))$ (see the sketch after this list).
  • The decoder terms $-\log p_f(X_t \mid Z_t)$ rely on the reparametrization trick, assuming the noise $\epsilon$ is Gaussian.
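
As a concrete illustration of the transition term referenced above, here is a sketch assuming a scalar diffusion coefficient; `drift` is a hypothetical network for $\mu$, and `mu_q_t` is the encoder mean $\mu_{\tilde q}(X_t)$ at which the drift is evaluated per the Taylor approximation.

```python
import math
import torch

def em_transition_nll(mu_q_t, z_next, drift, sigma, dt):
    """Sketch of the -log p_mu(Z_{t+dt} | Z_t) term under Euler-Maruyama.
    The drift is evaluated at the encoder mean (first-order Taylor trick);
    `sigma` is a scalar diffusion coefficient for simplicity."""
    mean = mu_q_t + drift(mu_q_t) * dt  # Euler-Maruyama mean
    var = sigma ** 2 * dt               # Euler-Maruyama variance
    sq_err = (z_next - mean) ** 2 / var
    return 0.5 * (sq_err + math.log(2 * math.pi * var)).sum(-1)
```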

A significant theoretical contribution is the analysis of identifiability. Theorem 1 (Hasan et al., 2020) proves that if the SDE's diffusion coefficient $\sigma(z, t)$ satisfies certain "no-curl" conditions (related to $\sigma(y, t)^{-1}$ being the Jacobian of an inverse map or the Hessian of a convex function, which holds for many common SDEs including Brownian motion), then the true latent SDE parameters $(\mu, \sigma, \gamma)$ and the decoder map $f$ are identifiable up to an orthogonal transformation $Q$ and a translation $b$ in the latent space. This means the learned latent representation might be a rotated and shifted version of the true one, but the SDE structure and the non-linear mapping are uniquely determined up to this transformation. This identifiability result relies on techniques involving Fourier transforms and algebraic manipulations. The supplementary material explores identifiability with a learnable diffusion coefficient (Theorem 2), suggesting identifiability up to general affine transformations, but notes that the conditions for this are restrictive and might not hold for simple SDEs.
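
To spell out the remaining ambiguity in our own notation (a sketch, not the paper's statement): if $Q$ is orthogonal and $b$ a translation, then the reparameterized latent process $\tilde Z_t = Q Z_t + b$, together with the transformed coefficients and decoder

$$
\tilde f(z) = f\big(Q^\top(z - b)\big), \qquad
\tilde\mu(z) = Q\,\mu\big(Q^\top(z - b)\big), \qquad
\tilde\sigma(z) = Q\,\sigma\big(Q^\top(z - b)\big),
$$

yields exactly the same observation law, so this isometry class is, by Theorem 1, the only freedom the model cannot resolve.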

The paper also addresses the practical problem of determining the correct latent space dimension. Theorem 3 (Hasan et al., 2020) suggests using the Akaike Information Criterion (AIC) applied to the likelihood of observed data increments, arguing that the likelihood is maximized when the estimated latent dimension matches the true one. Empirically, the paper shows that analyzing the eigenvalues of the covariance matrix of the estimated latent increments can indicate the intrinsic dimensionality.
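
A minimal sketch of the eigenvalue diagnostic, assuming `z_hat` holds latents inferred at a fixed step `dt` (names are ours): directions whose increment variance is near zero suggest the chosen latent dimension overshoots the intrinsic one.

```python
import numpy as np

def increment_spectrum(z_hat, dt):
    """Eigenvalues of the covariance of estimated latent increments.
    z_hat: (n_steps, d_est) array of inferred latent states."""
    increments = np.diff(z_hat, axis=0) / np.sqrt(dt)
    cov = np.atleast_2d(np.cov(increments, rowvar=False))
    return np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
```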

Implementation details are provided in the supplementary sections. The encoder and decoder architectures are based on convolutional and fully connected layers (detailed in Table A.1), while the drift coefficient $\mu$ is typically parameterized by a multi-layer perceptron or a simple parametric form (e.g., constant or linear for a random walk or OU process). Training is performed with the Adam optimizer, and specific hyperparameters for different datasets and SDE types are listed in Tables A.2 and A.3.

To align the learned latent space with a target space, or to compare different runs, the paper describes solving the orthogonal Procrustes problem (Section D.4) to find the optimal orthogonal transformation and translation between sets of latent points (see the sketch after this paragraph). The paper also calculates a Cramér-Rao Lower Bound (CRLB) on the mean squared error for estimating the drift coefficient, focusing on a simplified scenario (constant drift or global shift) to provide a theoretical performance benchmark (Section D.5). Practical considerations regarding training stability, sensitivity to hyperparameters, potential overfitting (illustrated in Figure A.2), and failure cases are discussed, with suggested strategies including hyperparameter tuning and regularization. The paper also briefly compares the method to existing video prediction models, highlighting challenges in applying them to the type of data used in this SDE context.
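
As an illustration of that alignment step, here is a sketch using SciPy's orthogonal Procrustes solver (our helper, not the paper's code): it centers both point sets, solves for the orthogonal $Q$, and recovers the translation $b$.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align_latents(z_learned, z_target):
    """Find orthogonal Q and translation b so that z_learned @ Q + b
    best matches z_target in the least-squares (Frobenius) sense."""
    mu_l, mu_t = z_learned.mean(axis=0), z_target.mean(axis=0)
    Q, _ = orthogonal_procrustes(z_learned - mu_l, z_target - mu_t)
    b = mu_t - mu_l @ Q
    return z_learned @ Q + b, Q, b
```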

In summary, the paper presents a VAE-based framework for uncovering latent SDEs from sequential observations, provides theoretical guarantees on identifiability under certain conditions, suggests a method for latent dimension estimation, and offers practical guidance and considerations for implementation and training.