Towards Identifiability of Hierarchical Temporal Causal Representation Learning

Published 21 Oct 2025 in cs.LG and stat.ME | (2510.18310v1)

Abstract: Modeling hierarchical latent dynamics behind time series data is critical for capturing temporal dependencies across multiple levels of abstraction in real-world tasks. However, existing temporal causal representation learning methods fail to capture such dynamics because they cannot recover the joint distribution of hierarchical latent variables from single-timestep observed variables. Interestingly, we find that the joint distribution of hierarchical latent variables can be uniquely determined using three conditionally independent observations. Building on this insight, we propose a Causally Hierarchical Latent Dynamic (CHiLD) identification framework. Our approach first employs temporal contextual observed variables to identify the joint distribution of multi-layer latent variables. Subsequently, we exploit the natural sparsity of the hierarchical structure among latent variables to identify latent variables within each layer. Guided by the theoretical results, we develop a time series generative model grounded in variational inference. This model incorporates a contextual encoder to reconstruct multi-layer latent variables and normalizing flow-based hierarchical prior networks to impose the independent noise condition of hierarchical latent dynamics. Empirical evaluations on both synthetic and real-world datasets validate our theoretical claims and demonstrate the effectiveness of CHiLD in modeling hierarchical latent dynamics.


Summary

The paper introduces the CHiLD framework, which makes hierarchical temporal latent variables identifiable from a small number of temporally adjacent observations.
It leverages variational inference with temporal convolutional networks and normalizing flows to capture multi-level latent dynamics.
Empirical results show higher mean correlation coefficient (MCC) scores and controllable generation in tasks such as human motion prediction.


The paper presents a framework for modeling the hierarchical latent dynamics behind time series data. To capture temporal dependencies across multiple levels of abstraction, it proposes a new approach to temporal causal representation learning whose goal is to make hierarchical temporal causal representations identifiable from a few temporally adjacent observations rather than from a single timestep.

Introduction

Understanding hierarchical temporal structures in time series data is crucial for capturing latent processes across multiple levels of abstraction. Existing temporal causal representation learning methods fail to capture such dynamics because they try to recover the joint distribution of hierarchical latent variables from single-timestep observations, and a single timestep is insufficient for this. This research introduces the Causally Hierarchical Latent Dynamic (CHiLD) framework, which uses temporal contextual observations and the sparsity of the hierarchical structure to identify the latent variables.

Hierarchical Data Generation Process

In the hierarchical data generation process, latent variables are organized into multiple layers of abstraction; they evolve over time and influence one another through a hierarchical causal structure. The observed variables are generated from these latent variables through a nonlinear mixing function and may be corrupted by noise. The framework aims to recover the joint distribution of these latent variables despite the noise and the hierarchical complexity.
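To make the setting concrete, the following is a minimal toy simulation of a two-layer latent process. Everything in it is a hypothetical illustration, not the paper's exact specification: the function name, the transition coefficients, the noise scales, and the choice to generate x_t from the bottom layer only (so the top layer reaches the observations only through z1) are all assumptions.

```python
import math
import random


def simulate_hierarchical_series(T, seed=0, noise_std=0.1):
    """Toy two-layer latent process and its noisy observations.

    Hypothetical process (not the paper's exact specification):
      top layer      z2_t = 0.8 * z2_{t-1} + eps2_t
      bottom layer   z1_t = tanh(0.5 * z1_{t-1} + z2_t) + eps1_t
      observation    x_t  = g(z1_t) + observation noise
    """
    rng = random.Random(seed)
    z1, z2 = 0.0, 0.0
    latents, observations = [], []
    for _ in range(T):
        # Top layer: its own past plus independent noise.
        z2 = 0.8 * z2 + rng.gauss(0.0, 0.5)
        # Bottom layer: its own past plus the current top-layer state.
        z1 = math.tanh(0.5 * z1 + z2) + rng.gauss(0.0, 0.5)
        # Nonlinear mixing g; the first output is strictly increasing in z1,
        # so g is injective. Observation noise is added afterwards.
        x = (
            z1 + 0.5 * math.sin(z1) + rng.gauss(0.0, noise_std),
            math.tanh(z1) + 0.1 * z1 ** 3 + rng.gauss(0.0, noise_std),
        )
        latents.append((z1, z2))
        observations.append(x)
    return latents, observations


latents, observations = simulate_hierarchical_series(T=100)
print(len(latents), len(observations))  # 100 100
```

In this toy, a single x_t carries no direct trace of z2_t, which gives some intuition for why the paper turns to temporal context rather than single-timestep observations.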

Theoretical Foundations

The core theoretical insight is that the joint distribution of hierarchical latent variables can be uniquely determined from three conditionally independent observations, which temporal contextual observations can supply. The paper introduces a series of assumptions and theorems that establish the identifiability of multi-layer latent variables in two stages:

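Block-wise Identifiability: Using the joint distribution of adjacent observed variables, together with injectivity assumptions on certain linear operators and a sufficient variability condition, the multi-layer latent variables are identified jointly, up to an invertible transformation of the whole block.
Component-wise Identifiability: Exploiting the sparsity of the hierarchical structure, together with conditional independence assumptions and sufficient variability of the latent components, the paper then identifies the individual latent variables within each layer, up to permutation and invertible component-wise transformations.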
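For reference, the factorization and the two notions of identifiability can be written as follows. This is a sketch in our own notation, not a quotation of the paper's theorems: we assume $\mathbf z_t$ stacks the latent variables of all layers at time $t$, $\mathbf z^{l}_t$ is layer $l$, hatted symbols are learned estimates, and the three-observation factorization is our reading of "three conditionally independent observations" with $\mathbf z_t$ as the conditioning variable.

```latex
% Three contextual observations, assumed conditionally independent given z_t
p(\mathbf x_{t-1}, \mathbf x_t, \mathbf x_{t+1})
  = \int p(\mathbf x_{t-1} \mid \mathbf z_t)\, p(\mathbf x_t \mid \mathbf z_t)\,
         p(\mathbf x_{t+1} \mid \mathbf z_t)\, p(\mathbf z_t)\, d\mathbf z_t

% Block-wise identifiability: estimate and truth related by an invertible map h
\hat{\mathbf z}_t = h(\mathbf z_t)

% Component-wise identifiability within layer l:
% a permutation \pi_l and invertible scalar maps h_{l,i}
\hat z^{l}_{t,i} = h_{l,i}\big(z^{l}_{t,\pi_l(i)}\big)
```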

The analysis also relates the number of contextual observations required to the complexity of the hierarchical latent structure, giving a theoretical basis for the learning procedure.

CHiLD Framework

The CHiLD framework is implemented using a time series generative model grounded in variational inference. Key components include:

Contextual Encoder: A temporal convolutional network (TCN) processes a sequence of observations to estimate multi-layer latent variables.
Step-wise Decoder: Reconstructs the observation at each timestep from the inferred latent variables.
Hierarchical Prior Networks: Normalizing flow networks that estimate the prior distribution of latent variables, enforcing the independent noise condition.
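The prior component above can be illustrated with the simplest conditional normalizing flow, an element-wise affine flow. This is a minimal sketch under stated assumptions: the paper's prior networks are likely more expressive, and the names `toy_prior_net` and `affine_flow_prior_logprob`, as well as the linear parameterization inside `toy_prior_net`, are hypothetical. It shows the mechanism only: the inverse flow recovers noise terms from z_t given its parents, and the change-of-variables formula turns a factorized noise density into a log-prior.

```python
import math


def std_normal_logpdf(e):
    # log N(e; 0, 1)
    return -0.5 * (e * e + math.log(2.0 * math.pi))


def toy_prior_net(z_prev, z_higher):
    # Hypothetical stand-in for a learned prior network: maps the parents of
    # z_t (same-layer previous step and current higher layer) to flow params.
    mu = [0.5 * zp + 0.3 * zh for zp, zh in zip(z_prev, z_higher)]
    log_scale = [-0.1] * len(z_prev)
    return mu, log_scale


def affine_flow_prior_logprob(z, mu, log_scale):
    """log p(z_t | parents) under an element-wise conditional affine flow.

    Inverse flow: eps_i = (z_i - mu_i) * exp(-log_scale_i).
    Change of variables: log p(z) = sum_i log N(eps_i; 0, 1) - sum_i log_scale_i.
    Scoring eps under a factorized N(0, I) encourages the recovered noise
    terms to be mutually independent.
    """
    eps = [(zi - m) * math.exp(-s) for zi, m, s in zip(z, mu, log_scale)]
    log_det = -sum(log_scale)  # log |d eps / d z| for the diagonal Jacobian
    return sum(std_normal_logpdf(e) for e in eps) + log_det, eps


mu, log_scale = toy_prior_net(z_prev=[0.2, 0.1], z_higher=[1.0, -1.0])
log_p, eps = affine_flow_prior_logprob([0.4, -0.2], mu, log_scale)
```

An affine flow with a standard normal base is equivalent to a conditional Gaussian prior; richer flows (for example spline-based ones) keep the same inverse-plus-log-determinant structure while allowing non-Gaussian transitions.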

Training maximizes the Evidence Lower Bound (ELBO), which amounts to minimizing a loss that combines a mean-squared-error reconstruction term with a Kullback-Leibler (KL) divergence term that regularizes the inferred latent variables toward the hierarchical prior.
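A minimal single-sample sketch of this objective follows, assuming a diagonal Gaussian posterior q(z | x) produced by the encoder and any prior that exposes a log-density (such as a flow-based prior). The function names and the `beta` weight are hypothetical; because a flow prior generally has no closed-form KL with a Gaussian, the KL term is estimated by Monte Carlo as log q(z | x) - log p(z) at a sampled z.

```python
import math


def mse(x, x_hat):
    # Reconstruction term (a Gaussian likelihood up to constants).
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def gaussian_logpdf(z, mu, log_std):
    # log N(z; mu, diag(exp(log_std))^2)
    return sum(
        -0.5 * ((zi - m) / math.exp(s)) ** 2 - s - 0.5 * math.log(2.0 * math.pi)
        for zi, m, s in zip(z, mu, log_std)
    )


def negative_elbo(x, x_hat, z, q_mu, q_log_std, log_prior, beta=1.0):
    """Single-sample estimate of the loss minimized during training.

    x, x_hat  : observation and its reconstruction from the decoder
    z         : latent sample drawn from q(z | x) = N(q_mu, exp(q_log_std)^2)
    log_prior : callable returning log p(z), e.g. a flow-based prior
    beta      : hypothetical weight on the KL term
    """
    recon = mse(x, x_hat)
    # Monte Carlo KL estimate: log q(z | x) - log p(z) at the sampled z.
    kl = gaussian_logpdf(z, q_mu, q_log_std) - log_prior(z)
    return recon + beta * kl
```

Minimizing this quantity corresponds to maximizing the ELBO, up to the constants dropped from the Gaussian reconstruction likelihood.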

Empirical Evaluation

Experiments on synthetic and real-world datasets support the theoretical results. On synthetic data, where ground-truth latent variables are available, CHiLD recovers hierarchical latent dynamics better than existing methods, achieving higher mean correlation coefficient (MCC) scores. The method also performs well on tasks that require controllable generation and hierarchical modeling, such as predicting and generating human motion sequences.
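MCC can be computed roughly as follows. This is a minimal sketch, not the paper's evaluation code: it assumes absolute Pearson correlation and an exhaustive search over one-to-one matchings, which is practical only for a few latent dimensions (larger problems usually use the Hungarian algorithm). The function names are hypothetical.

```python
import itertools
import math


def pearson(a, b):
    # Pearson correlation between two equal-length series.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mean_corr_coef(z_true, z_est):
    """MCC between true and estimated latents, each a list of per-dimension series.

    Computes |Pearson correlation| for every (true, estimated) pair, then
    picks the one-to-one matching with the largest mean.
    """
    d = len(z_true)
    corr = [[abs(pearson(zt, ze)) for ze in z_est] for zt in z_true]
    best = max(
        sum(corr[i][perm[i]] for i in range(d))
        for perm in itertools.permutations(range(d))
    )
    return best / d


z1 = [0.0, 1.0, 2.0, 4.0, 3.0]
z2 = [1.0, -1.0, 0.5, 2.0, 0.0]
# Estimates that are permuted and affinely rescaled copies of the truth.
z_est = [[2.0 * v + 1.0 for v in z2], [-3.0 * v for v in z1]]
score = mean_corr_coef([z1, z2], z_est)
```

Here `score` is 1 up to floating-point error, since each estimate is a linear function of one true component. Pearson correlation only rewards linear relationships, so nonlinear but invertible component-wise maps score below 1; a rank-based (Spearman) variant is a common alternative when monotone distortions are expected.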

Figure 1: Interpolation visualization of different models. For each method, after training, a single latent variable is varied gradually while all other latent variables are held fixed. From left to right, the images for each method correspond to increasing values of that latent variable.
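The traversal procedure described in the caption can be sketched as follows. The helper name is hypothetical and the trained decoder that would render each latent vector into an image is not shown; the sketch only builds the sequence of latent vectors in which one coordinate increases while the others stay fixed.

```python
def latent_traversal(z, index, values):
    """Copies of latent vector z in which only z[index] varies over `values`.

    Decoding each returned vector (decoder not shown) gives one row of an
    interpolation figure, ordered left to right by increasing value.
    """
    grid = []
    for v in sorted(values):
        z_new = list(z)  # copy so the original latent vector is untouched
        z_new[index] = v
        grid.append(z_new)
    return grid


rows = latent_traversal([0.1, 0.2, 0.3], index=1, values=[1.0, -1.0, 0.0])
```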

Conclusion

The CHiLD framework advances temporal causal representation learning by establishing identifiability for hierarchical latent dynamics. Its theoretical and empirical results suggest potential in diverse real-world applications, from climate modeling to finance and human motion analysis. Future work includes extending the framework to more complex domains, relaxing its assumptions, and further strengthening the identifiability results.