CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process (2401.14535v2)

Published 25 Jan 2024 in cs.LG, cs.CV, and stat.ME

Abstract: Identifying the underlying time-delayed latent causal processes in sequential data is vital for grasping temporal dynamics and supporting downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on the strict assumption that the generation process from latent variables to observed data is invertible. This assumption is often hard to satisfy in real-world applications that involve information loss: for instance, visual perception projects 3D space onto 2D images, and the persistence of vision blends historical signals into current perceptions. To address this challenge, we establish an identifiability theory that allows the recovery of independent latent components even when they arise from a nonlinear and non-invertible mixing process. Building on this theory, we propose a principled approach, CaRiNG, to learn the CAusal RepresentatIon of Non-invertible Generative temporal data with identifiability guarantees. Specifically, we use temporal context to recover the latent information lost in generation and apply the conditions of our theory to guide the training process. Through experiments on synthetic datasets, we validate that CaRiNG reliably identifies the causal process even when the generation process is non-invertible. Moreover, we demonstrate that our approach considerably improves temporal understanding and reasoning in practical applications.
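
The mechanism the abstract describes, recovering the current latent state from a window of observations because a single frame no longer determines it, can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions: the GRU context encoder, module sizes, window length, and the plain standard-normal KL regularizer are placeholders of our choosing, not CaRiNG's actual architecture or its identifiability-guided objective.

```python
# Minimal sketch: because the frame-wise generation g(z_t) -> x_t may be
# non-invertible, the encoder infers z_t from a window x_{t-L+1:t} rather
# than from x_t alone. All shapes and loss weights are illustrative.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Infer the current latent z_t from the temporal context x_{t-L+1:t}."""
    def __init__(self, obs_dim: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * latent_dim)  # mean and log-variance

    def forward(self, x_window: torch.Tensor):
        # x_window: (batch, L, obs_dim)
        h, _ = self.rnn(x_window)
        mu, logvar = self.head(h[:, -1]).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

class Decoder(nn.Module):
    """Frame-wise generation; latent_dim may exceed obs_dim (information loss)."""
    def __init__(self, latent_dim: int, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def training_step(enc, dec, x_window, beta=1e-2):
    """One hypothetical step: reconstruct the last frame from the
    context-inferred latent, and regularize z toward an independent prior.
    A standard-normal KL stands in here; the paper instead constrains z
    with conditions from its identifiability theory (e.g., a temporal
    transition prior), which this toy term does not capture."""
    z, mu, logvar = enc(x_window)
    x_hat = dec(z)
    recon = ((x_hat - x_window[:, -1]) ** 2).mean()
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
    return recon + beta * kl
```

The design point is that a non-invertible frame-wise mixing (3D-to-2D projection, persistence of vision) leaves z_t underdetermined by x_t alone, so the encoder conditions on recent history to supply the missing constraints; the paper's theory states when this recovers the independent latent components.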
