
From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals (2405.16083v1)

Published 25 May 2024 in cs.LG

Abstract: Existing methods for multi-modal time series representation learning aim to disentangle the modality-shared and modality-specific latent variables. Although they achieve notable performance on downstream tasks, they usually assume an orthogonal latent space. However, the modality-specific and modality-shared latent variables might be dependent in real-world scenarios. Therefore, we propose a general generation process in which the modality-shared and modality-specific latent variables are dependent, and further develop a Multi-modAl TEmporal Disentanglement (MATE) model. Specifically, our MATE model is built on a temporally variational inference architecture with modality-shared and modality-specific prior networks for the disentanglement of latent variables. Furthermore, we establish identifiability results showing that the extracted representation is disentangled. More specifically, we first achieve subspace identifiability for the modality-shared and modality-specific latent variables by leveraging the pairing of multi-modal data. We then establish component-wise identifiability of the modality-specific latent variables by exploiting sufficient changes in the historical latent variables. Extensive experiments on multi-modal sensing, human activity recognition, and healthcare datasets show consistent improvements across downstream tasks, highlighting the effectiveness of our method in real-world scenarios.
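
To make the abstract's architecture concrete, below is a minimal, illustrative PyTorch sketch of a temporally variational model with modality-shared and modality-specific prior networks. It is not the authors' implementation: all names (MATESketch, prior_s, prior_m), dimensions, the Gaussian parameterization, and the single-modality simplification are assumptions. The key point it illustrates is the dependency assumption: the modality-specific prior conditions on the current shared latent, rather than treating the two latent groups as orthogonal.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def reparameterize(mu, logvar):
    # Standard reparameterization trick for a diagonal Gaussian.
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()


class MATESketch(nn.Module):
    """Toy temporally variational model with modality-shared (z_s) and
    modality-specific (z_m) latents. The priors are temporal (conditioned
    on the previous step's latents), and the z_m prior also conditions on
    the current z_s, so the latent groups are dependent, not orthogonal.
    All sizes and module choices are illustrative assumptions."""

    def __init__(self, x_dim=32, zs_dim=8, zm_dim=8, hidden=64):
        super().__init__()
        self.zs_dim, self.zm_dim = zs_dim, zm_dim
        # Per-step posterior q(z_s, z_m | x_1..x_t) via a GRU encoder.
        self.rnn = nn.GRU(x_dim, hidden, batch_first=True)
        self.post = nn.Linear(hidden, 2 * (zs_dim + zm_dim))
        # Modality-shared prior p(z_s^t | z^{t-1}).
        self.prior_s = nn.Linear(zs_dim + zm_dim, 2 * zs_dim)
        # Modality-specific prior p(z_m^t | z^{t-1}, z_s^t): conditioning
        # on z_s^t encodes the shared/specific dependency.
        self.prior_m = nn.Linear(zs_dim + zm_dim + zs_dim, 2 * zm_dim)
        # Decoder p(x_t | z_s^t, z_m^t).
        self.dec = nn.Sequential(nn.Linear(zs_dim + zm_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, x_dim))

    def forward(self, x):
        # x: (batch, time, x_dim) for a single modality.
        h, _ = self.rnn(x)
        mu, logvar = self.post(h).chunk(2, dim=-1)
        z = reparameterize(mu, logvar)
        zs, zm = z.split([self.zs_dim, self.zm_dim], dim=-1)
        recon = self.dec(z)

        # Temporal priors: predict step t's latents from step t-1's.
        z_prev = z[:, :-1]
        ps_mu, ps_logvar = self.prior_s(z_prev).chunk(2, dim=-1)
        pm_in = torch.cat([z_prev, zs[:, 1:]], dim=-1)
        pm_mu, pm_logvar = self.prior_m(pm_in).chunk(2, dim=-1)

        def kl(qm, qv, pm, pv):
            # KL divergence between diagonal Gaussians, summed over dims.
            return 0.5 * (pv - qv
                          + ((qv.exp() + (qm - pm) ** 2) / pv.exp())
                          - 1).sum(-1)

        kl_s = kl(mu[:, 1:, :self.zs_dim], logvar[:, 1:, :self.zs_dim],
                  ps_mu, ps_logvar)
        kl_m = kl(mu[:, 1:, self.zs_dim:], logvar[:, 1:, self.zs_dim:],
                  pm_mu, pm_logvar)
        return F.mse_loss(recon, x) + (kl_s + kl_m).mean()


# Usage: one modality's sensing sequence of 20 steps.
model = MATESketch()
x = torch.randn(4, 20, 32)   # (batch, time steps, features)
loss = model(x)
loss.backward()
```

The full method additionally pairs observations across modalities, which the paper leverages for subspace identifiability of the shared/specific latents; this single-modality sketch only shows the dependent-prior structure that the abstract contrasts with the usual orthogonality assumption.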

Authors (9)
  1. Ruichu Cai (68 papers)
  2. Zhifang Jiang (2 papers)
  3. Zijian Li (71 papers)
  4. Weilin Chen (16 papers)
  5. Xuexin Chen (7 papers)
  6. Zhifeng Hao (65 papers)
  7. Yifan Shen (17 papers)
  8. Guangyi Chen (45 papers)
  9. Kun Zhang (353 papers)
