
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data (2110.08021v2)

Published 15 Oct 2021 in cs.LG, cs.CL, and cs.MM

Abstract: The increasing complexity of Industry 4.0 systems brings new challenges for predictive maintenance tasks such as fault detection and diagnosis. A corresponding, realistic setting includes multi-source data streams from different modalities, such as sensor measurement time series, machine images, and textual maintenance reports. These heterogeneous multimodal streams also differ in their acquisition frequency, may embed temporally unaligned information, and can be arbitrarily long, depending on the considered system and task. Whereas multimodal fusion has been extensively studied in a static setting, to the best of our knowledge no previous work considers arbitrarily long multimodal streams together with related tasks such as prediction across time. In this paper, we therefore first formalize this new paradigm of heterogeneous multimodal learning in a streaming setting. To tackle this challenge, we propose StreaMulT, a Streaming Multimodal Transformer that relies on cross-modal attention and a memory bank to process arbitrarily long input sequences at training time and to run in a streaming fashion at inference. StreaMulT improves state-of-the-art metrics on the CMU-MOSEI dataset for the Multimodal Sentiment Analysis task, while handling much longer inputs than other multimodal models. The conducted experiments also highlight the importance of the textual embedding layer, calling into question recent improvements on Multimodal Sentiment Analysis benchmarks.
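The abstract's core mechanism, cross-modal attention combined with a bounded memory bank for streaming over arbitrarily long inputs, can be sketched as follows. This is an illustrative toy in pure Python, not the authors' implementation: the class name, the single-head dot-product attention, and the simple "keep the last M keys/values" memory policy are all assumptions made for clarity (the paper's memory bank and multi-head cross-modal blocks are more involved).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_modal_attention(queries, keys, values):
    """Single-head scaled dot-product attention where queries come from
    one modality (e.g. text tokens) and keys/values from another
    (e.g. sensor frames)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Weighted sum of value vectors, dimension by dimension.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

class StreamingCrossModalAttention:
    """Processes an arbitrarily long key/value stream chunk by chunk,
    attending over the current chunk plus a memory bank that keeps
    only the last `memory_size` key/value pairs."""

    def __init__(self, memory_size):
        self.memory_size = memory_size
        self.mem_k, self.mem_v = [], []

    def step(self, queries, chunk_k, chunk_v):
        # Attend over memory bank + current chunk.
        keys = self.mem_k + chunk_k
        values = self.mem_v + chunk_v
        out = cross_modal_attention(queries, keys, values)
        # Slide the bounded memory bank forward over the stream.
        self.mem_k = keys[-self.memory_size:]
        self.mem_v = values[-self.memory_size:]
        return out
```

Because each `step` only ever touches the current chunk plus a fixed-size memory, the cost per step is bounded regardless of how long the overall stream is, which is the property that lets such a model run in a streaming way at inference.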

Authors (4)
  1. Victor Pellegrain (2 papers)
  2. Myriam Tami (18 papers)
  3. Michel Batteux (1 paper)
  4. Céline Hudelot (50 papers)
Citations (2)