Relational Temporal Graph Reasoning for Dual-task Dialogue Language Understanding (2306.09114v1)
Abstract: Dual-task dialog language understanding aims to tackle two correlative dialog language understanding tasks simultaneously by leveraging their inherent correlations. In this paper, we put forward a new framework whose core is relational temporal graph reasoning. We propose a speaker-aware temporal graph (SATG) and a dual-task relational temporal graph (DRTG) to facilitate relational temporal modeling in dialog understanding and dual-task reasoning. Besides, different from previous works that only achieve implicit semantics-level interactions, we propose to model explicit dependencies by integrating prediction-level interactions. To implement our framework, we first propose a novel model, the Dual-tAsk temporal Relational rEcurrent Reasoning network (DARER), which generates context-, speaker- and temporal-sensitive utterance representations through relational temporal modeling of the SATG, and then conducts recurrent dual-task relational temporal graph reasoning on the DRTG, a process in which the estimated label distributions act as key clues for prediction-level interactions. The relational temporal modeling in DARER is achieved by relational graph convolutional networks (RGCNs). We further propose the Relational Temporal Transformer (ReTeFormer), which achieves fine-grained relational temporal modeling via Relation- and Structure-aware Disentangled Multi-head Attention. Accordingly, we propose DARER with ReTeFormer (DARER2), which adopts two variants of ReTeFormer to achieve the relational temporal modeling of the SATG and DRTG, respectively. Extensive experiments in different scenarios verify that our models outperform state-of-the-art models by a large margin. Remarkably, on the dialog sentiment classification task of the Mastodon dataset, DARER and DARER2 achieve relative improvements of about 28% and 34% in F1 over the previous best model.
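To make the ideas above concrete, below is a minimal, hypothetical sketch (in PyTorch) of RGCN-style relational message passing over a relational temporal graph, combined with a recurrent loop in which the estimated sentiment and act label distributions are fed back as prediction-level clues. The relation layout, dimensions, number of reasoning steps, and module names are illustrative assumptions, not the authors' implementation; DARER2 would replace the graph layer with ReTeFormer's disentangled attention.

```python
# Illustrative sketch only (not the authors' code): RGCN-style relational message
# passing plus recurrent dual-task reasoning with prediction-level interactions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationalGraphLayer(nn.Module):
    """One RGCN-style layer: a separate linear transform per relation type, summed per node."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_relations)])
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adjs: torch.Tensor) -> torch.Tensor:
        # h:    (num_utterances, dim) utterance representations
        # adjs: (num_relations, num_utterances, num_utterances) row-normalized
        #       adjacencies, one per assumed relation (e.g. same-speaker-past,
        #       cross-speaker-future, sentiment-to-act edges, ...)
        out = self.self_loop(h)
        for r, w in enumerate(self.rel_weights):
            out = out + adjs[r] @ w(h)
        return F.relu(out)


class DualTaskRecurrentReasoning(nn.Module):
    """Each step re-estimates sentiment/act label distributions and concatenates
    them back into the node features (prediction-level interaction)."""

    def __init__(self, dim: int, num_relations: int, n_sent: int, n_act: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.graph = RelationalGraphLayer(dim + n_sent + n_act, num_relations)
        self.proj = nn.Linear(dim + n_sent + n_act, dim)
        self.sent_head = nn.Linear(dim, n_sent)
        self.act_head = nn.Linear(dim, n_act)

    def forward(self, h: torch.Tensor, adjs: torch.Tensor):
        n = h.size(0)
        p_sent = h.new_zeros(n, self.sent_head.out_features)
        p_act = h.new_zeros(n, self.act_head.out_features)
        for _ in range(self.steps):
            # label distributions from the previous step act as key clues
            x = torch.cat([h, p_sent, p_act], dim=-1)
            h = self.proj(self.graph(x, adjs))
            p_sent = F.softmax(self.sent_head(h), dim=-1)
            p_act = F.softmax(self.act_head(h), dim=-1)
        return p_sent, p_act


if __name__ == "__main__":
    dim, num_rel, n_sent, n_act, n_utt = 64, 4, 3, 5, 7
    model = DualTaskRecurrentReasoning(dim, num_rel, n_sent, n_act)
    h = torch.randn(n_utt, dim)                                   # toy utterance encodings
    adjs = torch.softmax(torch.randn(num_rel, n_utt, n_utt), -1)  # toy relational adjacencies
    p_sent, p_act = model(h, adjs)
    print(p_sent.shape, p_act.shape)  # torch.Size([7, 3]) torch.Size([7, 5])
```

The per-relation transforms capture the essence of relational graph convolution, while the recurrent concatenation of label distributions is one plausible way to realize the prediction-level interactions described in the abstract.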