The Short Text Matching Model Enhanced with Knowledge via Contrastive Learning (2304.03898v3)
Abstract: In recent years, short text matching has been widely applied in the fields of advertising search and recommendation. The difficulty lies in the lack of semantic information and the word ambiguity caused by the short length of the text. Previous works have introduced complement sentences or knowledge bases to provide additional feature information. However, these methods do not fully model the interaction between the original sentence and the complement sentence, and they do not consider the noise that external knowledge bases may introduce. This paper therefore proposes a short text matching model that combines contrastive learning and external knowledge. The model uses a generative model to produce a complement sentence for each original sentence and uses contrastive learning to guide the model toward a more semantically meaningful encoding of the original sentence. In addition, to avoid noise, we use keywords as the main semantics of the original sentence to retrieve the corresponding knowledge words from the knowledge base and construct a knowledge graph. A graph encoding model then integrates the knowledge base information into the model. The model achieves state-of-the-art performance on two publicly available Chinese text matching datasets, demonstrating its effectiveness.
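To make the contrastive step more concrete, the following is a minimal sketch of an InfoNCE-style loss that pulls each original-sentence embedding toward the embedding of its own complement sentence and pushes it away from the other complements in the batch. The abstract does not state the exact loss, so the choice of InfoNCE, the function names `cosine` and `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    originals: Sequence[Sequence[float]],
    complements: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Average InfoNCE loss over a batch.

    originals[i] and complements[i] are treated as a positive pair;
    every other complement in the batch serves as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, c) / temperature for c in complements]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true complement.
        total += log_denominator - logits[i]
    return total / len(originals)
```

With correctly paired embeddings the loss is close to zero, and it grows when the complements are paired with the wrong originals, which is the signal that would steer an encoder toward representations consistent with the generated complement sentences.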
Authors:
- Ruiqiang Liu
- Qiqiang Zhong
- Mengmeng Cui
- Hanjie Mai
- Qiang Zhang
- Shaohua Xu
- Xiangzheng Liu
- Yanlong Du