Uncovering Customer Issues through Topological Natural Language Analysis (2403.00804v1)
Abstract: E-commerce companies deal with a high volume of customer service requests daily. While a simple annotation system is often used to summarize the topics of customer contacts, thoroughly exploring each specific issue can be challenging. This presents a critical concern, especially during an emerging outbreak where companies must quickly identify and address specific issues. To tackle this challenge, we propose a novel machine learning algorithm that leverages natural language techniques and topological data analysis to monitor emerging and trending customer issues. Our approach involves an end-to-end deep learning framework that simultaneously tags the primary question sentence of each customer's transcript and generates sentence embedding vectors. We then whiten the embedding vectors and use them to construct an undirected graph. From there, we define trending and emerging issues based on the topological properties of each transcript. We have validated our results through various methods and found that they are highly consistent with news sources.
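The abstract describes a pipeline: generate sentence embeddings for each transcript, whiten them, link transcripts in an undirected graph, and read trending or emerging issues off the graph's topology. The sketch below illustrates the whitening and graph-construction steps only, under stated assumptions: the embeddings are taken as given (random stand-ins here), the similarity threshold `SIM_THRESHOLD` is an illustrative choice, and connected components are used as a simple proxy for the topological grouping; none of these are the paper's exact criteria for trending or emerging issues.

```python
# Minimal sketch: whiten sentence embeddings, then connect transcripts whose
# whitened vectors are highly similar. Threshold and component-based grouping
# are assumptions for illustration, not the paper's exact procedure.
import numpy as np
import networkx as nx

SIM_THRESHOLD = 0.8  # assumed cutoff for linking two transcripts


def whiten(embeddings: np.ndarray) -> np.ndarray:
    """Map embeddings to zero mean and (approximately) identity covariance."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov(embeddings, rowvar=False)
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s))       # whitening transform W = U * Lambda^{-1/2}
    return (embeddings - mu) @ w


def build_issue_graph(embeddings: np.ndarray) -> nx.Graph:
    """Build an undirected graph linking transcripts with similar whitened embeddings."""
    x = whiten(embeddings)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # unit-normalize rows
    sims = x @ x.T                                     # pairwise cosine similarities
    graph = nx.Graph()
    graph.add_nodes_from(range(len(x)))
    rows, cols = np.where(np.triu(sims, k=1) > SIM_THRESHOLD)
    graph.add_edges_from(zip(rows.tolist(), cols.tolist()))
    return graph


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(200, 64))       # stand-in for sentence vectors
    g = build_issue_graph(fake_embeddings)
    # Connected components stand in for candidate issue groups; tracking their
    # size over time is one simple way to flag trending or emerging issues.
    clusters = sorted(nx.connected_components(g), key=len, reverse=True)
    print(f"{g.number_of_nodes()} transcripts, {len(clusters)} candidate issue groups")
```

In practice the embeddings would come from the paper's end-to-end tagging-and-embedding model rather than random vectors, and the grouping signal would come from the graph's topological properties rather than a fixed similarity cutoff.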