Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization
Abstract: This paper explores the rapid development of a telephone call summarization system using LLMs. We begin with experiments on prompting existing LLMs to generate summaries of telephone conversations, then create a tailored synthetic training dataset using stronger frontier models. We place special focus on the diversity of the generated data and on the ability to control the length of the generated summaries to meet various use-case-specific requirements. We evaluate the method with two state-of-the-art LLM-as-a-judge evaluation techniques to assess the quality and relevance of the summaries. Our results show that a fine-tuned Llama-2-7B-based summarization model performs on par with GPT-4 in terms of factual accuracy, completeness, and conciseness. Our findings demonstrate the potential for quickly bootstrapping a practical and efficient call summarization system.
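To make the idea of length control concrete, the following is a minimal sketch of a prompt that states a word budget, plus a check of whether a returned summary respects it. It is not the authors' prompt or pipeline: the function names `build_summary_prompt` and `within_length`, the prompt wording, and the whitespace-based word count are all assumptions made for illustration.

```python
def build_summary_prompt(transcript: str, max_words: int) -> str:
    """Compose an instruction asking for a summary of at most `max_words` words.

    Hypothetical prompt wording; the abstract does not give the actual prompt.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    return (
        f"Summarize the following telephone call in at most {max_words} words.\n\n"
        f"Transcript:\n{transcript.strip()}\n\n"
        "Summary:"
    )


def within_length(summary: str, max_words: int) -> bool:
    """Return True if the summary, split on whitespace, fits the word budget.

    Whitespace splitting is an assumed, simplistic notion of a "word".
    """
    return len(summary.split()) <= max_words
```

In such a setup, the prompt string would be sent to whichever model is being tested, and `within_length` could be used to measure how often generated summaries stay within the requested budget.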