Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization (2403.00067v4)

Published 29 Feb 2024 in cs.CL

Abstract: This work focuses on query-based meeting summarization, in which a summary of a context (meeting transcript) is generated in response to a specific query. When LLMs are used for this task, a new call to the LLM inference endpoint/API is usually triggered for each new query, even when the context stays the same. Such repeated calls significantly increase the cost of using LLMs in production, making them impractical for many real-world use cases. To address this problem, we investigate whether combining the queries for the same input context into a single prompt, thereby minimizing repeated calls, can be used successfully for meeting summarization. We conduct extensive experiments comparing the performance of various popular LLMs: GPT-4, Gemini, Claude-3, LLaMA-2, Mistral, Phi-3, and Qwen-2, in single-query and multi-query settings. We observe that 100% reliability in generating the response in the expected format is usually limited to certain closed-source LLMs, with most open-source LLMs lagging behind (except a few 7B-parameter LLMs such as Mistral and Phi-3). We conclude that multi-query prompting could be useful for significantly reducing the inference costs of meeting summarization.
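The core idea can be illustrated with a minimal sketch. The `call_llm` callable below is a placeholder for any LLM inference endpoint, and the prompt wording and numbered-answer output format are assumptions for illustration, not the paper's exact template:

```python
import re
from typing import Callable, List


def multi_query_prompt(transcript: str, queries: List[str]) -> str:
    """Bundle all queries for one meeting transcript into a single prompt."""
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(queries))
    return (
        "Meeting transcript:\n"
        f"{transcript}\n\n"
        "Answer each of the following queries using only the transcript above. "
        "Return one numbered answer per query, in the same order:\n"
        f"{numbered}"
    )


def summarize_multi_query(
    transcript: str,
    queries: List[str],
    call_llm: Callable[[str], str],  # placeholder for any LLM inference API
) -> List[str]:
    """One inference call for N queries instead of N calls with the same context."""
    response = call_llm(multi_query_prompt(transcript, queries))
    # Split the response on the numbered-answer markers; whether a model
    # reliably follows this format is exactly what the paper evaluates.
    parts = re.split(r"^\s*\d+\.\s*", response, flags=re.MULTILINE)
    answers = [p.strip() for p in parts if p.strip()]
    if len(answers) != len(queries):
        raise ValueError("LLM did not return one answer per query")
    return answers
```

The saving comes from sending the (often very long) transcript once per batch of queries instead of once per query, at the price of relying on the model to return every answer in a parseable format.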

Authors (5)
  1. Md Tahmid Rahman Laskar (30 papers)
  2. Elena Khasanova (4 papers)
  3. Xue-Yong Fu (11 papers)
  4. Cheng Chen (262 papers)
  5. Shashi Bhushan TN (9 papers)