
Towards Optimizing the Costs of LLM Usage (2402.01742v1)

Published 29 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Generative AI and LLMs in particular are heavily used nowadays for various document processing tasks such as question answering and summarization. However, different LLMs come with different capabilities for different tasks as well as with different costs, tokenization, and latency. In fact, enterprises are already incurring huge costs of operating or using LLMs for their respective use cases. In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs), and then solving an optimization routine for the LLM selection to either keep costs under a budget, or minimize the costs, in a quality and latency aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study optimization problems trading off the quality and costs, both theoretically and empirically. We further propose a sentence simplification model for reducing the number of tokens in a controlled manner. Additionally, we propose several deterministic heuristics for reducing tokens in a quality aware manner, and study the related optimization problem of applying the heuristics optimizing the quality and cost trade-off. We perform extensive empirical validation of our methods on not only enterprise datasets but also on open-source datasets, annotated by us, and show that we perform much better compared to closest baselines. Our methods reduce costs by 40%-90% while improving quality by 4%-7%. We will release the annotated open source datasets to the community for further research and exploration.
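The core selection problem the abstract describes — assign an LLM to each document-processing task so that total predicted quality is maximized while total cost stays under a budget — can be illustrated with a minimal sketch. The paper's actual method predicts per-(task, model) quality with a learned model and solves the selection via LP rounding; this toy version instead uses a simple greedy quality-gain-per-cost heuristic, and all model names, costs, and quality scores below are made up for illustration.

```python
# Illustrative sketch of budget-constrained LLM selection.
# NOT the paper's LP-rounding algorithm: a greedy upgrade heuristic
# over hypothetical (cost, predicted-quality) pairs.

def select_llms(tasks, models, budget):
    """Assign one model per task, maximizing total predicted quality
    subject to a total cost budget.

    tasks:  list of task ids
    models: dict model_name -> (cost_per_task, predicted_quality)
    """
    # Start every task on the cheapest model, then repeatedly apply
    # the single upgrade with the best quality gain per extra dollar.
    cheapest = min(models, key=lambda m: models[m][0])
    assign = {t: cheapest for t in tasks}
    spent = models[cheapest][0] * len(tasks)

    improved = True
    while improved:
        improved = False
        best = None  # (gain_per_cost, task, model, extra_cost)
        for t in tasks:
            cur_cost, cur_q = models[assign[t]]
            for m, (cost, q) in models.items():
                extra, gain = cost - cur_cost, q - cur_q
                if gain > 0 and extra > 0 and spent + extra <= budget:
                    ratio = gain / extra
                    if best is None or ratio > best[0]:
                        best = (ratio, t, m, extra)
        if best is not None:
            _, t, m, extra = best
            assign[t] = m
            spent += extra
            improved = True
    return assign, spent

# Hypothetical model catalogue: (cost per task, predicted quality).
models = {
    "small":  (1.0, 0.60),
    "medium": (3.0, 0.80),
    "large":  (10.0, 0.95),
}
assignment, cost = select_llms(["doc1", "doc2", "doc3"], models, budget=10.0)
```

With a budget of 10.0, the greedy pass upgrades all three tasks from "small" to "medium" (cost 9.0) and stops, since any further upgrade to "large" would exceed the budget. The paper's LP formulation handles the same trade-off globally rather than one upgrade at a time.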

Authors (6)
  1. Shivanshu Shekhar (4 papers)
  2. Tanishq Dubey (3 papers)
  3. Koyel Mukherjee (15 papers)
  4. Apoorv Saxena (14 papers)
  5. Atharv Tyagi (3 papers)
  6. Nishanth Kotla (1 paper)
Citations (10)