
Abstract

LLMs have demonstrated impressive capabilities in solving a wide range of tasks without being explicitly fine-tuned on task-specific datasets. However, deploying LLMs in the real world is not trivial, as it requires substantial computing resources. In this paper, we investigate whether smaller, compact LLMs are a good alternative to comparatively larger LLMs, to address the significant costs of utilizing LLMs in the real world. To this end, we study the meeting summarization task in a real-world industrial environment and conduct extensive experiments comparing the performance of fine-tuned compact LLMs (e.g., FLAN-T5, TinyLLaMA, LiteLLaMA) with zero-shot larger LLMs (e.g., LLaMA-2, GPT-3.5, PaLM-2). We observe that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs on meeting summarization datasets. A notable exception, however, is FLAN-T5 (780M parameters), which performs on par with, or even better than, many zero-shot larger LLMs (from 7B to more than 70B parameters) while being significantly smaller. This makes compact LLMs like FLAN-T5 a suitable, cost-efficient choice for real-world industrial deployment.

Figure: Comparison of average ROUGE scores for fine-tuned and zero-shot large language models, by instruction type.

Overview

  • The paper evaluates the efficiency of fine-tuned small LLMs compared to larger zero-shot LLMs in generating meeting summaries, with a focus on operational costs.

  • FLAN-T5, unlike most of the other compact models tested, demonstrates performance comparable to or better than much larger models, suggesting a cost-effective alternative for industrial applications.

  • The research uses real-world business meeting transcripts and a variant of the QMSUM dataset to ensure robust performance analysis of the LLMs.

  • For deploying LLMs in real-world settings, the paper weighs computational requirements and privacy considerations, both of which favor smaller, fine-tuned models.

  • The conclusion suggests that fine-tuned small LLMs can rival larger ones in summarization tasks and points to a shift towards economically viable AI deployment.

Introduction

LLMs have made a significant mark on the AI industry with their ability to handle diverse tasks without task-specific training. While their capabilities are beyond question, deploying them in real-world applications, particularly for generating meeting summaries, often entails substantial operational costs driven by the high computational resources LLMs require. This paper explores whether more compact versions of these LLMs can provide a cost-effective yet capable alternative to their larger counterparts for meeting summarization tasks.

Comparative Performance Analysis

The paper presents a meticulous evaluation of smaller, fine-tuned LLMs (including FLAN-T5, TinyLLaMA, and LiteLLaMA) against larger zero-shot LLMs (LLaMA-2, GPT-3.5, PaLM-2). A notable finding is that FLAN-T5, with 780M parameters, performs comparably to, or even better than, much larger LLMs operating in a zero-shot capacity. This suggests that leaner models like FLAN-T5 can provide an effective, low-cost option for industrial applications, addressing the prohibitive operational costs associated with deploying large models.
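
To make the comparison above concrete, here is a minimal sketch of the kind of ROUGE-based scoring used to rank summarization models, built on the open-source rouge-score package. The reference and candidate summaries are invented placeholders, and the paper's exact evaluation pipeline may differ.

```python
# Minimal sketch: scoring two candidate summaries against one reference
# with ROUGE-1/2/L. Requires: pip install rouge-score
from rouge_score import rouge_scorer

reference = "The team agreed to ship the beta next week and assigned QA to Dana."
candidates = {
    "fine-tuned compact model": "The team will ship the beta next week; Dana owns QA.",
    "zero-shot larger model": "Next week's beta release was approved, with Dana handling QA.",
}

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

for name, summary in candidates.items():
    scores = scorer.score(reference, summary)  # reference first, candidate second
    avg_f1 = sum(s.fmeasure for s in scores.values()) / len(scores)
    print(f"{name}: average ROUGE F1 = {avg_f1:.3f}")
```

Averaging the F1 of ROUGE-1, ROUGE-2, and ROUGE-L, as in the figure above, yields a single score per model that makes the fine-tuned versus zero-shot comparison straightforward.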

Methodological Rigor

To ensure a comprehensive and uncontaminated performance analysis, the research employs two datasets: one consisting of automatic speech recognition (ASR) transcripts from real-world business meetings, and a variant of the QMSUM dataset whose reference summaries were re-generated to align with real-world applications. The paper stresses the importance of instruction-following capability in LLMs, which lets a model adapt to varying user demands for summary types; larger models exhibit this capability out of the box, while smaller ones acquire it only through fine-tuning.
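
The instruction-following setup can be illustrated with a short inference sketch using the 780M-parameter google/flan-t5-large checkpoint via Hugging Face Transformers, pairing one transcript with different summary instructions. The transcript and instruction strings here are illustrative assumptions, not the paper's actual prompts.

```python
# Minimal sketch: prompting FLAN-T5 with different summary instructions.
# Requires: pip install transformers torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-large"  # ~780M parameters, the size studied in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

transcript = "Alice: Let's move the launch to May. Bob: Agreed, QA needs two more weeks."
instructions = [
    "Summarize the meeting in one paragraph:",  # hypothetical instruction
    "List the action items from the meeting:",  # hypothetical instruction
]

for instruction in instructions:
    inputs = tokenizer(f"{instruction}\n{transcript}", return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```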

Real-world Application Insights

Beyond model performance, the paper offers insights on integrating LLMs into operational settings, considering aspects like computational resource requirements and API costs. The findings show that API usage costs and requisite computing power fluctuate dramatically depending on the chosen LLM, and smaller models like FLAN-T5 emerge as the most resource-efficient options without sacrificing performance quality. The paper also notes that, in this industrial context, privacy concerns are mitigated by the system's design, which keeps user data segmented and does not require re-training on new data.
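
A back-of-the-envelope cost model makes this trade-off tangible. The sketch below estimates monthly API spend from meeting volume and per-token pricing; every number is a hypothetical placeholder, since real prices vary by provider and change over time, and self-hosting a compact model like FLAN-T5 replaces this line item with a fixed compute cost.

```python
# Minimal sketch: estimating monthly summarization API spend from token counts.
# All prices and volumes below are hypothetical placeholders, not real quotes.

def monthly_api_cost(meetings_per_month: int,
                     tokens_per_meeting: int,
                     summary_tokens: int,
                     usd_per_1k_input: float,
                     usd_per_1k_output: float) -> float:
    """Return estimated monthly spend in USD for hosted-API summarization."""
    input_cost = meetings_per_month * tokens_per_meeting / 1000 * usd_per_1k_input
    output_cost = meetings_per_month * summary_tokens / 1000 * usd_per_1k_output
    return input_cost + output_cost

# Example: 1,000 meetings/month, ~8k-token transcripts, ~300-token summaries.
print(f"${monthly_api_cost(1000, 8000, 300, 0.001, 0.002):,.2f}")
```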

Conclusion

The research concludes by affirming that size is not the sole determinant of capability in the realm of LLMs. Suitably fine-tuned compact LLMs can perform on par with, or even outdo, their more imposing counterparts in certain contexts, providing a more economically viable option for summarizing business meetings. These findings point to a potential paradigm shift toward operational efficiency without compromising quality in deployed AI solutions. The authors highlight future work on harnessing smaller LLMs' summarization capabilities across larger datasets and more varied instructions, setting the stage for more sustainable AI practices in a cost-conscious business world.
