Emergent Mind


Large Language Models (LLMs) have demonstrated impressive capabilities in solving a wide range of tasks without being explicitly fine-tuned on task-specific datasets. However, deploying LLMs in the real world is not trivial, as it requires substantial computing resources. In this paper, we investigate whether smaller, compact LLMs are a good alternative to comparatively larger LLMs, to address the significant costs associated with utilizing LLMs in the real world. In this regard, we study the meeting summarization task in a real-world industrial environment and conduct extensive experiments comparing the performance of fine-tuned compact LLMs (e.g., FLAN-T5, TinyLLaMA, LiteLLaMA) with zero-shot larger LLMs (e.g., LLaMA-2, GPT-3.5, PaLM-2). We observe that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs on meeting summarization datasets. A notable exception, however, is FLAN-T5 (780M parameters), which performs on par with or even better than many zero-shot larger LLMs (from 7B to above 70B parameters) while being significantly smaller. This makes compact LLMs like FLAN-T5 a suitable cost-efficient solution for real-world industrial deployment.


  • The paper evaluates the efficiency of fine-tuned small LLMs compared to larger zero-shot LLMs in generating meeting summaries, with a focus on operational costs.

  • FLAN-T5 and other compact models demonstrate performance comparable to or better than larger models, suggesting cost-effective alternatives for industrial applications.

  • The research uses real-world business meeting transcripts and a variant of the QMSum dataset to ensure robust performance analysis of the LLMs.

  • For incorporating LLMs into real-world settings, the paper weighs computational requirements and privacy aspects, favoring smaller, fine-tuned models.

  • The conclusion suggests that fine-tuned small LLMs can rival larger ones in summarization tasks and points to a shift towards economically viable AI deployment.


LLMs have left a significant mark on the AI industry with their ability to handle diverse tasks without task-specific training. While their capabilities are beyond question, deploying them in real-world applications, particularly for generating meeting summaries, often entails substantial operational costs due to the high computational resources they require. This paper examines whether more compact versions of these LLMs can provide a cost-effective yet efficient alternative to their larger counterparts for meeting summarization tasks.

Comparative Performance Analysis

The paper presents a meticulous evaluation of smaller, fine-tuned LLMs (including FLAN-T5, TinyLLaMA, and LiteLLaMA) against larger zero-shot LLMs (LLaMA-2, GPT-3.5, PaLM-2). A notable finding is that FLAN-T5, with 780M parameters, demonstrates comparable or even superior performance relative to much larger LLMs operating in a zero-shot capacity. This suggests that leaner models like FLAN-T5 can provide an effective, cost-efficient solution for industrial applications, addressing the prohibitive operational costs associated with deploying large models.
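The summary above does not name the evaluation metrics, but summarization quality is conventionally scored with ROUGE-style n-gram overlap against reference summaries. A minimal ROUGE-1 F1 sketch in plain Python, for illustration only and not the paper's actual evaluation pipeline:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy comparison: a faithful summary scores higher than an off-topic one
reference = "the team agreed to ship the release next friday"
print(rouge1_f1("team agreed to ship next friday", reference))
print(rouge1_f1("the meeting covered many topics", reference))
```

Production evaluations would use a maintained implementation (e.g., a ROUGE scoring library) and report ROUGE-1/2/L, but the overlap intuition is the same.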

Methodological Rigor

To ensure a comprehensive and unbiased performance analysis, the research employs two datasets. One consists of Automatic Speech Recognition (ASR) transcripts from real-world business meetings; the other is a variant of the QMSum dataset whose reference summaries were re-generated to align with real-world applications. The paper stresses the importance of instruction-following capabilities in LLMs, a critical feature that allows adaptation to varying user demands for summary types, a capability inherent to larger models but absent from smaller ones prior to fine-tuning.
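The instruction-following setup described above amounts to pairing each transcript with a user-specified instruction when building fine-tuning examples. A hypothetical sketch of how such inputs might be assembled, with the template and field names being illustrative assumptions rather than the paper's actual format:

```python
def build_example(transcript: str, instruction: str) -> str:
    """Assemble one instruction-tuning input from a meeting transcript
    and a user-specified summary instruction (e.g., length or focus).
    The template below is a hypothetical illustration."""
    return (
        f"Instruction: {instruction}\n"
        f"Transcript: {transcript}\n"
        "Summary:"
    )

# The same transcript can serve different user demands via the instruction
transcript = "Alice: the budget is up 5 percent. Bob: noted, I'll revise the plan."
print(build_example(transcript, "Summarize the meeting in one sentence."))
print(build_example(transcript, "List all action items from the meeting."))
```

Varying the instruction while keeping the transcript fixed is what lets a fine-tuned compact model adapt to different summary types, the capability the paper notes larger models have out of the box.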

Real-world Application Insights

Beyond model performance, the paper offers insights on integrating LLMs into operational settings, considering aspects like computational resource requirements and API costs. The findings show that API usage costs and requisite computing power can fluctuate dramatically depending on the chosen LLM, with smaller models like FLAN-T5 emerging as the more resource-efficient options without sacrificing performance quality. The paper also highlights that privacy concerns are mitigated in the industrial context by the system's design, which segments user data and does not require re-training on new data.
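As a rough illustration of the cost fluctuation the paper points to, one can compare per-token API pricing against amortized self-hosted inference for a compact model. All prices and workload figures below are hypothetical placeholders, not numbers from the paper:

```python
def monthly_api_cost(meetings_per_month: int,
                     tokens_per_meeting: int,
                     usd_per_1k_tokens: float) -> float:
    """Estimated monthly spend when every transcript is sent to a paid API."""
    return meetings_per_month * tokens_per_meeting / 1000 * usd_per_1k_tokens

def monthly_hosted_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Estimated monthly spend for self-hosting a compact fine-tuned model."""
    return gpu_hours * usd_per_gpu_hour

# Hypothetical workload: 1,000 meetings/month at ~8k tokens each
api = monthly_api_cost(1000, 8000, 0.03)    # $0.03 per 1k tokens (placeholder)
hosted = monthly_hosted_cost(100, 1.50)     # 100 GPU-hours at $1.50/h (placeholder)
print(f"API: ${api:.2f}  self-hosted: ${hosted:.2f}")
```

Which option wins depends entirely on the plugged-in rates and volume, which is precisely why the paper treats deployment cost as a first-class criterion alongside summary quality.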


The research concludes by affirming that size is not the sole determinant of capability in the realm of LLMs. Suitably fine-tuned compact LLMs can indeed perform on par with, or even outdo, their more imposing counterparts in certain contexts, providing a more economically viable option for summarizing business meetings. These findings point to a potential paradigm shift towards operational efficiency without compromising quality when deploying AI solutions. Future work is proposed to further harness smaller LLMs' summarization capabilities on larger datasets and more varied instructions. The paper sets the stage for more sustainable AI practices in a cost-conscious business world.
