Smoothie: Label Free Language Model Routing (2412.04692v1)

Published 6 Dec 2024 in cs.AI and cs.LG

Abstract: LLMs are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs may be good for different input samples. Prior approaches have thus explored how engineers might select an LLM to use for each sample (i.e. routing). While existing routing methods mostly require training auxiliary models on human-annotated data, our work explores whether it is possible to perform unsupervised routing. We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. Given a set of outputs from different LLMs, Smoothie constructs a latent variable graphical model over embedding representations of observable LLM outputs and unknown "true" outputs. Using this graphical model, we estimate sample-dependent quality scores for each LLM, and route each sample to the LLM with the highest corresponding score. We find that Smoothie's LLM quality-scores correlate with ground-truth model quality (correctly identifying the optimal model on 9/14 tasks), and that Smoothie outperforms baselines for routing by up to 10 points accuracy.

Summary

  • The paper presents a novel label-free routing technique for LLMs by leveraging latent variable graphical models to estimate generation quality.
  • The paper employs both global and local routing variants, with the local method using nearest neighbors to tailor quality scores and boost accuracy by up to 10 points.
  • The paper validates SMOOTHIE on multiple benchmarks, achieving a high correlation (ρ = 0.72) with true model performance and outperforming both unsupervised and supervised approaches.

SMOOTHIE: Unlabeled LLM Routing

The paper introduces SMOOTHIE, a novel approach that routes samples to LLMs without relying on labeled data. The authors investigate whether unsupervised routing—assigning each input to one of several pre-trained models—is feasible using mechanisms inspired by weak supervision.

The inherent challenge in leveraging LLMs lies in their varying capabilities across tasks. Applications employing LLMs may receive diverse inputs, so selecting an appropriate model for each input is crucial for good performance. Existing approaches predominantly rely on labeled datasets to train auxiliary routing models, a constraint imposed by the need for extensive annotation.

The paper addresses this gap with the SMOOTHIE method—a label-free router that requires no input annotations. It constructs a latent variable graphical model over LLM-generated outputs to estimate generation quality: by modeling output embeddings as multivariate Gaussians centered on a latent "true" output, it derives a closed-form estimator that scores each LLM by how close its generated embeddings lie to that estimated true output.
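
Under the diagonal-Gaussian noise model described above, per-LLM noise variances can be recovered in closed form from pairwise distances between the LLMs' output embeddings, using a triplet-style identity common in weak supervision. The sketch below illustrates that idea; the function name, shapes, and the averaging over triplets are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def triplet_variance_estimates(embeddings):
    """Estimate per-LLM noise variances from pairwise embedding distances.

    embeddings: array of shape (n_llms, n_samples, dim), one embedded
    output per LLM per test sample. If each LLM's output embedding is
    lambda_i = y + eps_i with eps_i ~ N(0, sigma_i^2 I) and independent
    noise, then E||lambda_i - lambda_j||^2 = dim * (sigma_i^2 + sigma_j^2),
    so sigma_i^2 can be solved for from any triplet (i, j, k).
    """
    n_llms, n_samples, dim = embeddings.shape
    # Mean squared distance between every pair of LLMs' outputs.
    d2 = np.zeros((n_llms, n_llms))
    for i in range(n_llms):
        for j in range(n_llms):
            d2[i, j] = np.mean(
                np.sum((embeddings[i] - embeddings[j]) ** 2, axis=1)
            )
    sigma2 = np.zeros(n_llms)
    for i in range(n_llms):
        others = [j for j in range(n_llms) if j != i]
        # Average the triplet identity over all (j, k) pairs for stability.
        vals = []
        for a in range(len(others)):
            for b in range(a + 1, len(others)):
                j, k = others[a], others[b]
                vals.append((d2[i, j] + d2[i, k] - d2[j, k]) / (2 * dim))
        sigma2[i] = max(np.mean(vals), 1e-12)  # clamp to stay positive
    return sigma2
```

A lower estimated variance means the LLM's outputs sit closer to the latent true output, so its quality score is higher; routing then favors the LLM with the smallest estimated variance.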

Methodology Breakdown

SMOOTHIE comprises two primary variants: SMOOTHIE-GLOBAL and SMOOTHIE-LOCAL. The former provides quality estimates across all test samples using average embeddings, while the local variant employs nearest neighbors to adjust scores on a per-sample basis, enhancing the estimates' relevance to the specific inputs.
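
The local variant can be sketched as re-running the variance estimator on each test sample's nearest neighbors in input-embedding space, rather than once over the whole test set. This is a minimal illustration of that idea; the function names, shapes, and the simple placeholder estimator in the example are assumptions for illustration, not the paper's code.

```python
import numpy as np

def local_quality_scores(input_embs, output_embs, estimate_fn, k=20):
    """Per-sample LLM quality scores via k-nearest-neighbor estimation.

    input_embs:  (n_samples, d_in)          embeddings of the test inputs
    output_embs: (n_llms, n_samples, d_out) embeddings of each LLM's outputs
    estimate_fn: callable mapping (n_llms, k, d_out) -> per-LLM variances
    Returns scores of shape (n_samples, n_llms); higher = better.
    """
    n_samples = input_embs.shape[0]
    scores = np.zeros((n_samples, output_embs.shape[0]))
    # Pairwise squared distances between input embeddings.
    dists = np.sum((input_embs[:, None] - input_embs[None, :]) ** 2, axis=-1)
    for s in range(n_samples):
        nbrs = np.argsort(dists[s])[:k]  # neighborhood includes the sample
        sigma2 = estimate_fn(output_embs[:, nbrs, :])
        scores[s] = 1.0 / sigma2  # lower estimated noise = higher score
    return scores
```

Setting the neighborhood to the entire test set recovers the global variant, so the two differ only in how much context each quality estimate conditions on.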

Key aspects include:

  • LLM Score Estimation: SMOOTHIE calculates LLM quality scores without labeled data by analyzing test sample embeddings. These scores drive routing: the most suitable LLM for each input is identified from embedding proximities alone.
  • Routing Mechanism: The routing relies on quality estimates, selecting the LLM demonstrating the highest expected performance for each input.
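
Given the per-sample quality scores, the routing step itself reduces to an argmax over LLMs. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def route(scores, generations):
    """Pick, for each sample, the generation from its highest-scoring LLM.

    scores:      (n_samples, n_llms) array of per-sample quality scores
    generations: list of n_llms lists, each holding n_samples outputs
    """
    choices = np.argmax(scores, axis=1)  # best LLM index per sample
    return [generations[m][s] for s, m in enumerate(choices)]
```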

Evaluation and Results

The paper presents experimental validations across several benchmarks, including traditional tasks such as summarization and data-to-text generation, as well as instruction-following tasks. Empirically, SMOOTHIE demonstrates robust performance, often surpassing both unsupervised and supervised routing baselines.

For instance, SMOOTHIE-GLOBAL's rankings align well with true model performance on single-task datasets, exhibiting high correlations (ρ = 0.72). It further identifies top-performing models with 70% accuracy, significantly outperforming random baselines. On mixed-task datasets, SMOOTHIE-LOCAL's sample-conditioned estimates enhance generation quality, yielding accuracy improvements of up to 10 points over unsupervised methods and 5 points over supervised counterparts.

SMOOTHIE's prompt selection capabilities, assessed by applying the quality estimation approach to template selection, show a marked performance uplift, allowing smaller models to match or even exceed the effectiveness of substantially larger ones.

Implications for Future Research

SMOOTHIE's innovations pave the way for further exploration in LLM orchestration without dependency on labeled data. Future research might explore extending the graphical model to consider non-diagonal covariances or balance performance against computational costs more precisely. Additionally, integrating alternative embedding models or additional heuristics could potentially enrich input features.

Overall, the SMOOTHIE framework presents a significant step toward maximizing the utility of diverse LLMs in tasks without the burden of labeled datasets, unlocking potential capabilities in areas of flexible and adaptive LLM deployment.