
Cosmos-Predict2: Predictive LLM Adaptation

Updated 23 January 2026
  • Cosmos-Predict2 is an information-theoretic framework that formalizes joint model and strategy selection for adapting large language models under compute constraints.
  • It employs tailored predictive models for both fine-tuning (QLoRA) and in-context learning to efficiently estimate performance and cost without exhaustive grid search.
  • Empirical benchmarks show it matches at least 99.3% of oracle accuracy at significantly lower cost, enabling scalable and resource-aware LLM deployment.

Cosmos-Predict2 refers to the information-theoretic and computational framework within the COSMOS methodology for predictable and cost-effective adaptation of LLMs. It formalizes, and solves efficiently, the challenge of selecting both a model and an adaptation strategy—such as fine-tuning or in-context learning—under explicit compute and deployment constraints. Cosmos-Predict2 encompasses formal problem setup, predictive model design (for both performance and cost), analytic cost modeling, results benchmarking, and a roadmap for extensions to unified, strategy-agnostic LLM adaptation selection (Wang et al., 30 Apr 2025).

1. Formalization of the Joint Model and Strategy Selection Problem

At the heart of Cosmos-Predict2 is a formal mathematical framework that integrates multiple LLMs, adaptation strategies, and extensive configuration spaces. Denote the model pool as \mathcal{F} = \{f_1, \dots, f_K\}, the adaptation-strategy pool as \mathcal{T} = \{T_1, \dots, T_J\}, and the hyperparameter configuration space for each T_j as \Omega. The downstream performance metric is \pi, with associated adaptation cost function c. For strategy T_j with configuration \omega applied to model f_k, one observes \pi(T_j^\omega(f_k)) and c(T_j^\omega(f_k)).

The core selection operator for downstream task D is

M_D(\mathcal{F}, \mathcal{T}, \Omega) = \arg\max_{f_k \in \mathcal{F},\, T_j \in \mathcal{T},\, \omega \in \Omega} s\big(\pi(T_j^\omega(f_k)), c(T_j^\omega(f_k))\big)

where s : \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R} is a user-defined score reflecting the performance–cost trade-off (e.g., s(\pi, c) = \pi - \epsilon c / c_{\max}). The computational cost of exhaustive evaluation is the sum \sum_{j,k,\omega} c(T_j^\omega, f_k). Cosmos-Predict2 instead proposes to learn predictors P_{j,k}(\omega) \approx \pi(T_j^\omega(f_k)) and C_{j,k}(\omega) \approx c(T_j^\omega(f_k)) such that

\sum_{j,k} c_{\text{predict}}(P_{j,k}, C_{j,k}) + c(T_{\hat{\jmath}}^{\hat{\omega}}, f_{\hat{k}}) \ll \sum_{j,k,\omega} c(T_j^\omega, f_k)

(see Eq. (2) in Sec. 3.2 of (Wang et al., 30 Apr 2025)).
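The predict-then-select loop implied by this formulation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tuple layout, the function names, and the default \epsilon = 0.1 are our assumptions.

```python
def score(pi, cost, cost_max, eps=0.1):
    """Illustrative user-defined score s(pi, c) = pi - eps * c / c_max,
    the example trade-off form given in the text."""
    return pi - eps * cost / cost_max

def select(predictions):
    """predictions: list of (model, strategy, config, pi_hat, c_hat) tuples
    as produced by the learned predictors P_{j,k} and C_{j,k}.
    Returns the (f_k, T_j, omega) triple maximizing the score; only this
    winner would actually be adapted and evaluated."""
    cost_max = max(c for *_, c in predictions)
    best = max(predictions, key=lambda t: score(t[3], t[4], cost_max))
    return best[:3]
```

Because only predicted quantities enter the argmax, the expensive adaptation runs are deferred until after a single winner is chosen.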

2. Predictive Model Instantiations for LLM Adaptation Strategies

Cosmos-Predict2 instantiates two distinct predictor types for major adaptation paradigms:

A. Fine-Tuning (QLoRA) Embedding-Augmented Proxy

  • The approach uses a bidirectional encoder g_\eta^{\text{bi}} : \mathbb{R}^{L \times d} \to \mathbb{R}^{L \times e} to compute a contextual embedding e_\eta(x) \in \mathbb{R}^e for input x.
  • A single-layer projector \ell_{\phi''} : \mathbb{R}^e \to \mathcal{Y} is trained on a small subset of the fine-tuning data via either cross-entropy or contrastive loss with the encoder g_\eta frozen (batch size 8, learning rate 1e-6, 300 iterations).
  • Calibration is performed on a 10% validation split: \hat{\pi}(T^{\text{tr}}_{\text{QLoRA}}(f_\theta)) = a \cdot \pi_{\phi''} + b for fitted scalars a, b (Sec. 4.2).
  • The prediction cost includes training \ell_{\phi''}, calibration, and inference; this is significantly lower than the cost of full QLoRA fine-tuning.
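The affine calibration step is ordinary least squares in one variable. A pure-Python sketch (the function and argument names are ours):

```python
def calibrate(proxy_scores, true_scores):
    """Fit the affine calibration pi_hat = a * pi_proxy + b on the
    validation split by least squares; returns (a, b)."""
    n = len(proxy_scores)
    mx = sum(proxy_scores) / n
    my = sum(true_scores) / n
    # Closed-form simple linear regression: slope from centered moments.
    a = sum((x - mx) * (y - my) for x, y in zip(proxy_scores, true_scores)) \
        / sum((x - mx) ** 2 for x in proxy_scores)
    b = my - a * mx
    return a, b
```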

B. Retrieval-Augmented In-Context Learning (ICL) Scaling Law

  • Empirical performance as a function of shot count d is fit with an exponential-saturation law: \hat{\pi}(T^{\text{inf}}_{\text{ICL}}(f)) = \alpha \cdot (1 - \exp(-\beta d)) + \pi_0 (Eq. (4), Sec. 4.3).
  • Two sparse measurements (e.g., 1-shot and 8-shot) suffice to solve for (\alpha, \beta, \pi_0), providing rapid prediction of performance for arbitrary d.
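Fitting this law reduces to a one-dimensional root-finding problem. The sketch below additionally assumes the zero-shot score \pi_0 is measured directly, then recovers \beta by bisection on the ratio of the two saturation increments and \alpha in closed form; this solution procedure is our illustration, not necessarily the paper's.

```python
import math

def fit_icl_scaling(d1, p1, d2, p2, p0):
    """Fit pi(d) = alpha * (1 - exp(-beta * d)) + pi0 from two shot-count
    measurements (d1, p1), (d2, p2) and a measured zero-shot score p0."""
    y1, y2 = p1 - p0, p2 - p0
    r = y2 / y1
    # The ratio (1 - e^{-beta*d2}) / (1 - e^{-beta*d1}) decreases
    # monotonically in beta from d2/d1 (beta -> 0) to 1 (beta -> inf),
    # so bisection converges to the unique root.
    lo, hi = 1e-9, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        ratio = (1 - math.exp(-mid * d2)) / (1 - math.exp(-mid * d1))
        if ratio > r:
            lo = mid  # beta too small
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = y1 / (1 - math.exp(-beta * d1))
    return alpha, beta, p0

def predict_icl(alpha, beta, pi0, d):
    """Predicted performance at an arbitrary shot count d."""
    return alpha * (1 - math.exp(-beta * d)) + pi0
```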

3. Analytic Cost Modeling and Decision Workflow

The total cost of applying a strategy TT to a model ff is

c(T,f)=cadapt(T,f)+ceval(T(f),D)c(T, f) = c_{\text{adapt}}(T, f) + c_{\text{eval}}(T(f), D)

where cadaptc_{\text{adapt}} and cevalc_{\text{eval}} denote training/adaptation and evaluation phases.

Detailed cost models:

  • For QLoRA fine-tuning, the cost expands as c^{\text{FT}} = E \cdot \left[\text{pack}(N_{\text{train}}^{\text{FT}}, L_{\max}) / (B \cdot G)\right] \cdot t_{\text{step}} \cdot \gamma_{\text{compute}} \cdot N_{\text{compute}} \cdot \psi_{\text{peak}} + c_{\text{eval}}, with terms reflecting epochs E, batch size B, gradient-accumulation steps G, per-step time t_{\text{step}}, GPU price \gamma_{\text{compute}}, compute count N_{\text{compute}}, peak-memory factor \psi_{\text{peak}}, and token packing.
  • For ICL: c^{\text{ICL}}(d, x) = c_{\text{token}} (E[L_{\text{in}}] + E[L_{\text{out}}]) \cdot d + c_{\text{token}} (|x| + E[L_{\text{out}}]) + c_{\text{eval}}.
  • Prediction cost: c_{\text{predict}}(P_{j,k}, C_{j,k}) = c_{\text{proxy}} + c_{\text{overhead}}(T_j^\omega, f_k) + c_{\text{val}}(D_{\text{val}}).
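The ICL cost model above is simple token arithmetic. A direct transcription (the parameter names are ours; per-token price and expected lengths are inputs):

```python
def icl_cost(d, prompt_len, mean_in, mean_out, c_token, c_eval):
    """c^ICL(d, x): d retrieved demonstrations, each of expected length
    E[L_in] + E[L_out], plus the query x and its expected output, all
    priced per token, plus a fixed evaluation cost."""
    demos = c_token * (mean_in + mean_out) * d   # retrieved demonstrations
    query = c_token * (prompt_len + mean_out)    # query |x| + expected output
    return demos + query + c_eval
```

Because the model is analytic, the cost of every candidate shot count d can be enumerated without running any inference.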

Unified strategy: for each pair (f_k, T_j) and configuration \omega, the system predicts performance \hat{\pi} and cost \hat{c}, computes the user-defined score s(\hat{\pi}, \hat{c}), and selects (f_k^*, T_j^*, \omega^*) = \arg\max s. Only this optimal strategy is executed, leading to orders-of-magnitude savings over brute-force sweeps.

4. Empirical Evaluation and Benchmarks

Extensive experiments span eight benchmarks (MMLU, Winogrande, ARC-Challenge, HellaSwag, FPB, FiQA-SA, Headline, Multifin EN) with 55 QLoRA+ICL configurations across low/medium/high cost bands. Results include:

  • Mean Absolute Error (MAE) of predicted accuracy: 1.09%.
  • Average Cost Reduction Ratio (CRR) versus exhaustive search: 92.72%; up to 98.71% in high-cost regimes.
  • Discrepancy between predicted and actual accuracies is typically within 1–2 points, with per-benchmark deviations ranging from 0.16 to 4.97 points.
  • QLoRA and ICL performance–cost prediction curves closely match observed outcomes (see Figs. 2 & 3 in (Wang et al., 30 Apr 2025)).
  • On HellaSwag, compared to Random Search CV and Successive Halving CV, COSMOS matches at least 99.3% of oracle accuracy at 2.2×–27.1× lower cost (Table App.5).

5. Limitations and Future Extensions

Cosmos-Predict2 is subject to several limitations and outlines multiple research directions:

  • Strategy-specific predictors: Each adaptation method (e.g., QLoRA, ICL) requires a tailored predictive model. Extending the framework to prompt-tuning, LoRA/PeFT, hybrid training/test strategies, or RLHF would require development of new predictors.
  • Cost-model fidelity: Current cost models use average-case quantities (e.g., mean sequence length in ICL), which may introduce minor errors. A plausible implication is that dynamic or task-adaptive cost models could further improve accuracy.
  • Coverage: Presently restricted to QLoRA and ICL. Broader generalization would enhance utility for practitioners seeking coverage of all major adaptation techniques.
  • Suggested future enhancements:
    • Incorporate uncertainty (e.g., Bayesian predictors) for risk-sensitive decisions.
    • Enable online adaptation as more data becomes available, supporting streaming workloads.
    • Integrate with dynamic multi-model cascades (query-level routing).
    • Extend selection to the joint multi-task regime.

6. Context, Significance, and Relation to Prior Work

Cosmos-Predict2 is situated in the context of practical, resource-aware LLM deployment, where direct full-grid search is computationally prohibitive. By enabling direct prediction of adaptation outcomes, it transforms LLM adaptation from a laborious empirical process to an analytically-driven procedure. Its high accuracy, cost efficiency, and strategy-agnostic design mark a departure from baseline search approaches, as evidenced by substantial cost reduction and minimal loss in final accuracy.

A plausible implication is that such predictive adaptation frameworks will become necessary infrastructure in large-scale, multi-model LLM systems where compute, time, and environmental costs must be tightly regulated. The requirement for strategy-specific predictors underscores the diversity of adaptation mechanisms in current LLM practice and highlights the open challenge of comprehensive, unified prediction methods. For future LLM research, Cosmos-Predict2 offers a reference architecture for integrating analytic and learned prediction into the model selection pipeline, pointing toward risk-aware, adaptive, and scalable adaptation of foundation models (Wang et al., 30 Apr 2025).
