MixMCP: Forecasting with Markets and LLMs

Updated 3 July 2026

MixMCP is a forecasting method that combines market-derived Bayesian priors with LLM-driven textual updates to yield calibrated predictions.
The approach employs a convex blend with a fixed weight (α=0.7) to balance market anchoring and evidence responsiveness.
Empirical evaluations show MixMCP marginally improves Brier scores and accuracy over baseline market methods in mention-market contexts.

MixMCP is a forecasting approach for mention markets, i.e. prediction markets that resolve based on whether a specified keyword is mentioned during a future public event. It combines market-derived probabilities with LLM updates: forecasts are anchored to the market-implied prior and adapted in response to textual evidence, yielding a slightly better Brier score and accuracy than either component alone (Kim et al., 4 Feb 2026).

1. Formal Schematic and Bayesian Foundation

MixMCP is built on Market-Conditioned Prompting (MCP). In MCP, the market's YES-contract price $p_\mathrm{mkt} \in [0,1]$ at a fixed cutoff time (seven days prior to the event) serves as a Bayesian prior $P_\mathrm{market}$. The LLM is presented with contextual evidence, including the most recent earnings-call transcript $T$ and a set of recent news articles $N$, and is tasked with revising the prior to produce an updated posterior $P_\mathrm{MCP}$:

$P_\mathrm{market} \equiv p_\mathrm{mkt}$

$P_\mathrm{MCP} = \mathrm{LLMe}(c \mid P_\mathrm{market})$

where $c$ denotes the textual context and $\mathrm{LLMe}$ denotes invocation of the frozen LLM instructed via the MCP prompt template to update, not overwrite, the prior.

MixMCP synthesizes forecasts via a convex combination of the market prior and the MCP posterior:

$P_\mathrm{mix} = \alpha \cdot P_\mathrm{market} + (1-\alpha) \cdot P_\mathrm{MCP}$

Here the mixture weight $\alpha \in [0,1]$ is tuned to trade off market anchoring against the adaptability of the LLM update.
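
As a worked illustration with hypothetical inputs (the values 0.60 and 0.80 are made up for this example; only $\alpha = 0.7$ comes from the text), suppose the market prices YES at 0.60 and the MCP posterior is 0.80:

$P_\mathrm{mix} = 0.7 \cdot 0.60 + 0.3 \cdot 0.80 = 0.42 + 0.24 = 0.66$

The blended forecast lies between the two inputs and closer to the market, since the market carries the larger weight.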

2. End-to-End Implementation and Prompting Workflow

Each mention-market instance comprises the following inputs:

$p_\mathrm{mkt}$: market-implied prior (contract price)
$T$: previous quarter's earnings transcript
$N$: relevant pre-call news articles
$\alpha$: mixture weight (empirically fixed at 0.7)
LLMe: frozen LLM with the MCP prompt template

The workflow is:

Prompt Construction: The MCP prompt explicitly states the market price and integrates all textual evidence, preceded by an instructional preamble; the verbatim template text is given in (Kim et al., 4 Feb 2026).
LLM Query: The LLM is queried with the completed prompt; the forecast is extracted as $P_\mathrm{MCP}$.
Mixing: $P_\mathrm{mix}$ is computed as $\alpha \cdot P_\mathrm{market} + (1-\alpha) \cdot P_\mathrm{MCP}$.
Iteration: This process is repeated for all market instances.

No new prompt construction or LLM query is required for MixMCP itself; the mixture is computed post-hoc.
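
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Instance` class, the wording inside `build_prompt`, the reply parser, and the `llm` callable are all hypothetical stand-ins, and the real prompt template is the one given in the cited paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    """One mention-market instance (hypothetical container)."""
    p_mkt: float        # market-implied prior (YES-contract price)
    transcript: str     # T: previous quarter's earnings transcript
    news: List[str]     # N: pre-call news articles


def build_prompt(inst: Instance) -> str:
    # Illustrative wording only; NOT the template from the paper.
    news_block = "\n".join(f"- {article}" for article in inst.news)
    return (
        f"Market prior probability: {inst.p_mkt:.2f}\n"
        "Update this prior using the evidence below; do not overwrite it.\n"
        f"Transcript:\n{inst.transcript}\n"
        f"News:\n{news_block}\n"
        "Answer with a single probability between 0 and 1."
    )


def parse_probability(reply: str) -> float:
    # Naive parser: take the first number in the reply and clamp to [0, 1].
    match = re.search(r"\d*\.?\d+", reply)
    if match is None:
        raise ValueError(f"no probability found in reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


def mix(p_market: float, p_mcp: float, alpha: float = 0.7) -> float:
    # Convex combination of the market prior and the MCP posterior.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_market + (1.0 - alpha) * p_mcp


def mixmcp_forecasts(
    instances: List[Instance],
    llm: Callable[[str], str],
    alpha: float = 0.7,
) -> List[float]:
    forecasts = []
    for inst in instances:
        p_mcp = parse_probability(llm(build_prompt(inst)))  # one LLM query
        forecasts.append(mix(inst.p_mkt, p_mcp, alpha))      # post-hoc mixing
    return forecasts
```

Because the mixing step only needs the stored pair of market price and MCP output, changing `alpha` later does not require re-querying the LLM.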

3. Empirical Performance and Comparative Results

Forecasting performance, benchmarked on a dataset of mention-market instances, is evaluated using Brier score, expected calibration error (ECE), accuracy, and F1:

Method	Brier	ECE	Accuracy (%)	F1
Market baseline	0.1402	0.0651	79.8	0.840
MCP alone	0.1470	0.0514	78.2	0.822
MixMCP (α=0.7)	0.1392	0.0666	80.3	0.842
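
For reference, the first three metrics in the table can be computed along the following lines. This is a sketch under assumptions: the source does not state its ECE binning scheme or its classification threshold, so the 10 equal-width bins and the 0.5 threshold below are conventional choices, not confirmed details.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    # Mean squared difference between forecast and binary outcome (0 or 1).
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    # Assumed scheme: equal-width bins over [0, 1]; ECE is the size-weighted
    # average gap between mean forecast and observed YES frequency per bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - freq)
    return ece


def accuracy(
    probs: Sequence[float], outcomes: Sequence[int], threshold: float = 0.5
) -> float:
    # Assumed decision rule: predict YES when the forecast is >= threshold.
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)
```

Lower is better for Brier score and ECE; higher is better for accuracy.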

MixMCP yields a small but statistically significant Brier score improvement over the market baseline (0.1392 vs 0.1402; paired-sample $t$-test), and a 0.5 percentage-point accuracy increase (80.3% vs 79.8%). MCP alone improves calibration (ECE 0.0514 vs 0.0651 for the market) but underperforms the market on accuracy and Brier score. Explicit Bayesian prompting is empirically superior to naive text inclusion.
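
A paired comparison of this kind can be sketched as below. The assumption here, which the source does not confirm, is that the test is run on per-instance squared errors of the two methods; the function computes only the $t$ statistic using the standard library.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """t statistic for paired samples, e.g. per-instance squared errors.

    A positive value means method A has larger errors than method B on average.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

The statistic has `len(diffs) - 1` degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns the p-value.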

In the mid-confidence regime (market probabilities 50–70%), MCP corrections result in more valid reclassifications than the market alone. Empirical variance in forecasts is also reduced under MixMCP.

4. Tuning and Theoretical Rationale for the Mixing Parameter

The parameter $\alpha$ mediates the influence of the market prior. A practical grid search over $\alpha$ found the Brier score to be minimized and stable across a range of values. For all downstream experiments, $\alpha = 0.7$ is fixed.
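
A grid search of this kind could look like the sketch below. The function name, the grid values in the usage note, and the choice of Brier score as the sole selection criterion are illustrative assumptions, not details taken from the source.

```python
from typing import Dict, Sequence, Tuple


def grid_search_alpha(
    p_market: Sequence[float],
    p_mcp: Sequence[float],
    outcomes: Sequence[int],
    grid: Sequence[float],
) -> Tuple[float, Dict[float, float]]:
    """Return the alpha with the lowest Brier score, plus all scores."""
    scores: Dict[float, float] = {}
    for alpha in grid:
        # Blend market prior and MCP posterior for every instance.
        mixed = [alpha * m + (1.0 - alpha) * c for m, c in zip(p_market, p_mcp)]
        # Brier score of the blended forecasts against the 0/1 outcomes.
        scores[alpha] = sum((p - y) ** 2 for p, y in zip(mixed, outcomes)) / len(mixed)
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores
```

For example, `grid_search_alpha(p_market, p_mcp, outcomes, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])` scans eleven candidate weights. To avoid an optimistic estimate, the weight would normally be chosen on data held out from the final evaluation.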

The rationale for a nonzero $\alpha$ is dampening, motivated both empirically and theoretically: LLMs are sensitive to noisy or ambiguous textual features and can "over-react" to weak evidence, especially with voluminous news inputs. Anchoring the posterior to the market, which aggregates diverse, independent signals, reduces the risk of catastrophic LLM errors and preserves overall calibration except in select mid-confidence intervals.
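
The dampening effect follows directly from rearranging the mixing formula:

$P_\mathrm{mix} - P_\mathrm{market} = (1-\alpha) \cdot (P_\mathrm{MCP} - P_\mathrm{market})$

With $\alpha = 0.7$, the blended forecast moves only 30% of the way from the market price toward the LLM's posterior. As a hypothetical example, if the LLM shifts a 0.60 prior to 0.95 (a move of 0.35), the mixture shifts by only $0.3 \cdot 0.35 = 0.105$, to 0.705.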

This approach does not require formal model retraining or ensembling, as MixMCP operates entirely through post-hoc convex combination, maintaining interpretability and ease of implementation.

5. Prompt Engineering and Instructional Templates

Two instructional templates are central: a shared user-input template and an MCP-specific preamble. Their verbatim text is given in (Kim et al., 4 Feb 2026).

No additional prompt is used specifically for MixMCP; mixing is operationalized at the numeric post-processing stage.

6. Advantages, Limitations, and Interpretation

MixMCP’s chief advantage is robustness. By fusing the calibrated market prior with a text-informed LLM posterior, it mitigates the risk of LLM “overreaction” to spurious or weak features—a documented concern in high-noise textual domains. MixMCP consistently outperforms the market on Brier score and raw accuracy. MCP alone is best calibrated, but benefits from market anchoring in practical deployment.

A limitation is that the accuracy improvement, though consistent, is marginal, and the Brier gain is small though statistically significant. MixMCP's ECE (0.0666) is also slightly higher than the market baseline's (0.0651), so the calibration gain of MCP alone does not carry over to the mixture. The approach is tailored for settings, like mention markets, where prior and textual signal have complementary strengths. A plausible implication is that in domains with more volatile prior quality or less informative text, the optimal $\alpha$ may require recalibration.

7. Replication and Application Scope

The procedural simplicity of MixMCP lends itself to replication: given a reliable market probability, relevant textual context, and a frozen LLM, practitioners can apply MCP prompting and mix as described to produce robust mention-market forecasts. All empirical and prompt details are presented in (Kim et al., 4 Feb 2026), enabling extension to alternative LLMs, additional input modalities, or varied market event types. The central principle—anchoring LLM predictions to calibrated priors—offers a blueprint for high-stakes probabilistic forecasting in multi-source contexts.

References (1)

Forecasting Future Language: Context Design for Mention Markets (2026)
