MixMCP: Forecasting with Markets and LLMs
- MixMCP is a forecasting method that combines market-derived Bayesian priors with LLM-driven textual updates to yield calibrated predictions.
- The approach employs a convex blend with a fixed weight (α=0.7) to balance market anchoring and evidence responsiveness.
- Empirical evaluations show MixMCP marginally improves Brier score and accuracy over the market baseline in mention-market contexts.
MixMCP is a forecasting approach for mention markets (prediction markets that resolve based on whether a specified keyword is mentioned during a future public event) that combines market-derived probabilities with LLM updates to improve probabilistic predictions. MixMCP anchors forecasts to the market-implied prior while adapting them in response to textual evidence, yielding a lower Brier score and higher accuracy than either the market or the LLM update alone (Kim et al., 4 Feb 2026).
1. Formal Schematic and Bayesian Foundation
MixMCP is fundamentally built on Market-Conditioned Prompting (MCP). In MCP, the market's YES-contract price at a fixed cutoff time (seven days prior to the event) serves as a Bayesian prior $p_m$. The LLM is presented with contextual evidence, including the most recent earnings-call transcript $T$ and a set of recent news articles $\mathcal{N}$, and is tasked with revising the prior to produce an updated posterior $\hat{p}_{\text{MCP}}$:

$$\hat{p}_{\text{MCP}} = \mathrm{LLM}_{\theta}\left(p_m, C\right), \qquad C = (T, \mathcal{N}),$$

where $C$ denotes the textual context and $\mathrm{LLM}_{\theta}(\cdot)$ denotes invocation of the frozen LLM, instructed via the MCP prompt template to update, not overwrite, the prior.
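MixMCP synthesizes forecasts via a convex combination of the market prior $p_m$ and the MCP posterior $\hat{p}_{\text{MCP}}$:

$$\hat{p}_{\text{Mix}} = \alpha\, p_m + (1-\alpha)\, \hat{p}_{\text{MCP}}, \qquad \alpha \in [0, 1].$$

Here $\alpha$ is tuned to trade off market anchoring against LLM-driven adaptability; a larger $\alpha$ keeps the forecast closer to the market price.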
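The mixing step is a single weighted average, with $\alpha$ on the market prior. A minimal Python sketch (the function name `mixmcp` is illustrative, not from the source):

```python
def mixmcp(p_market: float, p_mcp: float, alpha: float = 0.7) -> float:
    """Convex blend of the market prior and the MCP posterior.

    alpha weights the market prior; (1 - alpha) weights the LLM-updated posterior.
    """
    # All three quantities must be valid probabilities / weights.
    for name, value in (("p_market", p_market), ("p_mcp", p_mcp), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return alpha * p_market + (1.0 - alpha) * p_mcp


# The market prices YES at 0.60; after reading the context, the LLM revises to 0.80.
print(round(mixmcp(0.60, 0.80), 4))
```

With $\alpha = 0.7$ this gives $0.7 \times 0.60 + 0.3 \times 0.80 = 0.66$: the LLM moves the forecast, but only 30% of the way from the market price toward its own estimate.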
2. End-to-End Implementation and Prompting Workflow
Each mention-market instance $i$ comprises the following inputs:
- $p_m^{(i)}$: market-implied prior (contract price)
- $T^{(i)}$: previous quarter's earnings-call transcript
- $\mathcal{N}^{(i)}$: relevant pre-call news articles
- $\alpha$: mixture weight (empirically fixed at 0.7)
- $\mathrm{LLM}_{\theta}$: frozen LLM with the MCP prompt template
The workflow is:
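- Prompt Construction: The MCP prompt explicitly states the market price $p_m^{(i)}$ and integrates all textual evidence ($T^{(i)}$ and $\mathcal{N}^{(i)}$), together with an instructional preamble.
- LLM Query: The LLM is queried with the completed prompt, and the forecast is extracted as $\hat{p}_{\text{MCP}}^{(i)}$.
- Mixing: $\hat{p}_{\text{Mix}}^{(i)}$ is computed as $\alpha\, p_m^{(i)} + (1-\alpha)\, \hat{p}_{\text{MCP}}^{(i)}$.
- Iteration: This process is repeated for every market instance $i = 1, \dots, n$.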
No new prompt construction or LLM query is required for MixMCP itself; the mixture is computed post-hoc.
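The workflow above can be sketched as a loop that makes one MCP query per instance and then mixes post hoc. This is a sketch under stated assumptions: `MentionMarketInstance`, `run_mixmcp`, and `stub_mcp` are hypothetical names, and `stub_mcp` is a toy keyword heuristic standing in for the frozen LLM, not the source's model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class MentionMarketInstance:
    """One mention-market question (hypothetical field names)."""
    market_prior: float   # YES-contract price at the cutoff
    transcript: str       # previous quarter's earnings-call transcript
    news: Sequence[str]   # pre-call news articles


# Hypothetical interface: (prior, transcript, news) -> updated probability.
McpQuery = Callable[[float, str, Sequence[str]], float]


def run_mixmcp(instances: List[MentionMarketInstance],
               query_mcp: McpQuery, alpha: float = 0.7) -> List[float]:
    forecasts = []
    for inst in instances:
        # One MCP query per instance; MixMCP adds no extra LLM call.
        p_mcp = query_mcp(inst.market_prior, inst.transcript, inst.news)
        p_mcp = min(max(p_mcp, 0.0), 1.0)  # clip out-of-range model outputs
        # Post-hoc convex mixing, with alpha on the market prior.
        forecasts.append(alpha * inst.market_prior + (1.0 - alpha) * p_mcp)
    return forecasts


def stub_mcp(prior: float, transcript: str, news: Sequence[str]) -> float:
    """Toy stand-in for the LLM: raises the prior by 0.1 per text mentioning a keyword."""
    keyword = "guidance"
    hits = sum(keyword in text.lower() for text in [transcript, *news])
    return min(1.0, prior + 0.1 * hits)


data = [
    MentionMarketInstance(0.55, "We raised guidance this quarter.", ["Analysts expect guidance"]),
    MentionMarketInstance(0.20, "Margins were stable.", []),
]
print([round(p, 3) for p in run_mixmcp(data, stub_mcp)])
```

Keeping the LLM behind a plain callable makes it easy to swap models, and because mixing happens after the query, different values of $\alpha$ can be evaluated without re-querying the LLM.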
3. Empirical Performance and Comparative Results
Forecasting performance, benchmarked on a dataset of mention-market instances, is evaluated using Brier score, expected calibration error (ECE), accuracy, and F1 (lower is better for Brier and ECE; higher is better for accuracy and F1):
| Method | Brier | ECE | Accuracy (%) | F1 |
|---|---|---|---|---|
| Market baseline | 0.1402 | 0.0651 | 79.8 | 0.840 |
| MCP alone | 0.1470 | 0.0514 | 78.2 | 0.822 |
| MixMCP (α=0.7) | 0.1392 | 0.0666 | 80.3 | 0.842 |
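For readers reproducing the table, the four metrics can be computed as follows. This is a sketch, not the source's evaluation code: the equal-width 10-bin ECE over the YES probability, the 0.5 decision threshold, and F1 on the YES class are all assumptions.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between YES probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE over equal-width bins of the YES probability (binning scheme assumed)."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        bins[min(int(p * n_bins), n_bins - 1)].append(i)  # p == 1.0 goes to the last bin
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(probs[i] for i in members) / len(members)
        hit_rate = sum(labels[i] for i in members) / len(members)
        # Weight each bin's |confidence - frequency| gap by its share of instances.
        ece += len(members) / len(probs) * abs(mean_prob - hit_rate)
    return ece


def accuracy_and_f1(probs: Sequence[float], labels: Sequence[int],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and YES-class F1 after thresholding (threshold assumed)."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Toy forecasts and resolved outcomes (illustrative, not from the benchmark).
probs = [0.9, 0.8, 0.3, 0.6]
labels = [1, 1, 0, 0]
print(brier_score(probs, labels), expected_calibration_error(probs, labels),
      accuracy_and_f1(probs, labels))
```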
MixMCP yields a small but statistically significant Brier score improvement over the market baseline (0.1392 vs 0.1402, paired-sample test) and a 0.5 percentage-point accuracy increase (80.3% vs 79.8%). MCP alone improves calibration (ECE falls from 0.0651 to 0.0514) but underperforms the market on accuracy and Brier score. Explicit Bayesian prompting is also reported to outperform naive inclusion of text in the prompt.
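The significance claim compares two forecasters on the same instances. The exact test statistic is not recoverable here, so the sketch below assumes a paired t-test on per-instance squared errors; `paired_t_brier` is a hypothetical helper and the data are toy values.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_brier(p_a: Sequence[float], p_b: Sequence[float],
                   labels: Sequence[int]) -> float:
    """Paired t statistic on per-instance squared-error differences (A minus B).

    A negative value means forecaster A has the lower (better) Brier score.
    """
    diffs = [(a - y) ** 2 - (b - y) ** 2 for a, b, y in zip(p_a, p_b, labels)]
    sd = stdev(diffs)  # sample standard deviation; needs at least two instances
    if sd == 0:
        raise ValueError("differences are constant; t statistic undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Toy data: A plays the role of MixMCP, B the market baseline.
mix = [0.8, 0.3, 0.7, 0.2]
market = [0.7, 0.4, 0.6, 0.3]
outcomes = [1, 0, 1, 0]
print(round(paired_t_brier(mix, market, outcomes), 2))
```

To obtain a p-value as well, `scipy.stats.ttest_rel` can be applied to the two lists of per-instance squared errors; it returns both the statistic and the p-value.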
In the mid-confidence regime (market probabilities of 50–70%), MCP's corrections reclassify more instances correctly than the market alone does. MixMCP also reduces the empirical variance of forecasts.
4. Tuning and Theoretical Rationale for the Mixing Parameter
The parameter $\alpha$ mediates the influence of the market prior. A grid search over $\alpha$ found the Brier score to be minimized, and stable, over a range of values; all downstream experiments fix $\alpha = 0.7$.
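A grid search of this kind can be sketched as follows. The grid $0.0, 0.1, \dots, 1.0$ is an assumption (the source's grid is not reproduced here), and `grid_search_alpha` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between YES probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def grid_search_alpha(p_market: Sequence[float], p_mcp: Sequence[float],
                      labels: Sequence[int],
                      grid: Optional[List[float]] = None) -> Tuple[float, Dict[float, float]]:
    """Return the alpha with the lowest Brier score and the score for every alpha tried."""
    if grid is None:
        grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0 (assumed grid)
    scores: Dict[float, float] = {}
    for alpha in grid:
        # alpha weights the market prior, (1 - alpha) the MCP posterior.
        blended = [alpha * m + (1 - alpha) * q for m, q in zip(p_market, p_mcp)]
        scores[alpha] = brier_score(blended, labels)
    best = min(scores, key=scores.get)  # ties resolve to the first alpha in the grid
    return best, scores


# Toy validation data (illustrative only).
market = [0.70, 0.40, 0.55, 0.20]
mcp = [0.85, 0.25, 0.35, 0.30]
outcomes = [1, 0, 1, 0]
best_alpha, scores = grid_search_alpha(market, mcp, outcomes)
print(best_alpha, round(scores[best_alpha], 4))
```

Printing the full `scores` dictionary shows whether the minimum sits in a flat region, which is the stability the text describes. In practice the search would be run on held-out instances so that $\alpha$ is not fit to the evaluation set.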
The rationale for a nonzero $\alpha$ is dampening, motivated both empirically and theoretically: LLMs are sensitive to noisy or ambiguous textual features and can “over-react” to weak evidence, especially when given voluminous news inputs. Anchoring the posterior to the market, which aggregates diverse, independent signals, reduces the risk of catastrophic LLM errors and preserves overall calibration except in some mid-confidence intervals.
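The dampening effect can be made concrete with two standard properties of convex combinations (general facts, not results stated in the source). Writing $p_m$ for the market prior, $\hat{p}_{\text{MCP}}$ for the MCP posterior, $\hat{p}_{\text{Mix}} = \alpha p_m + (1-\alpha)\hat{p}_{\text{MCP}}$, and $y \in \{0, 1\}$ for the outcome:

$$\hat{p}_{\text{Mix}} - p_m = (1-\alpha)\,(\hat{p}_{\text{MCP}} - p_m) \quad\Longrightarrow\quad \left|\hat{p}_{\text{Mix}} - p_m\right| \le 1-\alpha,$$

$$(\hat{p}_{\text{Mix}} - y)^2 \le \alpha\,(p_m - y)^2 + (1-\alpha)\,(\hat{p}_{\text{MCP}} - y)^2 .$$

The first line bounds how far the LLM can pull the forecast: with $\alpha = 0.7$, a market price of 0.60 and an over-reacting LLM answer of 0.05 give $0.7 \times 0.60 + 0.3 \times 0.05 = 0.435$, a shift of only $0.3 \times 0.55 = 0.165$. The second line follows from the convexity of squared error; averaging over instances shows that the mixture's Brier score never exceeds the same weighted average of the two components' Brier scores. With the reported values this bound is $0.7 \times 0.1402 + 0.3 \times 0.1470 = 0.14224$, and the observed 0.1392 falls below it, which is possible because the market's and MCP's errors partly cancel.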
This approach does not require formal model retraining or ensembling, as MixMCP operates entirely through post-hoc convex combination, maintaining interpretability and ease of implementation.
5. Prompt Engineering and Instructional Templates
Two instructional templates are central:
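- Shared User Input Template: supplies the textual evidence (earnings-call transcript and news articles) for each market instance.
- MCP-Specific Preamble: states the market price and instructs the model to treat it as a prior to be updated, not overwritten.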
No additional prompt is used specifically for MixMCP; mixing is operationalized at the numeric post-processing stage.
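The split between a preamble and a shared input template can be sketched as simple string assembly plus a parser for the model's numeric answer. The template wording, the `keyword` field, and the parsing rule below are hypothetical; the source's actual templates are not reproduced here.

```python
import re
from typing import Optional, Sequence

# Hypothetical wording; not the source's verbatim templates.
MCP_PREAMBLE = (
    "The prediction market currently prices YES at {price:.2f}. Treat this as your "
    "prior probability and update it, rather than replace it, using the evidence below."
)
USER_TEMPLATE = (
    "Question: Will the keyword '{keyword}' be mentioned during the upcoming call?\n"
    "Previous earnings-call transcript:\n{transcript}\n\n"
    "Recent news:\n{news}\n\n"
    "Answer with a single probability between 0 and 1."
)


def build_mcp_prompt(price: float, keyword: str, transcript: str,
                     news: Sequence[str]) -> str:
    """MCP prompt = market-price preamble + shared user input template."""
    news_block = "\n".join(f"- {article}" for article in news) or "(none)"
    return MCP_PREAMBLE.format(price=price) + "\n\n" + USER_TEMPLATE.format(
        keyword=keyword, transcript=transcript, news=news_block)


def parse_probability(reply: str) -> Optional[float]:
    """Return the last number in [0, 1] found in the reply, or None if there is none."""
    candidates = [float(x) for x in re.findall(r"\d*\.?\d+", reply)]
    valid = [x for x in candidates if 0.0 <= x <= 1.0]
    return valid[-1] if valid else None


prompt = build_mcp_prompt(0.55, "guidance", "Last quarter's call text.", ["Article A", "Article B"])
print(prompt)
print(parse_probability("My updated estimate is 0.72."))
```

Because MixMCP reuses this MCP prompt unchanged, the only MixMCP-specific code is the numeric blend applied to the parsed probability.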
6. Advantages, Limitations, and Interpretation
MixMCP’s chief advantage is robustness. By fusing the calibrated market prior with a text-informed LLM posterior, it mitigates the risk of LLM “overreaction” to spurious or weak features, a documented concern in high-noise textual domains. MixMCP consistently outperforms the market on Brier score and raw accuracy. MCP alone is best calibrated (lowest ECE), but benefits from market anchoring in practical deployment.
A limitation is that the accuracy improvement, though consistent, is marginal, and the Brier gain is small though statistically significant; MixMCP's ECE (0.0666) is in fact slightly worse than the market's (0.0651). The approach is tailored to settings, like mention markets, where the prior and the textual signal have complementary strengths. A plausible implication is that in domains with more volatile prior quality or less informative text, the optimal $\alpha$ may require recalibration.
7. Replication and Application Scope
The procedural simplicity of MixMCP lends itself to replication: given a reliable market probability, relevant textual context, and a frozen LLM, practitioners can apply MCP prompting and mix as described to produce robust mention-market forecasts. All empirical and prompt details are presented in (Kim et al., 4 Feb 2026), enabling extension to alternative LLMs, additional input modalities, or varied market event types. The central principle—anchoring LLM predictions to calibrated priors—offers a blueprint for high-stakes probabilistic forecasting in multi-source contexts.