MixMCP: Forecasting with Markets and LLMs

Updated 3 July 2026
  • MixMCP is a forecasting method that combines market-derived Bayesian priors with LLM-driven textual updates to yield calibrated predictions.
  • The approach employs a convex blend with a fixed weight (α=0.7) to balance market anchoring and evidence responsiveness.
  • Empirical evaluations show MixMCP marginally improves Brier scores and accuracy over baseline market methods in mention-market contexts.

MixMCP is a forecasting approach for mention markets—prediction markets that resolve based on whether a specified keyword is mentioned during a future public event—combining market-derived probabilities with LLM updates to enhance probabilistic predictions. MixMCP operates by anchoring forecasts to the market-implied prior while adapting predictions in response to textual evidence, producing systematically more robust and calibrated outcomes than either component alone (Kim et al., 4 Feb 2026).

1. Formal Schematic and Bayesian Foundation

MixMCP is built on Market-Conditioned Prompting (MCP). In MCP, the market's YES-contract price $p_\mathrm{mkt} \in [0,1]$ at a fixed cutoff time (seven days prior to the event) serves as a Bayesian prior $P_\mathrm{market}$. The LLM is presented with contextual evidence, including the most recent earnings-call transcript $T$ and a set of recent news articles $N$, and is tasked with revising the prior to produce an updated posterior $P_\mathrm{MCP}$:

$$P_\mathrm{market} \equiv p_\mathrm{mkt}$$

$$P_\mathrm{MCP} = \mathrm{LLMe}(c \mid P_\mathrm{market})$$

where $c$ denotes the textual context and $\mathrm{LLMe}$ denotes invocation of the frozen LLM instructed via the MCP prompt template to update, not overwrite, the prior.

MixMCP synthesizes forecasts via a convex combination of the market prior and the MCP posterior:

$$P_\mathrm{mix} = \alpha \cdot P_\mathrm{market} + (1-\alpha) \cdot P_\mathrm{MCP}$$

Here $\alpha \in [0,1]$ is tuned to trade off market anchoring and LLM-updated adaptability.
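
As a minimal sketch of the convex combination above, the following Python function blends the two probabilities. The function name `mix_forecast` and the input values are illustrative assumptions, not taken from the paper; only the formula and the default weight of 0.7 come from the text.

```python
def mix_forecast(p_market: float, p_mcp: float, alpha: float = 0.7) -> float:
    """Convex blend of the market prior and the MCP posterior."""
    for name, value in (("p_market", p_market), ("p_mcp", p_mcp), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return alpha * p_market + (1.0 - alpha) * p_mcp


# Illustrative values only (not from the paper):
# the market prices YES at 0.60 and the LLM posterior is 0.80.
p_mix = mix_forecast(0.60, 0.80)  # 0.7 * 0.60 + 0.3 * 0.80, approximately 0.66
```

With a weight of 0.7 on the market, the blended forecast moves only 30% of the way from the market price toward the LLM posterior, which is the anchoring behaviour the formula is meant to provide.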

2. End-to-End Implementation and Prompting Workflow

Each mention-market instance comprises the following inputs:

  • $P_\mathrm{market}$: market-implied prior (contract price)
  • $T$: previous quarter's earnings transcript
  • $N$: relevant pre-call news articles
  • $\alpha$: mixture weight (empirically fixed at 0.7)
  • $\mathrm{LLMe}$: frozen LLM with the MCP prompt template

The workflow is:

  1. Prompt Construction: The MCP prompt explicitly states the market price and integrates all textual evidence, and is paired with an instructional preamble.
  2. LLM Query: The LLM is queried with the completed prompt; the forecast is extracted as $P_\mathrm{MCP}$.
  3. Mixing: $P_\mathrm{mix}$ is computed as $\alpha \cdot P_\mathrm{market} + (1-\alpha) \cdot P_\mathrm{MCP}$.
  4. Iteration: This process is repeated for all market instances.

No new prompt construction or LLM query is required for MixMCP itself; the mixture is computed post-hoc.
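
The workflow above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' implementation: the `MentionMarket` container, the prompt wording in `build_mcp_prompt`, and the `query_llm` callable (assumed to return an already-parsed probability) are all hypothetical, and the paper's actual template may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class MentionMarket:
    p_market: float      # market-implied prior (YES-contract price at cutoff)
    transcript: str      # previous quarter's earnings-call transcript
    news: List[str]      # relevant pre-call news articles


def build_mcp_prompt(inst: MentionMarket) -> str:
    """Hypothetical prompt wording; NOT the paper's template."""
    news_block = "\n\n".join(inst.news)
    return (
        f"Market-implied prior probability of YES: {inst.p_market:.2f}\n"
        "Update this prior using the evidence below; do not overwrite it.\n\n"
        f"Transcript:\n{inst.transcript}\n\n"
        f"News:\n{news_block}\n\n"
        "Reply with a single probability between 0 and 1."
    )


def run_mixmcp(
    instances: Iterable[MentionMarket],
    query_llm: Callable[[str], float],   # hypothetical: returns a parsed probability
    alpha: float = 0.7,
) -> List[float]:
    forecasts = []
    for inst in instances:
        # Prompt construction and LLM query yield the MCP posterior.
        p_mcp = query_llm(build_mcp_prompt(inst))
        # Clamp to [0, 1] in case the model output is out of range (an assumption).
        p_mcp = min(1.0, max(0.0, p_mcp))
        # Mixing is pure post-processing: no further prompt or LLM call.
        forecasts.append(alpha * inst.p_market + (1.0 - alpha) * p_mcp)
    return forecasts


# A stub stands in for a real LLM call so the sketch runs offline.
markets = [MentionMarket(0.60, "transcript text", ["news item"])]
forecasts = run_mixmcp(markets, query_llm=lambda prompt: 0.80)
```

Passing the LLM as a callable keeps the mixing logic independent of any particular model or client library, so the same loop can be reused with a different frozen LLM.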

3. Empirical Performance and Comparative Results

Forecasting performance, benchmarked on a dataset of mention-market instances, is evaluated using Brier score, expected calibration error (ECE), accuracy, and F1:

| Method | Brier | ECE | Accuracy (%) | F1 |
|---|---|---|---|---|
| Market baseline | 0.1402 | 0.0651 | 79.8 | 0.840 |
| MCP alone | 0.1470 | 0.0514 | 78.2 | 0.822 |
| MixMCP (α=0.7) | 0.1392 | 0.0666 | 80.3 | 0.842 |
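
For readers who want to reproduce metrics of this kind, the sketch below computes the Brier score and an equal-width-bin ECE in plain Python. The binning scheme (10 equal-width bins) is an assumption, since the source does not state which ECE variant was used, and the toy data are invented for illustration.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins over the predicted YES probability (assumed variant)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece


# Toy data, invented for illustration only.
toy_probs = [0.95, 0.85, 0.25, 0.65]
toy_outcomes = [1, 1, 0, 0]
toy_brier = brier_score(toy_probs, toy_outcomes)
toy_ece = expected_calibration_error(toy_probs, toy_outcomes)
```

Lower is better for both quantities; the Brier score rewards sharp, correct forecasts, while ECE only measures how well stated probabilities match observed frequencies.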

MixMCP yields a small but statistically significant Brier score improvement over the market baseline (0.1392 vs 0.1402; paired-sample $t$-test) and a 0.5 percentage-point accuracy increase (80.3% vs 79.8%). MCP alone improves calibration (ECE reduced to 0.0514) but underperforms the market on accuracy and Brier score. Explicit Bayesian prompting is empirically superior to naive text inclusion.
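
A paired-sample comparison of this kind can be sketched as below. It assumes the paired quantity is the per-instance squared error of each method, which the source does not specify, and it computes only the $t$ statistic; a p-value would additionally need the Student $t$ distribution with $n-1$ degrees of freedom (for example via `scipy.stats.ttest_rel`). The error values are invented.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """t statistic for the mean of paired differences (errors_a - errors_b)."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)                      # sample standard deviation (n - 1)
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Invented per-instance squared errors for two methods (illustration only).
market_sq_err = [0.16, 0.09, 0.25, 0.04]
mix_sq_err = [0.12, 0.08, 0.20, 0.05]
t_stat = paired_t_statistic(market_sq_err, mix_sq_err)  # positive: market errors larger on average
```

Pairing by instance matters here because both methods are scored on the same markets, so the test looks at within-instance differences rather than treating the two error lists as independent samples.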

In the mid-confidence regime (market probabilities 50–70%), MCP corrections result in more valid reclassifications than the market alone. Empirical variance in forecasts is also reduced under MixMCP.

4. Tuning and Theoretical Rationale for the Mixing Parameter

The parameter $\alpha$ mediates the influence of the market prior. A practical grid search over candidate values of $\alpha$ found a range in which the Brier score is minimized and stable. For all downstream experiments, $\alpha = 0.7$ is fixed.
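
A grid search of this kind might look like the following sketch. The grid (0.0 to 1.0 in steps of 0.1) is a hypothetical choice, since the grid actually searched is not recoverable from the text, and the toy inputs are invented so that the example is easy to verify by hand.

```python
from typing import List, Optional, Sequence, Tuple


def grid_search_alpha(
    p_market: Sequence[float],
    p_mcp: Sequence[float],
    outcomes: Sequence[int],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return the (alpha, Brier score) pair with the lowest Brier score on the grid."""
    if grid is None:
        grid = [i / 10 for i in range(11)]   # hypothetical grid: 0.0, 0.1, ..., 1.0
    best_alpha, best_brier = grid[0], float("inf")
    for alpha in grid:
        mixed = [alpha * m + (1.0 - alpha) * l for m, l in zip(p_market, p_mcp)]
        brier = sum((p - y) ** 2 for p, y in zip(mixed, outcomes)) / len(outcomes)
        if brier < best_brier:               # ties keep the smaller alpha
            best_alpha, best_brier = alpha, brier
    return best_alpha, best_brier


# Invented toy case: the market is exactly right and the LLM is uninformative,
# so the search should put all weight on the market.
best_alpha, best_brier = grid_search_alpha(
    p_market=[1.0, 0.0], p_mcp=[0.5, 0.5], outcomes=[1, 0]
)
```

In practice the weight would normally be selected on data held out from the final evaluation, so that the reported Brier score is not flattered by tuning on the test instances.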

The rationale for a nonzero $\alpha$ is dampening, motivated both empirically and theoretically: LLMs are sensitive to noisy or ambiguous textual features and can “over-react” to weak evidence, especially with voluminous news inputs. Anchoring the posterior to the market, which aggregates diverse, independent signals, reduces the risk of catastrophic LLM errors and preserves overall calibration except in select mid-confidence intervals.

This approach does not require formal model retraining or ensembling, as MixMCP operates entirely through post-hoc convex combination, maintaining interpretability and ease of implementation.

5. Prompt Engineering and Instructional Templates

Two instructional templates are central: a shared user input template and an MCP-specific preamble. The template text itself is given in (Kim et al., 4 Feb 2026).

No additional prompt is used specifically for MixMCP; mixing is operationalized at the numeric post-processing stage.

6. Advantages, Limitations, and Interpretation

MixMCP’s chief advantage is robustness. By fusing the calibrated market prior with a text-informed LLM posterior, it mitigates the risk of LLM “overreaction” to spurious or weak features—a documented concern in high-noise textual domains. MixMCP consistently outperforms the market on Brier score and raw accuracy. MCP alone is best calibrated, but benefits from market anchoring in practical deployment.

A limitation is that the accuracy improvement, though consistent, is marginal, and the Brier gain is small though statistically significant; MixMCP's ECE (0.0666) is in fact slightly higher than the market baseline's (0.0651). The approach is tailored for settings, like mention markets, where prior and textual signal have complementary strengths. A plausible implication is that in domains with more volatile prior quality or less informative text, the optimal $\alpha$ may require recalibration.

7. Replication and Application Scope

The procedural simplicity of MixMCP lends itself to replication: given a reliable market probability, relevant textual context, and a frozen LLM, practitioners can apply MCP prompting and mix as described to produce robust mention-market forecasts. All empirical and prompt details are presented in (Kim et al., 4 Feb 2026), enabling extension to alternative LLMs, additional input modalities, or varied market event types. The central principle—anchoring LLM predictions to calibrated priors—offers a blueprint for high-stakes probabilistic forecasting in multi-source contexts.
