
Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets

Published 5 Aug 2025 in cs.CR and q-fin.TR | (2508.03474v1)

Abstract: Polymarket is a prediction market platform where users can speculate on future events by trading shares tied to specific outcomes, known as conditions. Each market is associated with a set of one or more such conditions. To ensure proper market resolution, the condition set must be exhaustive -- collectively accounting for all possible outcomes -- and mutually exclusive -- only one condition may resolve as true. Thus, the collective prices of all related outcomes should be \$1, representing a combined probability of 1 of any outcome. Despite this design, Polymarket exhibits cases where dependent assets are mispriced, allowing for purchasing (or selling) a certain outcome for less than (or more than) \$1, guaranteeing profit. This phenomenon, known as arbitrage, could enable sophisticated participants to exploit such inconsistencies. In this paper, we conduct an empirical arbitrage analysis on Polymarket data to answer three key questions: (Q1) What conditions give rise to arbitrage? (Q2) Does arbitrage actually occur on Polymarket? (Q3) Has anyone exploited these opportunities? A major challenge in analyzing arbitrage between related markets lies in the scalability of comparisons across a large number of markets and conditions, with a naive analysis requiring $O(2^{n+m})$ comparisons. To overcome this, we employ a heuristic-driven reduction strategy based on timeliness, topical similarity, and combinatorial relationships, further validated by expert input. Our study reveals two distinct forms of arbitrage on Polymarket: Market Rebalancing Arbitrage, which occurs within a single market or condition, and Combinatorial Arbitrage, which spans across multiple markets. We use on-chain historical order book data to analyze when these types of arbitrage opportunities have existed, and when they have been executed by users. We find a realized estimate of 40 million USD of profit extracted.

Summary

  • The paper develops a formal taxonomy of arbitrage, distinguishing market rebalancing and combinatorial strategies to reveal critical market inefficiencies.
  • It employs LLMs to detect logical dependencies between market conditions, validating the approach using extensive on-chain data over a year.
  • The analysis quantifies arbitrage profits exceeding $39M, highlighting the concentration of gains among a few high-frequency actors and persistent market inefficiency.

Arbitrage in Decentralized Prediction Markets: An Empirical Analysis of Polymarket

Introduction

This paper presents a comprehensive empirical study of arbitrage in decentralized prediction markets, focusing on Polymarket, a leading platform built on the Polygon blockchain. The authors develop a formal taxonomy of arbitrage opportunities, introduce a scalable methodology for detecting market dependencies using LLMs, and quantify both the existence and exploitation of arbitrage across a full year of Polymarket data. The analysis reveals significant inefficiencies, the prevalence of both intra- and inter-market arbitrage, and the emergence of sophisticated arbitrageur strategies, with realized profits exceeding $39 million.

Prediction Market Structure and Arbitrage Taxonomy

Polymarket operates via a hybrid-decentralized central limit order book (CLOB), where each market is defined by one or more binary conditions (e.g., "Will candidate X win?"). Markets are designed to be exhaustive and mutually exclusive, such that the sum of "YES" token prices should equal 1. The authors formalize two primary arbitrage types:

  • Market Rebalancing Arbitrage: Occurs within a single market when the sum of "YES" prices deviates from 1, enabling risk-free profit by taking long or short positions across all outcomes.
  • Combinatorial Arbitrage: Emerges between dependent markets when logical relationships between conditions allow for portfolio constructions that guarantee profit, even across distinct but related markets.

The formalism extends to multi-market dependencies, but the analysis is restricted to pairs due to computational and LLM context limitations.
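The rebalancing case can be illustrated with a minimal sketch. The function name `rebalancing_arbitrage` and the `min_edge` parameter are hypothetical and not taken from the paper, and fees, slippage, and order book depth are ignored.

```python
from typing import Optional


def rebalancing_arbitrage(yes_prices: list[float], min_edge: float = 0.0) -> Optional[dict]:
    """Check one exhaustive, mutually exclusive market for a rebalancing opportunity.

    yes_prices holds one YES price per condition. Exactly one condition
    resolves to $1, so a full set of YES shares is always worth $1.
    """
    total = sum(yes_prices)
    if total < 1.0 - min_edge:
        # Long: buying one YES share of every condition costs `total` and pays $1.
        return {"side": "long", "total": total, "profit": 1.0 - total}
    if total > 1.0 + min_edge:
        # Short: selling one YES share of every condition collects `total` and owes $1.
        return {"side": "short", "total": total, "profit": total - 1.0}
    return None


if __name__ == "__main__":
    # Three conditions priced below a combined $1.
    print(rebalancing_arbitrage([0.30, 0.40, 0.25]))
```

The profit here is per full set of shares; how much of it can actually be captured depends on the liquidity available at those prices.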

Data Collection and Market Landscape

The dataset comprises all Polymarket markets resolved between April 2024 and April 2025, including both market metadata and on-chain bid histories. The authors employ topic classification (using Linq-Embed-Mistral embeddings) to cluster markets and reduce the combinatorial search space for dependency analysis.

Figure 1: Distribution of markets and conditions by topic and end-date, highlighting the dominance of politics and sports.

Liquidity and trading volume are highly concentrated around major political events, particularly the 2024 U.S. election.

Figure 2: (Top) Total liquidity per market by end date and topic. (Bottom) Executed bid volume over time, with U.S. election markets as the primary activity driver.

Detecting Market Dependencies with LLMs

A key methodological contribution is the use of LLMs (DeepSeek-R1-Distill-Qwen-32B) to infer logical dependencies between market conditions. The approach involves:

  • Reducing markets with >4 conditions to the top-4 by liquidity plus an "other" catch-all, preserving >90% of liquidity.
  • Prompting the LLM to enumerate all valid outcome vectors for single and paired markets, checking for exclusivity and dependency.
  • Filtering for pairs within the same topic and end-date to maximize semantic overlap.

Figure 3: Schematic of the LLM-based pipeline for detecting market dependencies.
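The first step, reducing a large market to its most liquid conditions, could look roughly like the sketch below. The helper names `reduce_conditions` and `top_share` and the liquidity figures are made up for illustration, and how the paper actually builds the "other" bucket is an assumption.

```python
def reduce_conditions(liquidity: dict[str, float], keep: int = 4) -> dict[str, float]:
    """Keep the `keep` most liquid conditions and fold the rest into "other"."""
    ranked = sorted(liquidity.items(), key=lambda item: item[1], reverse=True)
    reduced = dict(ranked[:keep])
    if len(ranked) > keep:
        # Assumption: the catch-all simply pools the liquidity of dropped conditions.
        reduced["other"] = sum(value for _, value in ranked[keep:])
    return reduced


def top_share(liquidity: dict[str, float], keep: int = 4) -> float:
    """Fraction of total liquidity held by the `keep` most liquid conditions."""
    total = sum(liquidity.values())
    top = sum(sorted(liquidity.values(), reverse=True)[:keep])
    return top / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical six-condition market.
    market = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 30, "F": 10}
    print(reduce_conditions(market))
    print(top_share(market))
```

A share above 0.9 from `top_share` would correspond to the ">90% of liquidity" criterion described in the list above.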

The LLM approach is validated on single markets (where dependencies are known by construction) and then extended to pairs. Out of 46,360 market pairs in the U.S. election group, only 13 pairs satisfy the strict combinatorial arbitrage definition, reflecting both the centralized market creation process and LLM context limitations.
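A minimal sketch of how such outcome vectors might be checked once the LLM has returned them. The function names are hypothetical, and the dependency rule (fewer valid joint vectors than the n × m an independent pair would allow) is an assumption about how the definition is applied, not a quotation of it.

```python
from itertools import product


def single_market_vectors(n: int) -> list[tuple[bool, ...]]:
    """Valid outcomes of an exhaustive, mutually exclusive market: exactly one condition is true."""
    return [tuple(i == j for j in range(n)) for i in range(n)]


def is_valid_single(vectors, n: int) -> bool:
    """Check that a returned list matches the n known one-hot vectors, with no extras or duplicates."""
    got = [tuple(v) for v in vectors]
    return len(got) == n and set(got) == set(single_market_vectors(n))


def is_dependent(joint_vectors, n: int, m: int) -> bool:
    """Assumed rule: a pair is dependent if fewer than n * m joint outcomes are possible."""
    return len({tuple(v) for v in joint_vectors}) < n * m


# Market 1: (A wins, A does not win). Market 2: (A wins by 2+, A does not win by 2+).
all_pairs = [a + b for a, b in product(single_market_vectors(2), repeat=2)]
# "A does not win" combined with "A wins by 2+" cannot happen, so it is dropped.
valid_pairs = [v for v in all_pairs if not (v[1] and v[2])]

if __name__ == "__main__":
    print(is_dependent(all_pairs, 2, 2), is_dependent(valid_pairs, 2, 2))
```

The single-market check mirrors the validation step described above: because the correct vectors are known by construction, any deviation signals an LLM error rather than a market property.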

Empirical Detection of Arbitrage Opportunities

The authors reconstruct historical order books to compute volume-weighted average prices (VWAP) for each token at block-level granularity. Arbitrage is detected when the sum of relevant token prices deviates from theoretical constraints by at least $0.05, and only during periods of sufficient uncertainty (no token >$0.95).

Figure 4: Time series of VWAP prices and detected arbitrage in the "Will Assad remain President of Syria through 2024?" market.
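The detection rule can be sketched as follows. The `(price, size)` trade representation and the function names are assumptions for illustration, and the authors' exact aggregation within a block may differ; only the $0.05 and $0.95 thresholds come from the description above.

```python
from typing import Optional


def vwap(trades: list[tuple[float, float]]) -> float:
    """Volume-weighted average price of (price, size) trades for one token."""
    volume = sum(size for _, size in trades)
    if volume == 0:
        raise ValueError("no traded volume")
    return sum(price * size for price, size in trades) / volume


def detect_arbitrage(prices: list[float], min_dev: float = 0.05,
                     max_price: float = 0.95) -> Optional[str]:
    """Flag a set of token prices whose sum should equal 1.

    Returns "long" if the sum is too low, "short" if too high, else None.
    """
    if max(prices) > max_price:
        return None  # outcome is nearly settled, so the period is skipped
    deviation = sum(prices) - 1.0
    if abs(deviation) < min_dev:
        return None
    return "short" if deviation > 0 else "long"


if __name__ == "__main__":
    # Hypothetical trades in one block for the YES and NO tokens of a condition.
    yes = vwap([(0.40, 100), (0.50, 300)])
    no = vwap([(0.42, 250)])
    print(yes, no, detect_arbitrage([yes, no]))
```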

Single-Condition Arbitrage

Arbitrage within single conditions is widespread, with 7,051 out of 17,218 conditions exhibiting at least one opportunity. The median profit per dollar is approximately $0.60, indicating substantial inefficiency.

Figure 5: Distribution of arbitrage opportunities within single conditions, with Crypto markets as notable outliers.

Figure 6: Aggregate arbitrage potential by condition, both uncapped and capped at $100 liquidity. Sports markets show more frequent but smaller opportunities; politics dominates in uncapped profit.

Multi-Condition (NegRisk) Market Arbitrage

Within NegRisk markets, 662 out of 1,578 markets exhibit arbitrage, with both long and short opportunities present. Sports markets are particularly prone to overvaluation, as evidenced by the higher frequency and magnitude of short arbitrage.

Figure 7: Total arbitrage potential across markets, with Sports showing consistent profit, especially in long opportunities.

Combinatorial Arbitrage Across Dependent Markets

Among the 13 dependent market pairs identified, arbitrage opportunities are less frequent and generally lower in liquidity, but still present, especially during periods of market uncertainty.

Figure 8: Distribution of profit per dollar and maximum profit for combinatorial arbitrage in U.S. election market pairs.
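To make the idea concrete, here is a minimal sketch for the simplest kind of dependency, where condition A being true implies condition B is true. This is only one illustrative case, not the paper's general formalism; the function name and prices are hypothetical, and fees and execution risk are ignored.

```python
from typing import Optional


def implication_arbitrage(a_no_price: float, b_yes_price: float) -> Optional[dict]:
    """Portfolio for a pair where A implies B: buy one A-NO and one B-YES.

    The worst-case payoff is computed by enumerating the logically
    possible (A, B) outcomes, which exclude A true with B false.
    """
    worlds = [(True, True), (False, True), (False, False)]
    # A-NO pays $1 when A is false; B-YES pays $1 when B is true.
    worst_payoff = min((not a) + b for a, b in worlds)
    cost = a_no_price + b_yes_price
    if cost >= worst_payoff:
        return None
    return {"cost": cost, "min_profit": worst_payoff - cost}


if __name__ == "__main__":
    # Hypothetical prices: the two tokens together cost less than the $1 they must pay.
    print(implication_arbitrage(a_no_price=0.35, b_yes_price=0.55))
```

The portfolio pays at least $1 in every possible world, so any combined cost below $1 is a guaranteed profit under these assumptions.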

Realized Arbitrage and Arbitrageur Behavior

The authors match detected opportunities to user bid histories, grouping transactions within 950-block windows. Approximately 1% of U.S. election arbitrage was exploited, with Sports single markets surpassing politics in realized profit.

Figure 9: Total profit realized by users in single-condition arbitrage, with Sports dominating exploited opportunities.
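The windowed grouping might be sketched as below for a single account. Whether the paper anchors each window at the first trade of a group or uses a sliding window is not stated here, so the anchoring rule, the function names, and the token labels are all assumptions.

```python
def group_by_block_window(trades: list[tuple[int, str]], window: int = 950) -> list[list[tuple[int, str]]]:
    """Group one account's (block_number, token_id) trades into windows.

    Assumption: a trade joins the current group if it lies within `window`
    blocks of that group's first trade; otherwise it starts a new group.
    """
    groups: list[list[tuple[int, str]]] = []
    for block, token in sorted(trades):
        if groups and block - groups[-1][0][0] <= window:
            groups[-1].append((block, token))
        else:
            groups.append([(block, token)])
    return groups


def covers_all(group: list[tuple[int, str]], tokens: set[str]) -> bool:
    """True if the group contains a trade in every token the strategy needs."""
    return tokens <= {token for _, token in group}


if __name__ == "__main__":
    # Hypothetical trades by one account.
    trades = [(1000, "YES_A"), (1500, "YES_B"), (3000, "YES_A")]
    for group in group_by_block_window(trades):
        print(group, covers_all(group, {"YES_A", "YES_B"}))
```

A group that covers every required token within one window is a candidate for an executed arbitrage; a partial group is more likely an ordinary directional trade.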

Rebalancing arbitrage in NegRisk markets is primarily realized in politics, with most opportunities yielding low returns but a few outliers in Crypto, Politics, and Twitter.

Figure 10: (Left) Realized profit from rebalancing arbitrage in NegRisk markets. (Right) Yield distribution per dollar, with most opportunities exhibiting low returns.

The top 10 arbitrageurs extracted over $7 million, with the largest single user realizing $2 million in profit. The majority of profit is concentrated among a small set of highly active, likely automated, accounts.

Figure 11: Log-log plot of number of bids vs. aggregate profit per account, highlighting the concentration of profit among a few high-frequency actors.

Market Microstructure and Liquidity Concentration

Liquidity is highly concentrated in the top-ranked conditions within each market, justifying the reduction to 4+1 conditions for LLM analysis.

Figure 12: Cumulative liquidity distribution by condition rank, with >90% of liquidity in the top 4 conditions.

Discussion and Implications

The study demonstrates that, despite the theoretical efficiency of prediction markets, substantial arbitrage persists due to non-atomic execution, fragmented liquidity, and the complexity of cross-market dependencies. The use of LLMs for dependency detection is effective but limited by context size and prompt engineering challenges. The realized arbitrage volume, while significant in absolute terms, is modest relative to atomic DeFi markets, reflecting the higher execution risk and lower liquidity in prediction markets.

The findings have several implications:

  • Market Design: Centralized market creation and resolution processes limit the prevalence of combinatorial arbitrage, but as platforms decentralize, more complex dependencies and arbitrage opportunities are likely to emerge.
  • LLM Reasoning: Current LLMs are effective for pairwise dependency detection but struggle with larger market sets. Advances in context handling and logical reasoning will be necessary for scaling to higher-order dependencies.
  • Arbitrageur Strategies: The concentration of profit among a few high-frequency actors suggests the emergence of specialized, possibly automated, arbitrageurs analogous to those in DeFi.
  • Market Efficiency: The persistence of high-magnitude arbitrage, especially in single-condition and sports markets, indicates that prediction markets remain far from informational efficiency, particularly under non-atomic execution constraints.

Conclusion

This work provides a rigorous empirical foundation for understanding arbitrage in decentralized prediction markets. The combination of formal arbitrage taxonomy, LLM-based dependency detection, and large-scale on-chain data analysis reveals both the structural inefficiencies and the evolving strategies of arbitrageurs in Polymarket. As prediction markets grow in scale and complexity, the methodologies developed here will be essential for monitoring market efficiency, designing robust market mechanisms, and understanding the interplay between human and algorithmic actors in decentralized information aggregation.

Explain it Like I'm 14

Overview

This paper studies a website called Polymarket, where people trade “Yes” and “No” shares about future events, like “Will Team A win?” or “Will Candidate B become president?” The price of a share acts like the crowd’s best guess of the chance that something will happen.

The authors look for mistakes in how these shares are priced. When prices don’t make sense, smart traders can lock in a risk-free profit. That’s called arbitrage. The paper explains two kinds of arbitrage that can happen on Polymarket, shows how often they occur, and estimates how much money people made from them.

What were they trying to find out?

The paper asks three simple questions:

  • Q1: What kinds of situations create arbitrage in prediction markets?
  • Q2: Do these arbitrage opportunities actually appear on Polymarket?
  • Q3: Has anyone taken advantage of them in real life?

How did they study it?

The authors combined math, computer tools, and real trading data.

Here’s the idea in everyday language:

  • Markets and conditions: A “market” is a big question, and its “conditions” are the possible answers. For example, “Who will win?” might have three conditions: Team A wins, Team B wins, or a tie. Only one of these can be true in the end.
  • Prices should add to 1: If you add up the “Yes” prices for all the conditions in a single market, they should total 1 (which means 100%). If the sum is less than 1, you can buy a bit of everything and guarantee a profit when the event resolves. If it’s more than 1, you can profit by selling or shorting in the right way.
  • Related markets: Sometimes different markets talk about the same event in different ways. For example, one market asks “Will Team A win?” while another asks “Will Team A win by 2 or more?” These are linked. If Team A wins by 2 or more, then Team A must also have won, period. The authors checked for these links between markets.
  • Scaling the search: There are thousands of markets. Comparing every pair of markets in every possible way would take forever. So the authors reduced the search by:
    • Grouping markets that end on the same date.
    • Grouping by topic (like Politics or Sports).
    • Using text similarity to find markets that talk about the same event.
  • Using AI (LLMs): They used an LLM (an AI text assistant) to read market descriptions and spot logical relationships. Think of it like a very careful reader that says, “If this statement is true, that statement must also be true,” and writes down all the possible combinations that make sense. They also did “prompt engineering,” which means they carefully designed the instructions given to the AI so it would answer in a useful, structured way.
  • On-chain data: Polymarket records matched trades on the blockchain, which is a public, tamper-resistant database. The authors downloaded these records to see when and how trades happened, and to catch moments when arbitrage would have been possible or actually happened.

What did they find?

The authors found two main types of arbitrage:

  • Market Rebalancing Arbitrage (inside one market): If the total “Yes” prices for all outcomes in a single market add up to less than 1, you can buy a little of every outcome and guarantee a profit when one of them wins. If they add up to more than 1, you can use a different strategy to profit from overpricing. This corrects the market and pushes prices back toward sensible values.
  • Combinatorial Arbitrage (across multiple related markets): If two markets are logically linked (like “Team A wins” and “Team A wins by 2+”), you can build a combo of positions across both that guarantees at least one will pay off. The profit comes from the mismatch between how the two markets are priced.

Why it matters:

  • These mispricings do happen on Polymarket.
  • People exploit them. By analyzing the blockchain records, the authors estimate that around 40 million USD of profit was made from these kinds of arbitrage during the year they studied (April 2024 to April 2025).

Why are these results important?

  • Price accuracy: Arbitrage helps fix prices. When traders spot a mistake and trade against it, they push prices back to better reflect true probabilities. That makes the market’s forecasts more reliable.
  • Real-world impact: During big events (like the 2024 U.S. elections), Polymarket had huge trading volume. If prices are accurate, the market’s odds can be a useful signal for the public, journalists, and decision-makers.
  • Technology and fairness: On blockchains, arbitrage can be fast and complex. Skilled traders may capture most of the profits. Understanding these patterns helps platforms and users think about fairness and design better systems.

What does this mean for the future?

  • Better market design: Platforms can add guardrails to reduce mispricing, improve how related markets are built, and make price consistency easier.
  • Smarter monitoring: The AI approach the authors used shows that we can automatically flag linked markets and find errors early. That could help platforms, regulators, or researchers watch for problems in real time.
  • Trade-offs: Arbitrage is often considered helpful because it corrects prices. But there’s ongoing debate about whether specialized traders capture too much value compared to regular users. Designing systems that are both efficient and fair remains a challenge.

In short, the paper shows that arbitrage in prediction markets is real, measurable, and significant. By combining smart math, AI reading of market text, and blockchain data, the authors mapped out when and how mispricings happen—and how much money they can generate.
