The generalized Alice HH vs Bob HT problem (2503.19035v1)

Published 24 Mar 2025 in math.PR and math.CO

Abstract: In 2024, Daniel Litt posed a simple coinflip game pitting Alice's "Heads-Heads" vs Bob's "Heads-Tails": who is more likely to win if they score 1 point per occurrence of their substring in a sequence of n fair coinflips? This attracted over 1 million views on X and quickly spawned several articles explaining the counterintuitive solution. We study the generalized game, where the set of coin outcomes, {Heads, Tails}, is generalized to an arbitrary finite alphabet A, and where Alice's and Bob's substrings are any finite A-strings of the same length. We find that the winner of Litt's game can be determined by a single quantity which measures the amount of prefix/suffix self-overlaps in each string; whoever's string has more overlaps loses. For example, "Heads-Tails" beats "Heads-Heads" in the original problem because "Heads-Heads" has a prefix/suffix overlap of length 1 while "Heads-Tails" has none. The method of proof is to develop a precise Edgeworth expansion for discrete Markov chains, and apply this to calculate Alice's and Bob's probability to win the game correct to order O(1/n).

Summary

The Generalized Alice HH vs Bob HT Problem

The paper "The Generalized Alice HH vs Bob HT Problem" by Svante Janson, Mihai Nica, and Simon Segert analyzes a probabilistic puzzle originally posed by Daniel Litt. Two players, Alice and Bob, watch a sequence of n fair coin flips. Alice scores one point for every occurrence of "Heads-Heads" (HH) and Bob scores one point for every occurrence of "Heads-Tails" (HT). The question is which player is more likely to finish with the higher score, and the answer is counterintuitive.
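The scoring rule can be made concrete with a minimal sketch. It assumes that occurrences may overlap (so HHH contains two copies of HH) and that flips are written as a string of "H" and "T" characters; the function names `count_occurrences` and `winner` are illustrative and do not come from the paper.

```python
def count_occurrences(flips: str, pattern: str) -> int:
    """Count occurrences of pattern in flips, allowing overlapping matches."""
    m = len(pattern)
    # Slide a window of length m across flips and compare at every start index.
    return sum(flips[i:i + m] == pattern for i in range(len(flips) - m + 1))


def winner(flips: str, alice: str = "HH", bob: str = "HT") -> str:
    """Return "Alice", "Bob", or "tie" for one fixed sequence of flips."""
    a = count_occurrences(flips, alice)
    b = count_occurrences(flips, bob)
    if a > b:
        return "Alice"
    if b > a:
        return "Bob"
    return "tie"


if __name__ == "__main__":
    for flips in ["HHHT", "HTHT", "HHT"]:
        print(flips, winner(flips))
```

For example, in HHHT Alice scores 2 (HH starts at positions 0 and 1) and Bob scores 1, so Alice wins that particular sequence.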

The authors generalize Litt's problem, expanding the binary Heads-Tails (H-T) scenario to sequences defined over arbitrary finite alphabets A, and considering substrings of identical length for Alice and Bob. The central insight is that the advantage in this generalized game is decided by a single quantity that measures prefix/suffix overlaps within each string. Specifically, the string with more overlaps is disadvantaged.
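A prefix/suffix self-overlap of a string is a length k, shorter than the string itself, for which the first k characters equal the last k characters. The sketch below only lists these lengths for a given string; how the paper combines them into its single deciding quantity is not reproduced here, and the name `self_overlaps` is an illustrative choice.

```python
def self_overlaps(s: str) -> list:
    """Return every k with 0 < k < len(s) such that the length-k prefix
    of s equals the length-k suffix of s."""
    return [k for k in range(1, len(s)) if s[:k] == s[-k:]]


if __name__ == "__main__":
    for pattern in ["HH", "HT", "HHH", "ABCAB"]:
        print(pattern, self_overlaps(pattern))
```

This matches the example in the abstract: "HH" has one self-overlap of length 1, while "HT" has none. The function works for strings over any alphabet, since it only compares characters for equality.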

The method of proof is to derive a precise Edgeworth expansion for discrete Markov chains and use it to compute Alice's and Bob's winning probabilities correct to order $O(1/n)$. The authors provide exact formulas for the asymptotic advantage in terms of overlap indices $O_{UV}$ between two strings U and V, and they show that a larger self-overlap corresponds to a statistical disadvantage.

Key Results and Theoretical Implications

Main Theoretical Insight: The central theorem states that the asymptotic winner is determined by comparing the overlap measure of Alice's string with that of Bob's string. The player whose string has the larger overlap measure is at a disadvantage.
Mathematical Rigor: The paper extends previous analyses by proving that the winning probabilities depend on the overlap characteristics of the competing strings. The proofs rely on an Edgeworth expansion for the Markov chains associated with the character sequences.
Numerical Analysis: The calculations show that in Litt's original "HH vs HT" configuration Bob is more likely to win, by a margin determined by the overlap indices.
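For small n, the claim about the original game can be checked exactly without any asymptotics. The sketch below is a plain dynamic program over the state (last flip, Alice's score minus Bob's score); it is an independent check, not the paper's Edgeworth-expansion method, and the function name `litt_game` is illustrative. It assumes n >= 1.

```python
from collections import defaultdict
from fractions import Fraction


def litt_game(n: int):
    """Return exact (P(Alice wins), P(Bob wins), P(tie)) for n fair flips,
    where Alice scores on each HH and Bob scores on each HT."""
    # State: (last flip, Alice's score minus Bob's score) -> probability.
    states = {("H", 0): Fraction(1, 2), ("T", 0): Fraction(1, 2)}
    for _ in range(n - 1):
        nxt = defaultdict(Fraction)
        for (last, diff), p in states.items():
            for flip in "HT":
                d = diff
                if last == "H":
                    # HH gives Alice a point, HT gives Bob a point.
                    d += 1 if flip == "H" else -1
                nxt[(flip, d)] += p / 2
        states = nxt
    alice = sum(p for (_, d), p in states.items() if d > 0)
    bob = sum(p for (_, d), p in states.items() if d < 0)
    tie = sum(p for (_, d), p in states.items() if d == 0)
    return alice, bob, tie


if __name__ == "__main__":
    for n in (2, 3, 10, 100):
        a, b, t = litt_game(n)
        print(n, float(a), float(b), float(t))
```

With n = 3 the eight equally likely sequences split as 2 wins for Alice, 3 for Bob, and 3 ties, so the function returns 1/4, 3/8, and 3/8. Using `Fraction` keeps the probabilities exact, which makes the small gap between the two players visible without floating-point noise.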

Practical Implications and Future Directions

The findings bear on the broader study of pattern occurrences in random sequences, with possible applications in areas such as genetic signal detection, machine learning, and information theory. Because the results quantify how a pattern's self-overlaps affect its chances in a count-based comparison, they may be relevant to the design of algorithms that deal with sequential data and stochastic processes.

Given the intricate nature of the Edgeworth expansion and its application to Markov models, future research could develop similar expansions for dependency structures beyond homogeneous Markov chains and improve computational efficiency for large alphabets or more complex overlap configurations.

Additionally, the construction of generalized models where Alice and Bob score on multiple patterns could introduce new dimensions to competitive fairness analysis, potentially informing strategies in fields requiring competitive prediction models.

In conclusion, the paper advances theoretical understanding of statistical pattern formation within random sequences and provides robust analytical techniques relevant across various domains involving sequences and stochastic modeling.