
Double Dixie Cup Problem Analysis

Updated 4 July 2026
  • The Double Dixie Cup Problem is a generalization of the coupon collector problem where one collects m complete sets of N coupons, accounting for extra multiplicity requirements.
  • It utilizes advanced techniques including Poissonization, point‐process limits, and asymptotic methods to derive expectations, variances, and limit laws for both equal and unequal probability models.
  • Recent work highlights variance extremality and shows that uniform coupon probabilities minimize variance, while rare coupon dynamics dominate completion times in heterogeneous settings.

The Double Dixie Cup Problem is the classical extension of the coupon collector problem in which the objective is not merely to see every coupon type once, but to obtain $m$ complete sets of all $N$ coupon types. In the standard notation of the literature, $T_m(N)$ denotes the number of trials needed until each of the $N$ coupon types has been observed at least $m$ times. The terminology “double Dixie cup” is often used for the case $m=2$, while many modern treatments use the same name for the full $m$-set problem (Doumas et al., 2014). The topic occupies a central position in probabilistic combinatorics, asymptotic analysis, and Poissonization-based methods, and has recently been developed at several levels: classical equal-probability asymptotics, unequal-probability generalizations, point-process limits, stopped occupancy formulations, and finite-$N$ extremality results for the variance (Ilienko, 2019, Doumas et al., 2014, Long, 28 Apr 2026).

1. Classical definition and basic probabilistic structure

In the standard coupon collector setup, there are $N$ coupon types, and on each trial coupon type $j\in\{1,\dots,N\}$ is drawn independently with probability $p_j$, with $\sum_{j=1}^{N} p_j = 1$. The ordinary coupon collector time is $T_1(N)$, the number of trials needed to see every type at least once. The Double Dixie Cup Problem generalizes this to

$$T_m(N) = \min\{n \ge 1 : \text{every type } j\in\{1,\dots,N\} \text{ has appeared at least } m \text{ times in the first } n \text{ trials}\},$$

so that $T_2(N)$ is the time to see every coupon twice, and more generally $T_m(N)$ is the time to complete $m$ full sets (Doumas et al., 2014).
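The definition can be checked empirically. The following Python sketch is a minimal simulation and is not taken from the cited papers; the function name `simulate_completion_time` and the parameter values are illustrative assumptions. It draws coupons until every type has been seen at least $m$ times and returns the number of trials used.

```python
import random

def simulate_completion_time(probs, m, rng):
    """Draw coupons until every type has appeared at least m times.

    probs: coupon probabilities (positive, summing to 1)
    m:     required multiplicity (number of complete sets)
    rng:   a random.Random instance, so runs are reproducible
    Returns the number of trials used.
    """
    n_types = len(probs)
    counts = [0] * n_types
    incomplete = n_types              # types seen fewer than m times so far
    trials = 0
    types = range(n_types)
    while incomplete > 0:
        j = rng.choices(types, weights=probs)[0]
        counts[j] += 1
        trials += 1
        if counts[j] == m:            # type j has just reached the threshold
            incomplete -= 1
    return trials

# Illustrative run: N = 20 equally likely types, m = 2 complete sets.
rng = random.Random(0)
N, m = 20, 2
samples = [simulate_completion_time([1.0 / N] * N, m, rng) for _ in range(500)]
print("Monte Carlo estimate of the mean completion time:", sum(samples) / len(samples))
```

Averaging many such samples gives a Monte Carlo estimate of the expected completion time. Every sample is at least $mN$, since $m$ copies of each of the $N$ types are required.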

A closely related formulation appears in the equal-probability setting with NN7 coupon types, where one tracks, for each type NN8, the time NN9 when the Tm(N)T_m(N)0-th coupon of type Tm(N)T_m(N)1 arrives. For fixed Tm(N)T_m(N)2,

Tm(N)T_m(N)3

These variables are identically distributed but not independent, because all coupon types are observed in the same stream of arrivals (Ilienko, 2019).

The equal-probability case corresponds to $p_j = 1/N$ for all $j$. In that regime, the classical asymptotic scale for completion of $m$ full sets is

$$N\ln N + (m-1)\,N\ln\ln N,$$

which reduces to $N\ln N$ for the ordinary collector problem $m=1$ (Doumas et al., 2014). In the notation of (Ilienko, 2019), the analogous centering function for the $m$-th arrival level is

NN0

This shared centering already indicates the structural relation between the coupon collector problem and the Dixie cup problem: the latter introduces the extra correction NN1, reflecting the higher arrival multiplicity threshold (Ilienko, 2019).

2. Classical equal-probability asymptotics

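For fixed $m$, Newman and Shepp proved that

$$E[T_m(N)] = N\ln N + (m-1)\,N\ln\ln N + N C_m + o(N), \qquad N\to\infty,$$

with a constant $C_m$. Erdős and Rényi later identified

$$C_m = \gamma - \ln (m-1)!,$$

where $\gamma$ is the Euler–Mascheroni constant, and established the limit law

$$\lim_{N\to\infty} P\left\{\frac{T_m(N) - N\ln N - (m-1)\,N\ln\ln N}{N} \le y\right\} = \exp\left(-\frac{e^{-y}}{(m-1)!}\right), \qquad y\in\mathbb{R}.$$

Equivalently, after shifting by $\ln (m-1)!$, the limit is the standard Gumbel law (Doumas et al., 2014).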
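As a numerical illustration, the classical expansion $E[T_m(N)] \approx N\ln N + (m-1)N\ln\ln N + N(\gamma - \ln (m-1)!)$ can be compared with the exact value $N H_N$ that is available when $m=1$. The sketch below is only a sanity check under these stated formulas; the function names are hypothetical, and the $o(N)$ remainder is simply dropped.

```python
import math

EULER_GAMMA = 0.5772156649015329

def newman_shepp_approx(N, m):
    """N ln N + (m-1) N ln ln N + N (gamma - ln (m-1)!), with the o(N) term dropped.

    Requires N >= 2 so that ln ln N is defined.
    """
    c_m = EULER_GAMMA - math.lgamma(m)        # lgamma(m) equals ln((m-1)!)
    return N * math.log(N) + (m - 1) * N * math.log(math.log(N)) + N * c_m

def exact_mean_single_set(N):
    """Exact E[T_1(N)] = N * H_N for equally likely coupons."""
    return N * sum(1.0 / k for k in range(1, N + 1))

def limit_cdf(y, m):
    """Limiting CDF exp(-e^{-y} / (m-1)!) of the centered and scaled completion time."""
    return math.exp(-math.exp(-y) / math.factorial(m - 1))

N = 1000
print(newman_shepp_approx(N, 1), exact_mean_single_set(N))
# The shift by ln (m-1)! turns the m-set limit into the standard Gumbel CDF:
print(limit_cdf(0.3, 3), limit_cdf(0.3 + math.log(2), 1))
```

For $m=1$ the two values differ by roughly $1/2$, which is consistent with the harmonic-number expansion $H_N = \ln N + \gamma + 1/(2N) + \dots$.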

The same limit theorem is recovered in the point-process treatment of NN9-th arrivals. If mm0 denotes the equal-probability completion time, then

mm1

where mm2; for mm3 this becomes

mm4

the familiar Gumbel-type limit (Ilienko, 2019).

Recent work has also clarified the variance asymptotics in the equal-probability case. For fixed mm5, (Long, 28 Apr 2026) proves

mm6

and

mm7

This recovers the classical mm8 case and proves the fixed-mm9 variance asymptotic for every m=2m=20, which Doumas and Papanicolaou had stated as a conjecture (Long, 28 Apr 2026).

A further extension concerns growing multiplicity m=2m=21. Defining m=2m=22 by

m=2m=23

with

m=2m=24

the paper proves the equal-probability Gumbel law

m=2m=25

where m=2m=26 is standard Gumbel, together with

m=2m=27

(Long, 28 Apr 2026).

3. Poissonization, Erlang representations, and moment formulas

A major tool throughout the subject is Poissonization, which replaces the dependent discrete-time coupon stream by independent continuous-time Poisson processes. In the equal-probability framework of (Ilienko, 2019), coupons arrive at times of a unit-rate Poisson process with i.i.d. uniform marks in m=2m=28. For each coupon type m=2m=29, the arrivals of that type form an independent rate-mm0 Poisson process, and the mm1-th arrival time mm2 satisfies

mm3

The discrete and poissonized times are coupled by

mm4

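In the unequal-probability setting, (Doumas et al., 2014) introduces independent Poisson processes $N_j(t)$, $j=1,\dots,N$, with rates $p_j$, and lets $X_j$ be the time of the $m$-th event in process $N_j$. Then $X_j$ is Erlang with survival function

$$P\{X_j > t\} = e^{-p_j t}\sum_{k=0}^{m-1}\frac{(p_j t)^k}{k!} = S_m(p_j t)\,e^{-p_j t}.$$

Since the $X_j$’s are independent,

$$P\{X \le t\} = \prod_{j=1}^{N}\left[1 - S_m(p_j t)\,e^{-p_j t}\right],$$

where $X = \max_{1\le j\le N} X_j$ and $S_m(y) = \sum_{k=0}^{m-1} y^k/k!$ (Doumas et al., 2014).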
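The product form of the Poissonized distribution is easy to evaluate directly. The sketch below assumes only the standard Erlang survival function $e^{-\lambda t}\sum_{k<m}(\lambda t)^k/k!$ for the $m$-th event of a rate-$\lambda$ Poisson process; the function names are illustrative and do not come from the cited paper.

```python
import math

def erlang_survival(m, rate, t):
    """P(the m-th event of a rate-`rate` Poisson process occurs after time t)."""
    x = rate * t
    term, total = 1.0, 1.0            # running term x^k / k! and its partial sum
    for k in range(1, m):
        term *= x / k
        total += term
    return math.exp(-x) * total

def poissonized_completion_cdf(probs, m, t):
    """P(every coupon type has at least m arrivals by time t), using independence."""
    return math.prod(1.0 - erlang_survival(m, p, t) for p in probs)

# Illustrative values: 4 equally likely types, 2 complete sets.
for t in (5.0, 20.0, 40.0):
    print(t, poissonized_completion_cdf([0.25] * 4, 2, t))
```

Independence across coupon types is what makes the product valid. In the discrete-time model the counts are dependent, which is why the Poissonized version is the convenient one to compute with.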

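This Poissonized structure yields explicit integral formulas for moments. Writing $S_m(y)=\sum_{k=0}^{m-1} y^k/k!$, the $r$-th rising moment is

$$E\big[T_m(N)\,(T_m(N)+1)\cdots(T_m(N)+r-1)\big] = r\int_0^\infty \Big\{1-\prod_{j=1}^{N}\big[1-S_m(p_j t)\,e^{-p_j t}\big]\Big\}\,t^{r-1}\,dt,$$

and in particular

$$E[T_m(N)] = \int_0^\infty \Big\{1-\prod_{j=1}^{N}\big[1-S_m(p_j t)\,e^{-p_j t}\big]\Big\}\,dt,$$

$$E\big[T_m(N)\,(T_m(N)+1)\big] = 2\int_0^\infty \Big\{1-\prod_{j=1}^{N}\big[1-S_m(p_j t)\,e^{-p_j t}\big]\Big\}\,t\,dt.$$

Hence

$$\operatorname{Var}[T_m(N)] = E\big[T_m(N)\,(T_m(N)+1)\big] - E[T_m(N)] - E[T_m(N)]^2$$

(Doumas et al., 2014).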
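The moment formulas can be evaluated numerically. The sketch below assumes the standard Poissonization identities $E[T] = \int_0^\infty P(X>t)\,dt$ and $E[T(T+1)] = 2\int_0^\infty t\,P(X>t)\,dt$, where $X$ is the Poissonized completion time. It uses a composite Simpson rule with a heuristic truncation point `t_max`; that cutoff is an assumption made here for illustration and is adequate only for small $m$ and moderate $N$.

```python
import math

def survival(probs, m, t):
    """P(X > t) for the Poissonized completion time X (maximum of independent Erlang times)."""
    cdf = 1.0
    for p in probs:
        x = p * t
        term, partial = 1.0, 1.0
        for k in range(1, m):
            term *= x / k
            partial += term
        cdf *= 1.0 - math.exp(-x) * partial
    return 1.0 - cdf

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def mean_and_variance(probs, m, steps=20000):
    """Return (E[T], Var[T]) for the discrete completion time via the integral formulas."""
    t_max = (60.0 + 2.0 * m) / min(probs)      # heuristic cutoff: the tail beyond it is negligible
    first = simpson(lambda t: survival(probs, m, t), 0.0, t_max, steps)
    rising2 = 2.0 * simpson(lambda t: t * survival(probs, m, t), 0.0, t_max, steps)
    return first, rising2 - first - first ** 2

print(mean_and_variance([0.2] * 5, 1))   # one set, 5 equally likely types
print(mean_and_variance([0.2] * 5, 2))   # two sets
```

For $m=1$ and equal probabilities the output can be compared with the closed forms $N H_N$ and $N^2\sum_{k=1}^{N} k^{-2} - N H_N$.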

The same Poissonized representation underlies the finite-NN0 theory in (Long, 28 Apr 2026). There, coupon NN1 is assigned an independent Poisson process of rate NN2, and the continuous completion time NN3 is the time at which every process has reached level NN4. Since the waiting time to the NN5-th arrival is Gamma/Erlang,

NN6

is the survival function of the ErlangNN7 time, and

NN8

The paper also uses the exact transfer identity

NN9

so that the Poissonized model controls the discrete completion time through rising moments (Long, 28 Apr 2026).

4. Point-process and functional-limit formulations

A major conceptual shift occurs in the point-process approach of (Ilienko, 2019). Instead of analyzing only the scalar completion time j{1,,N}j\in\{1,\dots,N\}0, the paper studies the full family of centered and normalized j{1,,N}j\in\{1,\dots,N\}1-th arrival times,

j{1,,N}j\in\{1,\dots,N\}2

Its poissonized analogue is

j{1,,N}j\in\{1,\dots,N\}3

The main theorem states that if j{1,,N}j\in\{1,\dots,N\}4 is the Poisson point process on j{1,,N}j\in\{1,\dots,N\}5 with intensity measure

j{1,,N}j\in\{1,\dots,N\}6

then

j{1,,N}j\in\{1,\dots,N\}7

Thus the centered and normalized j{1,,N}j\in\{1,\dots,N\}8-th arrival times across coupon types converge to a non-homogeneous Poisson point process with exponential intensity j{1,,N}j\in\{1,\dots,N\}9 (Ilienko, 2019).

The limit process admits a useful representation. If NN00 is a unit-rate Poisson point process on NN01, and

NN02

then

NN03

This identifies NN04 as the image of a homogeneous Poisson process under a logarithmic transformation (Ilienko, 2019).

The point-process limit is stronger than the classical one-dimensional limit and yields infinite-dimensional extensions. Let NN05 be the first time when some NN06 coupon types have already appeared at least NN07 times each, and define

NN08

Then

NN09

and Theorem 4.1 gives

NN10

This is an infinite-dimensional extension of classical limit theorems for the Dixie cup problem (Ilienko, 2019).

The same framework yields a functional limit for rare coupon types. A type NN11 is called NN12-rare if

NN13

Let

NN14

Then NN15 converges in NN16 with the NN17-topology to

NN18

where NN19 is a standard unit-rate Poisson process (Ilienko, 2019).

This process-level formulation suggests that the Double Dixie Cup Problem is naturally interpreted not only as a first-passage problem for a maximum, but also as an extremal point-process problem for the entire cloud of multiplicity-threshold arrival times.

5. Unequal probabilities and heterogeneous coupon populations

A substantial generalization replaces equal sampling probabilities by a positive sequence

$$\alpha = \{a_j\}_{j=1}^{\infty}, \qquad a_j > 0,$$

and defines

$$p_j = \frac{a_j}{A_N}, \qquad A_N = \sum_{j=1}^{N} a_j, \qquad j = 1,\dots,N.$$

The Double Dixie Cup Problem then becomes the analysis of $T_m(N)$ under arbitrary positive coupon probability vectors generated by $\alpha$ (Doumas et al., 2014).

A key dichotomy in (Doumas et al., 2014) is whether there exists NN24 such that

NN25

This leads to two regimes.

Regime Condition Limiting behavior
Case I NN26 Nonuniversal limit depending on NN27
Case II NN28 with NN29 Gumbel regime after adapted normalization

In Case I, the paper defines

NN30

and

NN31

Then

NN32

If NN33, the asymptotics are

NN34

NN35

and

NN36

The corresponding limit law is

NN37

where

NN38

This limit is not universal and is generally not Gumbel (Doumas et al., 2014).

In Case II, one writes

NN39

with NN40 positive, increasing, smooth, and satisfying

NN41

together with additional regularity assumptions. Defining

NN42

the paper proves

NN43

and

NN44

Most notably, in Case II the leading variance term is independent of NN45 (Doumas et al., 2014).

The limit law in this regime is Gumbel after adapted centering and scaling. With

NN46

where

NN47

one has

NN48

This extends the Erdős–Rényi limit from equal probabilities to broad classes of unequal probabilities (Doumas et al., 2014).

Examples explicitly treated include generalized Zipf laws NN49, exponential weights NN50, and slow logarithmic decay NN51 (Doumas et al., 2014).

6. Interlacing mixtures, stopped occupancy, and maximal counts

The heterogeneous setting is developed further in (Doumas, 29 Oct 2025), which studies an interlacing mixture of two coupon distributions. There, one considers

NN52

with NN53, so the coupon population consists of two subfamilies, each of size NN54: a common family NN55 and a rare family NN56. Their masses are

NN57

The paper assumes NN58 and

NN59

with NN60 positive, increasing, NN61, and satisfying the stated derivative conditions (Doumas, 29 Oct 2025).

The central random variable remains

NN62

the number of trials needed until each of the NN63 coupon types has been observed at least NN64 times. The standard integral representation is

NN65

Using Poissonization, the process is decomposed into rare-coupon stages. If NN66 is the number of common-family arrivals between successive rare-family arrivals, then

NN67

and

NN68

By Wald’s lemma,

NN69

Hence

NN70

This identifies a product structure: a mass factor NN71 depending on both subfamilies, and a hardness factor determined only by the rare subfamily (Doumas, 29 Oct 2025).

For the rare family,

NN72

so that

NN73

up to the specific asymptotics of NN74. The paper emphasizes that the parameter NN75 does not appear in the leading term as NN76 (Doumas, 29 Oct 2025). This suggests that, in this heterogeneous regime, the rarest coupons dominate the asymptotic difficulty regardless of the requested multiplicity.

A different but related reformulation appears in the stopped occupancy model of (Gnedin et al., 25 Jun 2025). Balls are thrown independently into NN77 boxes, each with probability NN78, and the process stops when only NN79 boxes remain that have at most NN80 balls: NN81 Equivalently,

NN82

This includes the coupon collector problem as NN83, and the Dixie cup problem as NN84 (Gnedin et al., 25 Jun 2025).

The paper studies not primarily the stopping time but the maximum occupancy at that time,

NN85

With

NN86

one has

NN87

where NN88 has a Gumbel law of order NN89. For the maximum, however, there is no single limit distribution. Defining

NN90

with NN91, NN92, the paper proves

NN93

where NN94 is standard Gumbel and NN95 is independent (Gnedin et al., 25 Jun 2025).

For the Double Dixie Cup setting, this means the maximal occupancy at completion is asymptotically a rounded sum of two independent Gumbels, with oscillations close to periodic on a logarithmic scale. The nonconvergence of NN96 as NN97 is attributed to the fractional centering term NN98, so that convergence occurs only along subsequences with NN99 (Gnedin et al., 25 Jun 2025).

7. Variance extremality, terminal defects, and broader uses

A recent finite-Tm(N)T_m(N)00 development is the variance extremality theory of (Long, 28 Apr 2026). For every Tm(N)T_m(N)01 and Tm(N)T_m(N)02, among all positive coupon probability vectors

Tm(N)T_m(N)03

the variance of the time Tm(N)T_m(N)04 to collect Tm(N)T_m(N)05 complete sets is uniquely minimized at the uniform vector

Tm(N)T_m(N)06

More precisely,

Tm(N)T_m(N)07

with equality iff Tm(N)T_m(N)08 (Long, 28 Apr 2026).
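The extremality statement can be illustrated in the simplest case $m=1$, where the variance has an exact inclusion-exclusion form. The sketch below is not the proof technique of the cited paper; it assumes the standard identities $E[T] = \sum_{S\neq\emptyset}(-1)^{|S|+1}/p_S$ and $E[T(T+1)] = 2\sum_{S\neq\emptyset}(-1)^{|S|+1}/p_S^2$, with $p_S = \sum_{j\in S} p_j$, and the function name is hypothetical. The cost is exponential in the number of types, so it is meant for small examples only.

```python
from itertools import combinations

def single_set_variance(probs):
    """Exact Var(T) for collecting one complete set (m = 1), by inclusion-exclusion."""
    n = len(probs)
    mean = 0.0
    rising2 = 0.0                     # E[T (T + 1)]
    for size in range(1, n + 1):
        sign = 1.0 if size % 2 else -1.0
        for subset in combinations(probs, size):
            mass = sum(subset)        # total probability of the subset of types
            mean += sign / mass
            rising2 += 2.0 * sign / mass ** 2
    return rising2 - mean - mean ** 2

uniform = [1.0 / 3] * 3
skewed = [0.5, 0.3, 0.2]
print(single_set_variance(uniform))   # 6.75 up to rounding
print(single_set_variance(skewed))    # strictly larger than the uniform value
```

For three equally likely types the variance is $9(1 + 1/4 + 1/9) - 3(1 + 1/2 + 1/3) = 6.75$, and the skewed vector gives a larger value, in line with the stated minimality of the uniform vector.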

The paper proves the stronger radial monotonicity statement: if

Tm(N)T_m(N)09

then Tm(N)T_m(N)10 is strictly increasing for Tm(N)T_m(N)11. The proof is based on a terminal-defect viewpoint. At time Tm(N)T_m(N)12, coupon Tm(N)T_m(N)13 is still defective if it has been seen fewer than Tm(N)T_m(N)14 times, with defect probability Tm(N)T_m(N)15, and the expected number of terminal defects is

Tm(N)T_m(N)16

Completion occurs exactly when this count is zero (Long, 28 Apr 2026).
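The terminal-defect count is straightforward to compute in the Poissonized model. The sketch below assumes that by time $t$ a coupon with probability $p$ has received a Poisson($pt$) number of arrivals, so its defect probability is $P(\mathrm{Poisson}(pt) < m)$. The function names are illustrative and are not taken from the cited paper.

```python
import math

def defect_probability(p, m, t):
    """P(Poisson(p * t) < m): the coupon has been seen fewer than m times by time t."""
    x = p * t
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= x / k
        total += term
    return math.exp(-x) * total

def expected_defects(probs, m, t):
    """Expected number of coupon types still below multiplicity m at time t."""
    return sum(defect_probability(p, m, t) for p in probs)

probs = [0.4, 0.3, 0.2, 0.1]
for t in (0.0, 10.0, 50.0, 100.0):
    print(t, expected_defects(probs, 2, t))
```

At $t=0$ every type is defective, so the count equals the number of types. For large $t$ the sum is dominated by the rarest coupon, which matches the document's remark that rare coupons control completion in heterogeneous settings.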

The analytic core of the argument is a monotone-likelihood-ratio comparison derived from a log-scale monotonicity property of the Gamma reverse hazard. If

Tm(N)T_m(N)17

then

Tm(N)T_m(N)18

is strictly negative and strictly decreasing on Tm(N)T_m(N)19. Consequently Tm(N)T_m(N)20 is strictly decreasing, and for every Tm(N)T_m(N)21, the ratio

Tm(N)T_m(N)22

is strictly decreasing (Long, 28 Apr 2026). This is the one-site input supporting the global variance comparison.

The Double Dixie Cup Problem also appears outside its original probabilistic context. In query-based Tm(N)T_m(N)23-means clustering with same-cluster queries, (Chien et al., 2018) uses it as the combinatorial model for the number of random samples required until every cluster has enough representatives to estimate its centroid. The paper explicitly states: “The double Dixie cup problem is an extension of the classical coupon collector problem in which the collector is required to collect Tm(N)T_m(N)24 sets of coupons.” In that setting, coupon types correspond to clusters, and obtaining Tm(N)T_m(N)25 samples from every cluster is the analogue of completing Tm(N)T_m(N)26 sets (Chien et al., 2018).

Let Tm(N)T_m(N)27 be the number of sampling rounds needed until each of the Tm(N)T_m(N)28 cluster-types has been seen at least Tm(N)T_m(N)29 times. The paper states

Tm(N)T_m(N)30

where

Tm(N)T_m(N)31

Under an Tm(N)T_m(N)32-imbalance assumption, it derives the bound

Tm(N)T_m(N)33

which feeds directly into the query complexity

Tm(N)T_m(N)34

for the noiseless clustering algorithm (Chien et al., 2018). This application shows that the Double Dixie Cup Problem functions as a reusable probabilistic template whenever “coverage with multiplicity” is the governing bottleneck.

Taken together, these developments show that the Double Dixie Cup Problem is no longer confined to the classical question of expectation asymptotics for equal probabilities. It now comprises a family of models and techniques: exact Poissonized representations, non-homogeneous Poisson point-process limits, asymptotics for unequal and interlaced probability profiles, stopped occupancy extremal statistics, and finite-$N$ extremality principles (Doumas et al., 2014, Ilienko, 2019, Doumas, 29 Oct 2025, Gnedin et al., 25 Jun 2025, Long, 28 Apr 2026). A plausible implication is that the modern theory views completion not merely as a hitting time, but as an extreme-value and defect-elimination phenomenon whose asymptotic form depends sensitively on the rarity structure of the coupon population.
