Double Dixie Cup Problem Analysis
- The Double Dixie Cup Problem generalizes the coupon collector problem: instead of seeing each of $N$ coupon types once, the collector must obtain $m$ complete sets, i.e., see every type at least $m$ times.
- Its analysis relies on Poissonization, point-process limits, and asymptotic methods, which yield expectations, variances, and limit laws under both equal and unequal coupon probabilities.
- Recent work shows that, for fixed $N$ and $m$, uniform coupon probabilities uniquely minimize the variance of the completion time, while in heterogeneous settings the rarest coupons dominate completion times.
The Double Dixie Cup Problem is the classical extension of the coupon collector problem in which the objective is not merely to see every coupon type once, but to obtain $m$ complete sets of all coupon types. In the standard notation of the literature, $T_m(N)$ denotes the number of trials needed until each of the $N$ coupon types has been observed at least $m$ times. The name “double Dixie cup” is often reserved for the case $m = 2$, while many modern treatments use it for the full $m$-set problem (Doumas et al., 2014). The topic occupies a central position in probabilistic combinatorics, asymptotic analysis, and Poissonization-based methods, and has recently been developed along several lines: classical equal-probability asymptotics, unequal-probability generalizations, point-process limits, stopped occupancy formulations, and finite-$N$ extremality results for the variance (Ilienko, 2019, Doumas et al., 2014, Long, 28 Apr 2026).
1. Classical definition and basic probabilistic structure
In the standard coupon collector setup, there are $N$ coupon types, and on each trial coupon type $j$ is drawn independently with probability $p_j > 0$, where $\sum_{j=1}^{N} p_j = 1$. The ordinary coupon collector time $T_1(N)$ is the number of trials needed to see every type at least once. The Double Dixie Cup Problem generalizes this to
$$T_m(N) = \min\{n \ge 1 : \text{every type has appeared at least } m \text{ times among the first } n \text{ trials}\},$$
so that $T_2(N)$ is the time to see every coupon twice, and more generally $T_m(N)$ is the time to complete $m$ full sets (Doumas et al., 2014).
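As a concrete reading of the definition, the Python sketch below simulates $T_m(N)$ by drawing coupons until every type has been seen at least $m$ times. The function `dixie_cup_time` and its parameters are illustrative and do not come from the cited papers.

```python
import itertools
import random


def dixie_cup_time(probs, m, rng=None):
    """Simulate one realization of T_m(N).

    Coupons are drawn independently, type j with probability probs[j], until
    each of the N = len(probs) types has appeared at least m times.
    Returns the number of draws.
    """
    if rng is None:
        rng = random.Random()
    n_types = len(probs)
    cum_weights = list(itertools.accumulate(probs))  # reused on every draw
    types = range(n_types)
    counts = [0] * n_types
    incomplete = n_types          # types still seen fewer than m times
    draws = 0
    while incomplete > 0:
        j = rng.choices(types, cum_weights=cum_weights)[0]
        draws += 1
        counts[j] += 1
        if counts[j] == m:        # type j has just reached multiplicity m
            incomplete -= 1
    return draws


if __name__ == "__main__":
    rng = random.Random(1)
    n, m, reps = 20, 2, 2000
    sims = [dixie_cup_time([1.0 / n] * n, m, rng) for _ in range(reps)]
    print("Monte Carlo estimate of E[T_2(20)]:", sum(sims) / reps)
```

Since every type needs $m$ copies, each realization is at least $Nm$; for small $N$ the simulated mean still differs noticeably from the leading asymptotic scale.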
A closely related formulation appears in the equal-probability setting with $N$ coupon types, where one tracks, for each type $k$, the time $\tau_k^{(m)}$ (counted in trials) at which the $m$-th coupon of type $k$ arrives. For fixed $m$,
$$T_m(N) = \max_{1 \le k \le N} \tau_k^{(m)}.$$
These variables are identically distributed but not independent, because all coupon types are observed in the same stream of arrivals (Ilienko, 2019).
The equal-probability case corresponds to $p_j = 1/N$ for all $j$. In that regime, the classical asymptotic scale for completion of $m$ full sets is
$$N\ln N + (m-1)N\ln\ln N,$$
which reduces to $N\ln N$ for the ordinary collector problem ($m = 1$) (Doumas et al., 2014). In (Ilienko, 2019), the analogous centering function for the $m$-th arrival level is
$$b_N = N\ln N + (m-1)N\ln\ln N.$$
This shared centering already indicates the structural relation between the coupon collector problem and the Dixie cup problem: the latter adds the correction term $(m-1)N\ln\ln N$, reflecting the higher multiplicity threshold (Ilienko, 2019).
2. Classical equal-probability asymptotics
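For fixed $m$, Newman and Shepp proved that
$$\mathbb{E}[T_m(N)] = N\ln N + (m-1)N\ln\ln N + C_m N + o(N), \qquad N \to \infty,$$
with a constant $C_m$. Erdős and Rényi later identified
$$C_m = \gamma - \ln\big((m-1)!\big),$$
where $\gamma$ is the Euler–Mascheroni constant, and established the limit law
$$\lim_{N\to\infty}\mathbb{P}\left(\frac{T_m(N) - N\ln N - (m-1)N\ln\ln N}{N} \le y\right) = \exp\left(-\frac{e^{-y}}{(m-1)!}\right), \qquad y \in \mathbb{R}.$$
Equivalently, after adding $\ln\big((m-1)!\big)$ to the normalized variable, the limit is the standard Gumbel law (Doumas et al., 2014).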
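The Newman–Shepp and Erdős–Rényi asymptotics are easy to evaluate numerically. The sketch below (function names are illustrative) computes the three-term approximation $N\ln N + (m-1)N\ln\ln N + C_m N$ and the limiting distribution function; for $m = 1$ the approximation can be checked against the exact value $\mathbb{E}[T_1(N)] = N H_N$, where $H_N$ is the $N$-th harmonic number.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def newman_shepp_mean(n, m):
    """Approximation N ln N + (m-1) N ln ln N + C_m N of E[T_m(N)], with the
    Erdős–Rényi constant C_m = gamma - ln((m-1)!). Requires n >= 2."""
    c_m = EULER_GAMMA - math.lgamma(m)        # math.lgamma(m) = ln((m-1)!)
    return n * math.log(n) + (m - 1) * n * math.log(math.log(n)) + c_m * n


def erdos_renyi_cdf(y, m):
    """Limit of P((T_m(N) - N ln N - (m-1) N ln ln N) / N <= y) as N -> infinity."""
    return math.exp(-math.exp(-y) / math.factorial(m - 1))


if __name__ == "__main__":
    n = 100
    exact_m1 = n * sum(1.0 / k for k in range(1, n + 1))   # E[T_1(N)] = N * H_N
    print(exact_m1, newman_shepp_mean(n, 1))
```

For $m = 1$ the gap $N H_N - (N\ln N + \gamma N)$ tends to $1/2$, which is one concrete instance of the $o(N)$ remainder.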
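The same limit theorem is recovered in the point-process treatment of $m$-th arrivals. If $T_m(N)$ denotes the equal-probability completion time, then
$$\lim_{N\to\infty}\mathbb{P}\big(T_m(N) \le b_N + Nx\big) = \exp\left(-\frac{e^{-x}}{(m-1)!}\right), \qquad x \in \mathbb{R},$$
where $b_N = N\ln N + (m-1)N\ln\ln N$; for $m = 1$ this becomes
$$\lim_{N\to\infty}\mathbb{P}\big(T_1(N) \le N\ln N + Nx\big) = \exp\left(-e^{-x}\right),$$
the familiar Gumbel-type limit (Ilienko, 2019).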
Recent work has also clarified the variance asymptotics in the equal-probability case. For fixed $m$, (Long, 28 Apr 2026) proves
6
and
7
This recovers the classical $m = 1$ case and proves the fixed-$m$ variance asymptotic for every $m \ge 2$, which Doumas and Papanicolaou had stated as a conjecture (Long, 28 Apr 2026).
A further extension concerns growing multiplicity 1. Defining 2 by
3
with
4
the paper proves the equal-probability Gumbel law
5
where 6 is standard Gumbel, together with
7
3. Poissonization, Erlang representations, and moment formulas
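A major tool throughout the subject is Poissonization, which replaces the dependent discrete-time coupon stream by independent continuous-time Poisson processes. In the equal-probability framework of (Ilienko, 2019), coupons arrive at the times $S_1 < S_2 < \cdots$ of a unit-rate Poisson process and carry i.i.d. marks uniformly distributed on $\{1, \dots, N\}$. For each coupon type $k$, the arrivals of that type form a Poisson process of rate $1/N$, independent across types, and the $m$-th arrival time $\tilde\tau_k^{(m)}$ satisfies
$$\mathbb{P}\big(\tilde\tau_k^{(m)} > t\big) = e^{-t/N}\sum_{i=0}^{m-1}\frac{(t/N)^i}{i!}, \qquad t \ge 0,$$
so $\tilde\tau_k^{(m)}$ has the Gamma (Erlang) distribution with shape $m$ and scale $N$. If $\tau_k^{(m)}$ denotes the index of the $m$-th type-$k$ coupon in the discrete stream, the discrete and poissonized times are coupled by
$$\tilde\tau_k^{(m)} = S_{\tau_k^{(m)}}.$$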
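The coupling can be made concrete by simulating a single stream. In the sketch below (the function name and zero-based type labels are illustrative assumptions), each coupon is stamped with the arrival time of a unit-rate Poisson process; by construction the Poissonized $m$-th arrival time of type $k$ equals the stamp $S_{\tau_k^{(m)}}$ of the $\tau_k^{(m)}$-th coupon, and $\max_k \tau_k^{(m)}$ is the discrete completion time.

```python
import random


def coupled_arrival_times(n_types, m, rng=None):
    """Simulate the equal-probability coupon stream embedded in a unit-rate
    Poisson process (uniform marks on {0, ..., n_types - 1}).

    Returns (tau, tau_tilde, arrivals): tau[k] is the index of the m-th coupon
    of type k, tau_tilde[k] is its Poisson arrival time, and arrivals[n] = S_n
    is the n-th arrival time (arrivals[0] = 0).
    """
    if rng is None:
        rng = random.Random()
    counts = [0] * n_types
    tau = [0] * n_types
    tau_tilde = [0.0] * n_types
    arrivals = [0.0]
    remaining = n_types
    while remaining > 0:
        arrivals.append(arrivals[-1] + rng.expovariate(1.0))  # S_n = S_{n-1} + Exp(1)
        k = rng.randrange(n_types)                             # uniform mark
        counts[k] += 1
        if counts[k] == m:
            n = len(arrivals) - 1
            tau[k], tau_tilde[k] = n, arrivals[n]
            remaining -= 1
    return tau, tau_tilde, arrivals
```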
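In the unequal-probability setting, (Doumas et al., 2014) introduces independent Poisson processes $\{\mathcal{N}_j(t)\}_{t \ge 0}$, $j = 1, \dots, N$, with rates $p_j$, and lets $Z_j$ be the time of the $m$-th event in process $\mathcal{N}_j$. Then $Z_j$ is Erlang with survival function
$$\mathbb{P}(Z_j > t) = e^{-p_j t}\sum_{l=0}^{m-1}\frac{(p_j t)^l}{l!} = e^{-p_j t}S_m(p_j t).$$
Since the $Z_j$'s are independent, the Poissonized completion time $Z = \max_{1 \le j \le N} Z_j$ satisfies
$$\mathbb{P}(Z \le t) = \prod_{j=1}^{N}\left(1 - e^{-p_j t}S_m(p_j t)\right),$$
where $S_m(y) = \sum_{l=0}^{m-1} y^l/l!$ (Doumas et al., 2014).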
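The product formula $\mathbb{P}(Z \le t) = \prod_{j}(1 - e^{-p_j t}S_m(p_j t))$, with $S_m(y) = \sum_{l=0}^{m-1} y^l/l!$, translates directly into code. In this sketch (hypothetical function names), `erlang_survival` builds the terms $e^{-y}y^l/l!$ recursively, avoiding explicit factorials.

```python
import math


def erlang_survival(y, m):
    """P(Erlang(m, 1) > y) = e^{-y} S_m(y), with S_m(y) = sum_{l<m} y^l / l!."""
    term = math.exp(-y)            # l = 0 term
    total = term
    for l in range(1, m):
        term *= y / l              # e^{-y} y^l / l!, built recursively
        total += term
    return total


def poissonized_cdf(t, probs, m):
    """P(Z <= t) = prod_j (1 - e^{-p_j t} S_m(p_j t)), where Z is the time at
    which every independent rate-p_j Poisson process has reached level m."""
    result = 1.0
    for p in probs:
        result *= 1.0 - erlang_survival(p * t, m)
    return result
```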
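This Poissonized structure yields explicit integral formulas for moments. The Poissonized completion time $Z$ (the first time at which every process $j$ has registered $m$ events) is the arrival time of the $T_m(N)$-th event of the merged unit-rate process, so it is a sum of $T_m(N)$ independent unit exponentials. Hence the $r$-th rising moment is
$$\mathbb{E}\big[T_m(N)(T_m(N)+1)\cdots(T_m(N)+r-1)\big] = \mathbb{E}[Z^r] = r\int_0^\infty t^{r-1}\left[1 - \prod_{j=1}^{N}\left(1 - e^{-p_j t}S_m(p_j t)\right)\right]dt,$$
with $S_m(y) = \sum_{l=0}^{m-1} y^l/l!$. In particular,
$$\mathbb{E}[T_m(N)] = \int_0^\infty \left[1 - \prod_{j=1}^{N}\left(1 - e^{-p_j t}S_m(p_j t)\right)\right]dt,$$
$$\mathbb{E}\big[T_m(N)(T_m(N)+1)\big] = 2\int_0^\infty t\left[1 - \prod_{j=1}^{N}\left(1 - e^{-p_j t}S_m(p_j t)\right)\right]dt.$$
Hence
$$\operatorname{Var}[T_m(N)] = \mathbb{E}\big[T_m(N)(T_m(N)+1)\big] - \mathbb{E}[T_m(N)] - \mathbb{E}[T_m(N)]^2.$$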
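These integrals can be evaluated with elementary quadrature. The sketch below is a minimal implementation under stated assumptions: it truncates at a point where the union bound $\sum_j e^{-p_j t}S_m(p_j t) \ge \mathbb{P}(Z > t)$ falls below a tolerance and then applies composite Simpson's rule. Neither the truncation rule nor the step count comes from the cited papers, and the function names are hypothetical.

```python
import math


def erlang_survival(y, m):
    """e^{-y} S_m(y) = P(Erlang(m, 1) > y), S_m(y) = sum_{l<m} y^l / l!."""
    term = math.exp(-y)
    total = term
    for l in range(1, m):
        term *= y / l
        total += term
    return total


def tail(t, probs, m):
    """P(Z > t) = 1 - prod_j (1 - e^{-p_j t} S_m(p_j t))."""
    prod = 1.0
    for p in probs:
        prod *= 1.0 - erlang_survival(p * t, m)
    return 1.0 - prod


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def dixie_moments(probs, m, eps=1e-14):
    """Return (E[T_m(N)], Var[T_m(N)]) from the Poissonized integrals."""
    # Truncation point: double t_max until the union bound on P(Z > t) is below eps.
    t_max = 1.0 / min(probs)
    while sum(erlang_survival(p * t_max, m) for p in probs) > eps:
        t_max *= 2.0
    mean = simpson(lambda t: tail(t, probs, m), 0.0, t_max)
    rising2 = 2.0 * simpson(lambda t: t * tail(t, probs, m), 0.0, t_max)  # E[T(T+1)]
    return mean, rising2 - mean - mean ** 2
```

For $N = 2$ equally likely coupons and $m = 1$ this returns values close to $(3, 2)$, the exact mean and variance of the two-coupon collector time.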
The same Poissonized representation underlies the finite-$N$ theory in (Long, 28 Apr 2026). There, coupon $j$ is assigned an independent Poisson process of rate $p_j$, and the continuous completion time $Z$ is the time at which every process has reached level $m$. Since the waiting time to the $m$-th arrival is Gamma/Erlang,
$$\bar G_m(t) = e^{-t}\sum_{i=0}^{m-1}\frac{t^i}{i!}$$
is the survival function of the Erlang$(m, 1)$ time, and
$$\mathbb{P}(Z \le t) = \prod_{j=1}^{N}\left(1 - \bar G_m(p_j t)\right).$$
The paper also uses the exact transfer identity
$$\mathbb{E}[Z^r] = \mathbb{E}\big[T_m(N)(T_m(N)+1)\cdots(T_m(N)+r-1)\big], \qquad r = 1, 2, \dots,$$
so that the Poissonized model controls the discrete completion time through rising moments (Long, 28 Apr 2026).
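The transfer identity reflects the fact that, when $\sum_j p_j = 1$, the merged Poisson process has unit rate, so $Z$ is the arrival time of the $T_m(N)$-th coupon, a sum of $T_m(N)$ independent unit exponentials; conditionally on $T_m(N) = n$, $\mathbb{E}[Z^r] = n(n+1)\cdots(n+r-1)$. A Monte Carlo check for $r = 1, 2$ follows; the helper names are hypothetical, and $Z$ is generated from $T_m(N)$ through a Gamma draw rather than by simulating the rate-$p_j$ processes separately.

```python
import random


def completion_and_poisson_time(probs, m, rng):
    """Return (T, Z): T is the discrete completion time T_m(N), and Z is the
    Poissonized time, built as a sum of T independent unit exponentials."""
    counts = [0] * len(probs)
    incomplete = len(probs)
    types = range(len(probs))
    t = 0
    while incomplete:
        j = rng.choices(types, weights=probs)[0]
        t += 1
        counts[j] += 1
        if counts[j] == m:
            incomplete -= 1
    z = rng.gammavariate(t, 1.0)   # Gamma(t, 1) = sum of t i.i.d. Exp(1)
    return t, z


def rising_moment_check(probs, m, reps=100_000, seed=0):
    """Monte Carlo estimates of (E[T], E[Z], E[T(T+1)], E[Z^2])."""
    rng = random.Random(seed)
    s_t = s_z = s_tt = s_zz = 0.0
    for _ in range(reps):
        t, z = completion_and_poisson_time(probs, m, rng)
        s_t += t
        s_z += z
        s_tt += t * (t + 1)
        s_zz += z * z
    return s_t / reps, s_z / reps, s_tt / reps, s_zz / reps
```

For two equally likely coupons and $m = 1$, both first moments should be near $3$ and both second-order estimates near $14$.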
4. Point-process and functional-limit formulations
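A major conceptual shift occurs in the point-process approach of (Ilienko, 2019). Instead of analyzing only the scalar completion time $T_m(N)$, the paper studies the full family of centered and normalized $m$-th arrival times $\tau_k^{(m)}$, $k = 1, \dots, N$, through the point process
$$\Phi_N = \sum_{k=1}^{N}\delta_{(\tau_k^{(m)} - b_N)/N}, \qquad b_N = N\ln N + (m-1)N\ln\ln N.$$
Its poissonized analogue, built from the continuous-time arrival times $\tilde\tau_k^{(m)}$, is
$$\tilde\Phi_N = \sum_{k=1}^{N}\delta_{(\tilde\tau_k^{(m)} - b_N)/N}.$$
The main theorem states that if $\Phi$ is the Poisson point process on $\mathbb{R}$ with intensity measure
$$\mu(dx) = \frac{e^{-x}}{(m-1)!}\,dx,$$
then
$$\Phi_N \xrightarrow{d} \Phi \quad\text{and}\quad \tilde\Phi_N \xrightarrow{d} \Phi \qquad (N \to \infty).$$
Thus the centered and normalized $m$-th arrival times across coupon types converge to a non-homogeneous Poisson point process with exponential intensity $e^{-x}/(m-1)!$ (Ilienko, 2019).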
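A quick sanity check of the exponential intensity: since each Poissonized $m$-th arrival time divided by $N$ is Gamma$(m,1)$, the expected number of normalized points above $x$ is $N\,\mathbb{P}(\mathrm{Gamma}(m,1) > \ln N + (m-1)\ln\ln N + x)$, which should approach $e^{-x}/(m-1)!$. The sketch below (hypothetical function names) works in log space so that very large $N$ can be used. For $m = 1$ the two quantities coincide exactly, while for $m \ge 2$ the convergence is slow, with relative error of order $\ln\ln N/\ln N$.

```python
import math


def expected_points_above(x, log_n, m):
    """N * P(Gamma(m, 1) > ln N + (m-1) ln ln N + x), with N = exp(log_n).

    This is the mean number of normalized Poissonized m-th arrival times
    (tau_tilde_k - b_N) / N that exceed x.
    """
    y = log_n + (m - 1) * math.log(log_n) + x
    term = math.exp(log_n - y)     # N * e^{-y}, computed without forming N
    total = term
    for i in range(1, m):
        term *= y / i              # N * e^{-y} * y^i / i!
        total += term
    return total


def limit_mass_above(x, m):
    """Mass e^{-x} / (m-1)! that the limiting intensity gives to (x, infinity)."""
    return math.exp(-x) / math.factorial(m - 1)
```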
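The limit process admits a useful representation. Let $\Gamma_1 < \Gamma_2 < \cdots$ be the points of a unit-rate Poisson point process on $(0, \infty)$, and set
$$\Phi' = \sum_{i \ge 1}\delta_{-\ln\Gamma_i - \ln((m-1)!)}.$$
Then $\Phi'$ has the same distribution as the limit process, the Poisson point process with intensity $e^{-x}/(m-1)!\,dx$: the points of $\Phi'$ above $x$ correspond to the $\Gamma_i$ below $e^{-x}/(m-1)!$, whose expected number is $e^{-x}/(m-1)!$. This identifies the limit as the image of a homogeneous Poisson process under a logarithmic transformation (Ilienko, 2019).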
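The logarithmic representation gives a direct way to sample the top of the limit process. In the sketch below (hypothetical names), the largest point is $-\ln\Gamma_1 - \ln((m-1)!)$ with $\Gamma_1 \sim \mathrm{Exp}(1)$, so its distribution function is $\exp(-e^{-x}/(m-1)!)$, the Erdős–Rényi limit; the Monte Carlo helper checks this at a single level $x$.

```python
import math
import random


def limit_points(m, k, rng):
    """Largest k points of the limit process, generated as
    -ln((m-1)! * Gamma_i), where Gamma_1 < Gamma_2 < ... are the arrival
    times of a unit-rate Poisson process on (0, infinity)."""
    log_fact = math.lgamma(m)                # ln((m-1)!)
    gamma_i = 0.0
    points = []
    for _ in range(k):
        gamma_i += rng.expovariate(1.0)      # next arrival of the unit-rate process
        points.append(-math.log(gamma_i) - log_fact)
    return points                            # decreasing: points[0] is the maximum


def max_cdf_estimate(x, m, reps=20_000, seed=0):
    """Monte Carlo estimate of P(max point <= x); the exact value is
    exp(-e^{-x} / (m-1)!)."""
    rng = random.Random(seed)
    hits = sum(limit_points(m, 1, rng)[0] <= x for _ in range(reps))
    return hits / reps
```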
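The point-process limit is stronger than the classical one-dimensional limit and yields infinite-dimensional extensions. For $r = 1, 2, \dots$, let $T_m^{(r)}(N)$ be the first time at which some $N - r + 1$ coupon types have already appeared at least $m$ times each (so $T_m^{(1)}(N) = T_m(N)$), and define
$$\hat T_m^{(r)}(N) = \frac{T_m^{(r)}(N) - b_N}{N}, \qquad b_N = N\ln N + (m-1)N\ln\ln N.$$
Then $\hat T_m^{(r)}(N)$ is the $r$-th largest of the normalized times $(\tau_k^{(m)} - b_N)/N$, where $\tau_k^{(m)}$ is the trial at which the $m$-th coupon of type $k$ arrives, and Theorem 4.1 gives
$$\left(\hat T_m^{(r)}(N)\right)_{r \ge 1} \xrightarrow{d} \left(-\ln\Gamma_r - \ln((m-1)!)\right)_{r \ge 1}, \qquad N \to \infty,$$
where $\Gamma_1 < \Gamma_2 < \cdots$ are the points of a unit-rate Poisson process on $(0, \infty)$.
This is an infinite-dimensional extension of classical limit theorems for the Dixie cup problem (Ilienko, 2019).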
The same framework yields a functional limit for rare coupon types. A type 11 is called 12-rare if
13
Let
14
Then 15 converges in 16 with the 17-topology to
18
where 19 is a standard unit-rate Poisson process (Ilienko, 2019).
This process-level formulation suggests that the Double Dixie Cup Problem is naturally interpreted not only as a first-passage problem for a maximum, but also as an extremal point-process problem for the entire cloud of multiplicity-threshold arrival times.
5. Unequal probabilities and heterogeneous coupon populations
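A substantial generalization replaces equal sampling probabilities by a sequence of positive weights
$$\alpha = \{a_j\}_{j \ge 1}, \qquad a_j > 0,$$
and defines
$$p_j = \frac{a_j}{A_N}, \qquad A_N = \sum_{j=1}^{N} a_j, \qquad j = 1, \dots, N.$$
The Double Dixie Cup Problem then becomes the analysis of $T_m(N)$ under arbitrary positive coupon probability vectors generated by $\alpha$ (Doumas et al., 2014).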
A key dichotomy in (Doumas et al., 2014) is whether there exists $\beta > 0$ such that
$$\sum_{j=1}^{\infty} e^{-\beta a_j} < \infty.$$
This leads to two regimes.
| Regime | Condition | Limiting behavior |
|---|---|---|
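| Case I | $\sum_{j \ge 1} e^{-\beta a_j} < \infty$ for some $\beta > 0$ | Nonuniversal limit depending on $\alpha = \{a_j\}$ |
| Case II | $a_j = 1/f(j)$ with $f$ positive, increasing, and smooth | Gumbel regime after adapted normalization |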
In Case I, the paper defines
30
and
31
Then
32
If 33, the asymptotics are
34
35
and
36
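The corresponding limit law is
$$\lim_{N\to\infty}\mathbb{P}\left(\frac{T_m(N)}{A_N} \le x\right) = \prod_{j=1}^{\infty}\left(1 - e^{-a_j x}S_m(a_j x)\right), \qquad x > 0,$$
where
$$S_m(y) = \sum_{l=0}^{m-1}\frac{y^l}{l!}.$$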
This limit is not universal and is generally not Gumbel (Doumas et al., 2014).
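For a concrete Case I example, take linear weights $a_j = j$, so that $\sum_j e^{-\beta j} < \infty$ for every $\beta > 0$. Assuming the product form $\prod_{j \ge 1}(1 - e^{-a_j x}S_m(a_j x))$ for the limit distribution of $T_m(N)/A_N$ (it follows from the Poissonized product formula: with $p_j = a_j/A_N$ and $t = xA_N$, the $j$-th factor becomes $1 - e^{-a_j x}S_m(a_j x)$, independent of $N$), the limit can be evaluated by truncating the product. Function names and the truncation tolerance are illustrative.

```python
import itertools
import math


def erlang_survival(y, m):
    """e^{-y} S_m(y) = P(Erlang(m, 1) > y)."""
    term = math.exp(-y)
    total = term
    for l in range(1, m):
        term *= y / l
        total += term
    return total


def case1_limit_cdf(x, weights, m, tol=1e-17):
    """Truncated infinite product prod_{j>=1} (1 - e^{-a_j x} S_m(a_j x)).

    `weights` iterates over a_1, a_2, ..., assumed nondecreasing and tending
    to infinity, so the factors tend to 1; the product is stopped at the
    first factor within `tol` of 1.
    """
    if x <= 0:
        return 0.0
    result = 1.0
    for a in weights:
        defect = erlang_survival(a * x, m)
        result *= 1.0 - defect
        if defect < tol:
            break
    return result


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, case1_limit_cdf(x, itertools.count(1), 2))
```

For $m = 1$ and $a_j = j$ the product is Euler's function $\prod_{j \ge 1}(1 - q^j)$ at $q = e^{-x}$, which makes the non-Gumbel shape of the Case I limit easy to see.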
In Case II, one writes
$$a_j = \frac{1}{f(j)}, \qquad j \ge 1,$$
with $f$ positive, increasing, smooth, and satisfying
41
together with additional regularity assumptions. Defining
42
the paper proves
43
and
44
Most notably, in Case II the leading variance term is independent of $m$ (Doumas et al., 2014).
The limit law in this regime is Gumbel after adapted centering and scaling. With
46
where
47
one has
48
This extends the Erdős–Rényi limit from equal probabilities to broad classes of unequal probabilities (Doumas et al., 2014).
Examples explicitly treated include generalized Zipf laws 49, exponential weights 50, and slow logarithmic decay 51 (Doumas et al., 2014).
6. Interlacing mixtures, stopped occupancy, and maximal counts
The heterogeneous setting is developed further in (Doumas, 29 Oct 2025), which studies an interlacing mixture of two coupon distributions. There, one considers
52
with 53, so the coupon population consists of two subfamilies, each of size 54: a common family 55 and a rare family 56. Their masses are
57
The paper assumes 58 and
59
with 60 positive, increasing, 61, and satisfying the stated derivative conditions (Doumas, 29 Oct 2025).
The central random variable remains
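$$T_m,$$
the number of trials needed until each coupon type, in both subfamilies, has been observed at least $m$ times. The standard integral representation is
$$\mathbb{E}[T_m] = \int_0^\infty\left[1 - \prod_{j}\left(1 - e^{-p_j t}S_m(p_j t)\right)\right]dt,$$
where the product runs over all coupon types, $p_j$ is the probability of type $j$, and $S_m(y) = \sum_{l=0}^{m-1} y^l/l!$.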
Using Poissonization, the process is decomposed into rare-coupon stages. If 66 is the number of common-family arrivals between successive rare-family arrivals, then
67
and
68
By Wald’s lemma,
69
Hence
70
This identifies a product structure: a mass factor 71 depending on both subfamilies, and a hardness factor determined only by the rare subfamily (Doumas, 29 Oct 2025).
For the rare family,
72
so that
73
up to the specific asymptotics of 74. The paper emphasizes that the parameter 75 does not appear in the leading term as 76 (Doumas, 29 Oct 2025). This suggests that, in this heterogeneous regime, the rarest coupons dominate the asymptotic difficulty regardless of the requested multiplicity.
A different but related reformulation appears in the stopped occupancy model of (Gnedin et al., 25 Jun 2025). Balls are thrown independently into 77 boxes, each with probability 78, and the process stops when only 79 boxes remain that have at most 80 balls: 81 Equivalently,
82
This includes the coupon collector problem as 83, and the Dixie cup problem as 84 (Gnedin et al., 25 Jun 2025).
The paper studies not primarily the stopping time but the maximum occupancy at that time,
85
With
86
one has
87
where 88 has a Gumbel law of order 89. For the maximum, however, there is no single limit distribution. Defining
90
with 91, 92, the paper proves
93
where 94 is standard Gumbel and 95 is independent (Gnedin et al., 25 Jun 2025).
For the Double Dixie Cup setting, this means the maximal occupancy at completion is asymptotically a rounded sum of two independent Gumbels, with oscillations close to periodic on a logarithmic scale. The nonconvergence of 96 as 97 is attributed to the fractional centering term 98, so that convergence occurs only along subsequences with 99 (Gnedin et al., 25 Jun 2025).
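To look at the maximal occupancy statistic in simulation, one can run the plain Dixie cup stopping rule (stop as soon as every box holds at least $m$ balls) and record the largest box count at that moment. This is only an illustrative sketch with hypothetical names; it does not implement the general stopped-occupancy model or the centering and rounding used in (Gnedin et al., 25 Jun 2025).

```python
import random


def max_load_at_completion(n_boxes, m, rng=None):
    """Throw balls uniformly into n_boxes until every box holds at least m
    balls (the Dixie cup stopping rule). Returns (T, max_load, counts)."""
    if rng is None:
        rng = random.Random()
    counts = [0] * n_boxes
    deficient = n_boxes              # boxes still holding fewer than m balls
    throws = 0
    while deficient > 0:
        b = rng.randrange(n_boxes)
        counts[b] += 1
        throws += 1
        if counts[b] == m:
            deficient -= 1
    return throws, max(counts), counts
```

Repeating this over many seeds and several values of $N$ gives an empirical picture of the maximum at completion; the log-scale oscillations described above concern how that picture changes as $N$ grows.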
7. Variance extremality, terminal defects, and broader uses
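A recent finite-$N$ development is the variance extremality theory of (Long, 28 Apr 2026). For every $N \ge 2$ and $m \ge 1$, among all positive coupon probability vectors
$$\mathbf{p} = (p_1, \dots, p_N), \qquad p_j > 0, \qquad \sum_{j=1}^{N} p_j = 1,$$
the variance of the time $T_m(\mathbf{p})$ to collect $m$ complete sets is uniquely minimized at the uniform vector
$$\mathbf{u} = \left(\tfrac{1}{N}, \dots, \tfrac{1}{N}\right).$$
More precisely,
$$\operatorname{Var}[T_m(\mathbf{p})] \ge \operatorname{Var}[T_m(\mathbf{u})],$$
with equality if and only if $\mathbf{p} = \mathbf{u}$ (Long, 28 Apr 2026).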
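A numerical spot check of the extremality statement can use the Poissonized moment formulas $\mathbb{E}[T_m] = \int_0^\infty \mathbb{P}(Z > t)\,dt$ and $\mathbb{E}[T_m(T_m+1)] = 2\int_0^\infty t\,\mathbb{P}(Z > t)\,dt$, with $\mathbb{P}(Z > t) = 1 - \prod_j(1 - e^{-p_j t}S_m(p_j t))$. The sketch below (hypothetical names, Simpson's rule with an ad hoc truncation point) compares the uniform vector with skewed ones for $N = 2$; it is an illustration, not a proof.

```python
import math


def erlang_survival(y, m):
    """e^{-y} S_m(y) = P(Erlang(m, 1) > y)."""
    term = math.exp(-y)
    total = term
    for l in range(1, m):
        term *= y / l
        total += term
    return total


def completion_variance(probs, m, n=20_000, eps=1e-14):
    """Var[T_m] via E[T] = int P(Z>t) dt and E[T(T+1)] = 2 int t P(Z>t) dt,
    using composite Simpson's rule on a truncated range [0, t_max]."""
    def tail(t):
        prod = 1.0
        for p in probs:
            prod *= 1.0 - erlang_survival(p * t, m)
        return 1.0 - prod

    t_max = 1.0 / min(probs)
    while sum(erlang_survival(p * t_max, m) for p in probs) > eps:
        t_max *= 2.0
    h = t_max / n
    s0 = s1 = 0.0
    for i in range(n + 1):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)   # Simpson weights
        t = i * h
        q = tail(t)
        s0 += w * q
        s1 += w * t * q
    mean = s0 * h / 3
    rising2 = 2.0 * s1 * h / 3
    return rising2 - mean - mean ** 2
```

For $m = 1$ and $N = 2$, the uniform vector gives variance $2$, while $\mathbf{p} = (0.6, 0.4)$ gives a variance of about $2.86$.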
The paper proves the stronger radial monotonicity statement: if
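$$\mathbf{p}_s = \mathbf{u} + s(\mathbf{p} - \mathbf{u}), \qquad \mathbf{u} = \left(\tfrac{1}{N}, \dots, \tfrac{1}{N}\right), \quad \mathbf{p} \neq \mathbf{u},$$
then $s \mapsto \operatorname{Var}[T_m(\mathbf{p}_s)]$ is strictly increasing for $s \in [0, 1]$. The proof is based on a terminal-defect viewpoint. At time $t$ in the Poissonized model, coupon $j$ is still defective if it has been seen fewer than $m$ times, which happens with defect probability $e^{-p_j t}\sum_{i=0}^{m-1}(p_j t)^i/i!$, and the expected number of terminal defects is
$$D(t) = \sum_{j=1}^{N} e^{-p_j t}\sum_{i=0}^{m-1}\frac{(p_j t)^i}{i!}.$$
Completion occurs exactly when the number of defective coupons is zero (Long, 28 Apr 2026).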
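In the Poissonized model the defect indicators of different coupons are independent, so both the expected defect count $D(t) = \sum_j e^{-p_j t}S_m(p_j t)$ and the completion probability $\prod_j(1 - e^{-p_j t}S_m(p_j t))$ are explicit, with $S_m(y) = \sum_{l=0}^{m-1} y^l/l!$. The elementary inequality $\prod_j(1 - q_j) \ge 1 - \sum_j q_j$ then gives $\mathbb{P}(Z \le t) \ge 1 - D(t)$. The sketch below (hypothetical names) checks this numerically; it illustrates the defect bookkeeping only and is not the monotone-likelihood-ratio argument of (Long, 28 Apr 2026).

```python
import math


def erlang_survival(y, m):
    """e^{-y} S_m(y): probability that a rate-1 process has fewer than m events by time y."""
    term = math.exp(-y)
    total = term
    for l in range(1, m):
        term *= y / l
        total += term
    return total


def defect_probabilities(t, probs, m):
    """P(coupon j still has fewer than m copies at time t), for each j."""
    return [erlang_survival(p * t, m) for p in probs]


def expected_defects(t, probs, m):
    """D(t): expected number of defective coupons at time t."""
    return sum(defect_probabilities(t, probs, m))


def prob_no_defect(t, probs, m):
    """P(Z <= t): by independence, the product of the non-defect probabilities."""
    return math.prod(1.0 - q for q in defect_probabilities(t, probs, m))
```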
The analytic core of the argument is a monotone-likelihood-ratio comparison derived from a log-scale monotonicity property of the Gamma reverse hazard. If
17
then
18
is strictly negative and strictly decreasing on 19. Consequently 20 is strictly decreasing, and for every 21, the ratio
22
is strictly decreasing (Long, 28 Apr 2026). This is the one-site input supporting the global variance comparison.
The Double Dixie Cup Problem also appears outside its original probabilistic context. In query-based $k$-means clustering with same-cluster queries, (Chien et al., 2018) uses it as the combinatorial model for the number of random samples required until every cluster has enough representatives to estimate its centroid. The paper explicitly states: “The double Dixie cup problem is an extension of the classical coupon collector problem in which the collector is required to collect 24 sets of coupons.” In that setting, coupon types correspond to clusters, and obtaining 25 samples from every cluster is the analogue of completing 26 sets (Chien et al., 2018).
Let 27 be the number of sampling rounds needed until each of the 28 cluster-types has been seen at least 29 times. The paper states
30
where
31
Under an 32-imbalance assumption, it derives the bound
33
which feeds directly into the query complexity
34
for the noiseless clustering algorithm (Chien et al., 2018). This application shows that the Double Dixie Cup Problem functions as a reusable probabilistic template whenever “coverage with multiplicity” is the governing bottleneck.
Taken together, these developments show that the Double Dixie Cup Problem is no longer confined to the classical question of expectation asymptotics for equal probabilities. It now comprises a family of models and techniques: exact Poissonized representations, non-homogeneous Poisson point-process limits, asymptotics for unequal and interlaced probability profiles, stopped occupancy extremal statistics, and finite-$N$ extremality principles (Doumas et al., 2014, Ilienko, 2019, Doumas, 29 Oct 2025, Gnedin et al., 25 Jun 2025, Long, 28 Apr 2026). A plausible implication is that the modern theory views completion not merely as a hitting time, but as an extreme-value and defect-elimination phenomenon whose asymptotic form depends sensitively on the rarity structure of the coupon population.