
Learning Curves for Revenue Maximization

Updated 4 July 2026
  • Learning curves for revenue maximization are quantitative measures that track how pricing algorithms converge to optimal revenue as sample size increases.
  • They incorporate various benchmark choices—from fixed-distribution to online regret and pricing-query frameworks—that capture different operational and feedback constraints.
  • The analysis distinguishes regimes ranging from heavy-tailed slow convergence to exponential decay, emphasizing the roles of demand regularity, inventory limits, and strategic responses.

Learning curves for revenue maximization quantify how rapidly a pricing or mechanism-design procedure approaches its benchmark revenue as information accumulates. In the single-item, single-buyer posted-pricing model, the canonical object is the expected revenue gap

$$\epsilon_n(t_n,D):=\operatorname{opt}_D-\mathbb{E}[\operatorname{rev}_D(t_n)],$$

viewed as a function of the sample size $n$ for a fixed valuation distribution $D$ (Hanneke et al., 29 Apr 2026). In adjacent literatures, the same idea is instantiated through relative regret against a fluid benchmark, dynamic regret against a time-varying oracle, Bayesian regret across episodes, pricing-query complexity, and sample complexity for approximate optimality. The resulting theory shows that the shape of the curve depends on the benchmark, the feedback model, the regularity of demand, whether the optimal price is attained, and whether the seller faces inventory, budget, or strategic-response constraints.

1. Formalizations and benchmark choices

The fixed-distribution formulation isolates a single distribution $D$ and studies the sequence $\epsilon_n(t_n,D)$ for an algorithm $(t_n)$. In this framework, Bayes-consistency means

$$\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D$$

for every $D$ in the class under study, with the convention that if $\operatorname{opt}_D=\infty$, then $\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty$ (Hanneke et al., 29 Apr 2026). The same paper distinguishes a PAC upper bound, which controls the worst-case envelope uniformly over a class of distributions, from a universal learning rate, which allows the constants to depend on the fixed underlying distribution. This distinction is central because fixed-distribution learning curves can be much sharper than worst-case PAC rates.
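The contrast between the two notions can be written out as follows. This is a sketch in the style of the universal-learning literature, not a quotation from the cited paper: the class $\mathcal{D}$, the rate function $R$, and the constants $C$, $c$, $C_D$, $c_D$ are notation assumed here for illustration, and the paper's exact quantifiers may differ.

```latex
% PAC-style (uniform) bound: one pair of constants works for every
% distribution in the class.
\exists\, C, c > 0 \;\; \forall D \in \mathcal{D} \;\; \forall n: \quad
  \epsilon_n(t_n, D) \le C\, R(c\, n).

% Universal rate: the constants may depend on the fixed distribution D.
\forall D \in \mathcal{D} \;\; \exists\, C_D, c_D > 0 \;\; \forall n: \quad
  \epsilon_n(t_n, D) \le C_D\, R(c_D\, n).
```

Under this reading the two statements differ only in the order of quantifiers, which is how a class can admit a fast universal rate while its worst-case envelope decays slowly.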

Online and dynamic models replace the sample-size axis by a time horizon and define learning curves through regret. In personalized dynamic pricing with an inventory constraint, regret is

nn0

where nn1 is the fluid benchmark under proportional scaling of demand and capacity (Chen et al., 2018). Under nonstationarity with one-point feedback, the benchmark is dynamic:

nn2

with nn3 and path variation

nn4

as the nonstationarity budget (Yang et al., 20 May 2026). In pricing-query models, the learning curve is expressed by the achievable revenue gap after nn5 pricing queries, typically denoted nn6, while in episodic revenue management with unknown time-varying demand, the benchmark is Bayesian regret relative to the clairvoyant dynamic-programming policy over nn7 episodes of length nn8 (Leme et al., 2021, Shimizu et al., 2024).

These benchmark choices are not interchangeable. A fixed-distribution curve measures convergence to optimal revenue for one environment; a regret curve measures cumulative shortfall under sequential decision-making; a pricing-query curve measures how much can be inferred from binary accept/reject observations. Much of the contemporary literature can be read as a comparison of these benchmark families.

2. Distribution-dependent sample-size regimes

The sharpest current characterization of fixed-distribution learning curves is for the basic single-item, single-buyer posted-pricing model (Hanneke et al., 29 Apr 2026). The central structural distinction is whether the optimal revenue is attained at a finite price.

Regime | Learning-curve rate | Source
nn9, not attained at finite price | arbitrarily slow | (Hanneke et al., 29 Apr 2026)
DD0 attained at finite DD1 | essentially DD2 up to logarithmic factors | (Hanneke et al., 29 Apr 2026)
bounded support | optimal DD3 universal rate | (Hanneke et al., 29 Apr 2026)
closed discrete support | DD4 | (Hanneke et al., 29 Apr 2026)
finite support | DD5 | (Hanneke et al., 29 Apr 2026)

For unrestricted distributions, there exists a Bayes-consistent algorithm for all valuation distributions on DD6: capped ERM with cap DD7 achieves DD8 even when DD9 (Hanneke et al., 29 Apr 2026). However, if DD0 but no finite price attains it, then convergence can be arbitrarily slow. The lower bound holds in a strong sense: for any algorithm DD1 and any rate function DD2, there exists a fixed distribution DD3 such that infinitely often

DD4

for some constant DD5 (Hanneke et al., 29 Apr 2026). The heavy-tail example

DD6

illustrates the regime DD7 without attainment (Hanneke et al., 29 Apr 2026).

When the optimal revenue is achieved at a finite price DD8, the universal rate becomes essentially DD9. More precisely, for any rate ϵn(tn,D)\epsilon_n(t_n,D)0 there is an algorithm that learns the class at universal rate ϵn(tn,D)\epsilon_n(t_n,D)1, whereas for any ϵn(tn,D)\epsilon_n(t_n,D)2 no algorithm can learn the class universally at that rate (Hanneke et al., 29 Apr 2026). On bounded supports, ERM improves this to an optimal ϵn(tn,D)\epsilon_n(t_n,D)3 universal rate by a localized Bernstein analysis. On closed discrete supports, structured ERM attains ϵn(tn,D)\epsilon_n(t_n,D)4 rates, while vanilla ERM is not Bayes-consistent: there exists a closed discrete support on which ERM incurs a constant expected revenue gap along an infinite subsequence of sample sizes (Hanneke et al., 29 Apr 2026). On finite supports, ERM achieves

ϵn(tn,D)\epsilon_n(t_n,D)5

and no faster-than-exponential universal rate is possible on nontrivial finite supports (Hanneke et al., 29 Apr 2026).

A recurring conclusion is that PAC-style worst-case envelopes obscure these shapes. The fixed-distribution view separates heavy-tail nonattainment, finite-price attainability, bounded support, closed discrete support, and finite support into genuinely different rate classes.
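To make the ERM objects in this section concrete, the following minimal sketch computes the empirical revenue of a posted price and an ERM price over the sample values. The `cap` argument is an assumption made for illustration: it reads "capped" as clipping candidate prices at a fixed ceiling, and it does not reproduce the cited paper's cap schedule or its structured-ERM variant. The function names are hypothetical.

```python
from typing import Optional, Sequence


def empirical_revenue(price: float, samples: Sequence[float]) -> float:
    """Average revenue of posting `price`: price times the fraction of samples >= price."""
    sold = sum(1 for v in samples if v >= price)
    return price * sold / len(samples)


def erm_price(samples: Sequence[float], cap: Optional[float] = None) -> float:
    """Return a candidate price maximizing empirical revenue.

    Candidates are the sample values themselves. If `cap` is given, candidates
    above it are clipped to `cap` (an illustrative reading of "capped ERM").
    Ties are broken toward the lowest price.
    """
    if not samples:
        raise ValueError("need at least one sample")
    candidates = {v if cap is None else min(v, cap) for v in samples}
    return max(sorted(candidates), key=lambda p: empirical_revenue(p, samples))
```

For example, on the samples `[1.0, 2.0, 4.0, 10.0]` plain ERM posts 10.0 because the single largest value carries empirical revenue 2.5, while `erm_price(samples, cap=4.0)` posts 4.0. This is the kind of tail sensitivity that a cap is meant to limit.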

3. Inventory, resource constraints, and contextual decision-making

A large segment of the literature studies learning curves in revenue maximization under operational constraints rather than pure posted-pricing estimation. In personalized dynamic pricing with one inventory resource and ϵn(tn,D)\epsilon_n(t_n,D)6 observable customer types, the seller learns a single dual shadow price ϵn(tn,D)\epsilon_n(t_n,D)7 instead of learning all type-specific demand functions in full (Chen et al., 2018). The fluid benchmark is

ϵn(tn,D)\epsilon_n(t_n,D)8

and the dual variable ϵn(tn,D)\epsilon_n(t_n,D)9 satisfies

(tn)(t_n)0

The primal-dual learning algorithm achieves the dimension-free regret rate

(tn)(t_n)1

with the exponent independent of the number of types (tn)(t_n)2 (Chen et al., 2018). Under sufficient capacity, the final phase uses one price per type; under insufficient capacity, it uses two prices per type and an interpolation parameter (tn)(t_n)3 to pin aggregate sales near (tn)(t_n)4.

In the older single-product, limited-inventory model with unknown regular demand (the setting introduced by Besbes and Zeevi), the dynamic pricing algorithm of Wang et al. uses shrinking price intervals and function-value estimation rather than parametric identification (Wang et al., 2011). In the size-(tn)(t_n)5 scaling regime, it achieves

(tn)(t_n)6

while a lower bound shows that no admissible policy can beat order (tn)(t_n)7 up to logarithmic factors (Wang et al., 2011). The paper interprets this as closing the gaps between parametric and non-parametric learning and between a post-price mechanism and a customer-bidding mechanism.

Constraint-coupled contextual revenue maximization introduces a different learning curve. In dual mirror descent with unknown model parameter (tn)(t_n)8, the agent observes i.i.d. contexts (tn)(t_n)9, chooses limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D0, receives revenue limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D1, and incurs costs limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D2 subject to both lower and upper average-cost bounds (Lobos et al., 2021). With known limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D3, regret and lower-bound violations are both limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D4. With unknown limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D5, the decomposition

limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D6

adds an estimation term that depends on

limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D7

If limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D8, the overall regret remains limnE[revD(tn)]=optD\lim_{n\to\infty}\mathbb{E}[\operatorname{rev}_D(t_n)]=\operatorname{opt}_D9; if it decays as DD0, the bound becomes DD1 (Lobos et al., 2021).

These results share a common pattern: the learning curve is not only a function of statistical difficulty, but also of how a low-dimensional structure—such as a dual variable, a shrinking interval, or a dual feasibility certificate—compresses the constrained control problem.
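The shadow-price idea behind these results can be illustrated with a generic projected dual subgradient update, which is the Euclidean special case of dual mirror descent. The sketch below is a toy under stated assumptions, not the algorithm of any cited paper: actions are hypothetical `(revenue, consumption)` pairs with known payoffs, there is a single resource with per-period budget `budget_rate`, and the step size is constant.

```python
from typing import Sequence, Tuple

Action = Tuple[float, float]  # (revenue, resource consumption); hypothetical toy encoding


def dual_step(dual: float, consumption: float, budget_rate: float, step: float) -> float:
    """Projected subgradient update: raise the shadow price when over budget, lower it otherwise."""
    return max(0.0, dual + step * (consumption - budget_rate))


def run_dual_pricing(actions: Sequence[Action], budget_rate: float,
                     step: float, horizon: int) -> Tuple[float, float, float]:
    """Return (average revenue, average consumption, final dual price)."""
    dual = 0.0
    total_revenue = 0.0
    total_consumption = 0.0
    for _ in range(horizon):
        # Pick the action with the best revenue net of the priced resource use.
        revenue, consumption = max(actions, key=lambda a: a[0] - dual * a[1])
        total_revenue += revenue
        total_consumption += consumption
        dual = dual_step(dual, consumption, budget_rate, step)
    return total_revenue / horizon, total_consumption / horizon, dual
```

With two actions `(1.0, 1.0)` and `(0.0, 0.0)` and `budget_rate=0.5`, the dual price settles near 1 and the average consumption approaches the budget rate. The learner tracks one scalar rather than a model of every action, which is the compression this section describes.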

4. Nonstationarity and time-varying demand

When demand changes over time, learning curves acquire an explicit dependence on a variation budget. Under one-point feedback in a convex price domain DD2, mirror ascent with the estimator

DD3

and periodic restarting yields a static regret bound

DD4

(Yang et al., 20 May 2026). With tuned DD5 and DD6, this becomes DD7, and for spherical smoothing DD8 it reduces to DD9. Restarting converts this into a dynamic regret bound

optD=\operatorname{opt}_D=\infty0

with the special case

optD=\operatorname{opt}_D=\infty1

when optD=\operatorname{opt}_D=\infty2 and optD=\operatorname{opt}_D=\infty3 is known. If optD=\operatorname{opt}_D=\infty4 is unknown, the bandit-over-bandit meta-layer yields

optD=\operatorname{opt}_D=\infty5

for optD=\operatorname{opt}_D=\infty6 (Yang et al., 20 May 2026).

In the single-buyer binary-feedback model with drifting valuations, the benchmark is first-best revenue

optD=\operatorname{opt}_D=\infty7

and the regret is

optD=\operatorname{opt}_D=\infty8

Here the nonstationarity parameter is either a fixed optD=\operatorname{opt}_D=\infty9, the average changing rate

E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty0

or

E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty1

in the stochastic known-E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty2 case (Leme et al., 2021). The optimal exponents differ between adversarial and stochastic drift:

E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty3

with matching lower bounds, and the same exponents extend to unknown or dynamic non-increasing E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty4 through E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty5 (Leme et al., 2021). The algorithms alternate binary-search localization with exploitation phases and sparse checking rounds.
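The localization step can be sketched as a plain binary search on accept/reject feedback. This is a simplified illustration that assumes a static valuation in a known interval. The drift handling, exploitation schedule, and checking rounds of the cited algorithms are omitted, and `buyer_accepts` is a hypothetical callback.

```python
from typing import Callable, Tuple


def localize_value(buyer_accepts: Callable[[float], bool],
                   lo: float = 0.0, hi: float = 1.0,
                   width: float = 1e-3) -> Tuple[float, float, int]:
    """Binary search on accept/reject feedback until hi - lo <= width.

    Assumed invariant: the buyer accepts price `lo` and rejects price `hi`.
    Returns the final interval and the number of price queries used.
    """
    queries = 0
    while hi - lo > width:
        mid = (lo + hi) / 2
        if buyer_accepts(mid):
            lo = mid   # value is at least mid
        else:
            hi = mid   # value is below mid
        queries += 1
    return lo, hi, queries
```

After localization, posting `lo` in an exploitation phase sells every round and loses at most `width` per round against the first-best benchmark. Under drift the interval would have to be widened or rechecked over time, and that is where the nonstationarity parameter enters the regret.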

Episodic revenue management with unknown time-varying demand introduces a further interaction between learning and inventory. With E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty6 seasons, E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty7 periods per season, a finite price set E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty8, and a Bayesian prior over time-varying demand parameters, posterior sampling plus LP re-optimization yields

E[revD(tn)]\mathbb{E}[\operatorname{rev}_D(t_n)]\to\infty9

while a lower bound shows

nn00

for some prior nn01 (Shimizu et al., 2024). The paper interprets the nn02 term as the unavoidable learning component and the nn03 term as the computational-efficiency cost of using LP rather than exact dynamic programming. Empirically, correlated Gaussian-process priors steepen early learning curves by sharing information across time and price (Shimizu et al., 2024).

Taken together, these models replace a single asymptotic rate by a two-parameter geometry: horizon length controls the accumulation of exploration error, while a variation budget controls how rapidly past information becomes stale.

5. Feedback limitations, strategic responses, and information complexity

Learning curves also depend on what information the seller receives. In pricing-query models, the learner posts a price nn04, observes only the binary signal

nn05

and seeks a reserve price nn06 that nearly maximizes

nn07

(Leme et al., 2021). The resulting query complexity exhibits three regimes:

nn08

Equivalently, the revenue-gap learning curve satisfies

nn09

for general distributions and

nn10

for regular and MHR distributions (Leme et al., 2021). The regular-distribution algorithm relies on the relative flatness property, which rules out hidden interior spikes after probing a constant number of evenly spaced prices. The same paper shows that for regular distributions, learning the reserve is strictly easier than learning the entire distribution in Lévy distance:

nn11

whereas reserve-price learning requires only nn12 queries (Leme et al., 2021).
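For intuition about the feedback model, the sketch below shows a naive baseline that spends a fixed number of pricing queries on each price of a grid and keeps the price with the best estimated revenue. It is not the algorithm of the cited paper, which exploits structure such as relative flatness to use fewer queries. The `query` callback, the grid, and the per-price budget are assumptions for illustration.

```python
from typing import Callable, Sequence, Tuple


def best_grid_price(query: Callable[[float], bool],
                    grid: Sequence[float],
                    queries_per_price: int) -> Tuple[float, float]:
    """Estimate price * P(value >= price) on a grid from binary feedback.

    `query(p)` posts price p to a fresh buyer and returns True on a sale.
    Returns the grid price with the highest estimated revenue and that estimate.
    """
    best_price, best_revenue = grid[0], -1.0
    for price in grid:
        sales = sum(1 for _ in range(queries_per_price) if query(price))
        revenue = price * sales / queries_per_price
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue
```

The total query count is `len(grid) * queries_per_price`, so the baseline pays separately for grid resolution and for the estimation accuracy at each grid point. Distribution-specific structure can reduce that cost, which is consistent with the separate regimes reported above.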

Patient buyers create a different information geometry. When each buyer has a value-patience type nn13 and can delay purchase over up to nn14 steps, the revenue class for pure non-increasing price sequences has fat-shattering dimension linear in nn15 (Mashiah et al., 2022). The offline pure-strategy sample complexity has two regimes: for sample size nn16 the excess error behaves like nn17, whereas for nn18 it behaves like nn19 up to logarithmic factors. In online learning, regret against the optimal pure strategy is

nn20

after the crossover nn21, while regret against the optimal mixed strategy is nn22 for finite support and nn23 in general (Mashiah et al., 2022).

Strategic data generation further modifies the curve. In ERM with endogenous sampling, the “samples” are buyers’ bids, and a coalition of size nn24 can manipulate the learned reserve. The paper formalizes this through the incentive-awareness measure

nn25

which upper-bounds the expected relative price drop caused by altering nn26 out of nn27 samples (Deng et al., 2020). For guarded ERM,

nn28

for MHR distributions and

nn29

for distributions supported on nn30 (Deng et al., 2020). The endogenous-learning curve therefore becomes the sum of the classical statistical error and a manipulation-robustness term:

nn31

Two-sided learning against a budget- and ROI-constrained buyer supplies a further example of structure-driven learnability. The seller’s fixed-price revenue

nn32

is bell-shaped over the price grid: strictly increasing on the non-binding region, flat at nn33 on the budget-binding segment, and strictly decreasing on the ROI-binding segment (Golrezaei et al., 2021). An episodic binary search then achieves

nn34

where nn35 is the buyer’s within-episode adaptivity exponent and nn36 (Golrezaei et al., 2021). If the buyer best responds exactly, or uses empirical-distribution advice, then nn37 and the seller’s regret is nn38.

6. Algorithmic motifs, applications, and open directions

Several recurring design principles organize the field. Capped ERM and structured ERM control tail risk or discrete-support pathologies in fixed-distribution learning (Hanneke et al., 29 Apr 2026). Primal-dual methods reduce a high-dimensional pricing problem to learning a shadow price of capacity or cost (Chen et al., 2018, Lobos et al., 2021). Restarting and meta-learning discount stale information under nonstationarity (Yang et al., 20 May 2026). Relative flatness enables zoom-in without reconstructing the full value distribution (Leme et al., 2021). Posterior sampling combined with LP re-optimization transforms episodic time-varying revenue management into tractable mean-demand planning (Shimizu et al., 2024).

These motifs extend beyond single posted prices. In airline revenue management with unconstrained capacity and simultaneous pricing of nn39 active flights, the unified objective

nn40

balances current expected revenue against Fisher-information-driven learning quality (Pinheiro et al., 2022). In the reported experiments, with nn41, 10 prices from \$n$42230, and a sweep over 160 values of $n$43, the best choice $n$44 achieved normalized expected revenue of $n$45 versus $n$46 for RMS, an absolute improvement of $n$47 (Pinheiro et al., 2022). The same paper reports that the MSE of $n$48 estimation decreases monotonically with $n$49 up to a sweet spot, after which over-exploration reduces revenue.

Revenue-maximizing ranking under random attention spans supplies another example of approximation-oriented learning curves. For random span $n$50 with tail $n$51, the Best-$n$52 policy selects

$n$53

where $n$54 is the fixed-span optimum (Chen et al., 2020). Under the IFR condition on attention spans, Best-$n$55 achieves at least $n$56 of the clairvoyant benchmark, while no algorithm can exceed $n$57 of that benchmark in the worst case (Chen et al., 2020). In the contextual online version, RankUCB attains expected regret of order $n$58 relative to the scaled $n$59 benchmark despite censoring of the attention span (Chen et al., 2020).

The move from posted prices to richer mechanisms preserves the learning-curve perspective but increases parameter dependence. For menus of two-part tariffs, adversarial full-information online regret is

$n$60

while fixed-length lottery menus admit

$n$61

full-information regret and

$n$62

bandit regret (Balcan et al., 2023). In the distributional setting, fixed-length lottery menus have sample complexity

$n$63

whereas arbitrary-length menus exhibit the expected exponential dependence on $n$64 under correlated valuations (Balcan et al., 2023). The same paper shows that dispersion-based methods, successful for smoothed online learning of two-part tariffs, are inadequate for menus of lotteries because dispersion can fail at the optimizer (Balcan et al., 2023).

The open problems are correspondingly structural. One line asks when fixed-distribution curves can be sharpened beyond current worst-case universal rates, especially for unbounded supports and for ERM outside bounded-support or finite-support settings (Hanneke et al., 29 Apr 2026). Another asks whether dimension-free final-phase guarantees survive multiple resource constraints in primal-dual inventory control (Chen et al., 2018). In patient-buyer models, the gap between upper and lower bounds for mixed strategies and the possibility of $n$65 regret against optimal mixed menus remain unresolved (Mashiah et al., 2022). In episodic time-varying revenue management, a formal regret analysis for dynamic per-period posterior sampling and a theory that quantifies the gains from correlated priors remain open (Shimizu et al., 2024).

Across these domains, the central lesson is not that revenue learning has a single canonical asymptotic law, but that it decomposes into sharply different regimes. Heavy tails without attainment can force arbitrarily slow convergence; finite optimal prices produce essentially $n$66 curves; discrete supports can yield almost exponential or exponential decay; nonstationarity introduces explicit variation exponents; and resource, feedback, or strategic constraints add separate structural terms that often dominate the statistical component.
