
Discount Model Search (DMS)

Updated 10 January 2026
  • Discount Model Search (DMS) is a suite of optimization frameworks that learn and model discount functions to enhance decision-making across diverse domains.
  • In quality diversity optimization, DMS employs neural networks and hybrid models to produce smooth improvement signals and sustain exploration in high-dimensional measure spaces.
  • In ranking and economic models, DMS leverages convex programming and analytic methods to infer parameters that align evaluation metrics with user preferences and policy constraints.

Discount Model Search (DMS) encompasses a family of optimization and inference frameworks for estimating, modeling, or learning the discount functions or discount parameters that drive decision-making, evaluation, or archive updates across disparate domains, including quality diversity optimization, social choice theory, and ranking systems. Recent work under this term connects smooth, model-based improvement ranking in quality diversity with learning-theoretic approaches for inferring discount parameters from data, enabling flexible high-dimensional exploration and robust preference-aligned evaluation.

DMS originated in the context of resolving limitations in traditional, histogram-based quality diversity (QD) algorithms and extends to the principled estimation of discount-related parameters in ranking systems and economic models. The overarching aim is to generate discount models—function-valued, continuous, or parameterized representations—that replace or generalize fixed, discrete discount structures, enabling improved performance and adaptability. Salient variants include neural discount models for QD archives (Tjanaka et al., 3 Jan 2026), linear/quadratic models for ranking metrics (Zhou et al., 2012), and analytically derived discount curves in consumption-based social discounting (Gluzberg et al., 2018).

2. DMS in Quality Diversity Optimization

The DMS algorithm in QD explores a continuous, high-dimensional measure space $\mu:\mathbb{R}^n \to \mathbb{R}^k$ by learning a smooth discount function $D(\mu; \psi)$, typically implemented as a neural network. In contrast to histogram-based approaches such as CMA-MAE, which suffer from "cell collision" effects and stagnation at high $k$, DMS enables distinct discount values for nearby (even overlapping) measures, thus providing informative improvement signals and supporting further exploration.

Given a candidate solution $x$, objective $f(x)$, and measure $\mu(x)$, DMS computes the improvement signal as

$$\Delta(x) = f(x) - D(\mu(x); \psi).$$

The key learning target is set by a smoothed update:
$$t_A = \begin{cases} D(\text{old }\mu;\psi) & \text{if } f(x) \leq D(\text{old }\mu;\psi) \\ (1-\alpha)\, D(\text{old }\mu;\psi) + \alpha f(x) & \text{if } f(x) > D(\text{old }\mu;\psi) \end{cases}$$
where $\alpha \in [0,1]$ is a learning rate for archive updates. Training pairs $(\mu_i, t_{A,i})$ are collected from sampled solutions and empty archive cells (with target $f_{\min}$). The model minimizes the regularized mean squared error

$$\mathbb{E}_{(\mu,t)\sim \mathcal{D}_A}\left[(D(\mu;\psi)-t)^2\right] + \lambda\|\psi\|^2$$

via gradient-based optimization. DMS operates in rounds; the model is queried in the search phase, then updated post-round using accumulated data.
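The update rule and training objective above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the tiny tanh network, its sizes, and all hyperparameters are assumptions, and the full emitter/archive machinery of a DMS round is reduced to bare helper functions.

```python
import numpy as np

rng = np.random.default_rng(0)

class DiscountModel:
    """Tiny one-hidden-layer network D(mu; psi).

    A stand-in for the neural discount models described above; the
    architecture and hyperparameters here are illustrative assumptions.
    """

    def __init__(self, k, hidden=32, lr=1e-2, weight_decay=1e-4):
        self.W1 = rng.normal(0.0, 0.5, (k, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.5, (hidden, 1))
        self.b2 = np.zeros(1)
        self.lr, self.wd = lr, weight_decay

    def __call__(self, mus):
        h = np.tanh(mus @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel()

    def fit(self, mus, targets, steps=500):
        """Minimize MSE + L2 regularization by full-batch gradient descent."""
        n = len(targets)
        for _ in range(steps):
            h = np.tanh(mus @ self.W1 + self.b1)
            pred = (h @ self.W2 + self.b2).ravel()
            err = (pred - targets)[:, None]              # shape (n, 1)
            gW2 = h.T @ err * 2.0 / n + self.wd * self.W2
            gb2 = 2.0 * err.mean(axis=0)
            dh = err @ self.W2.T * (1.0 - h ** 2)        # backprop through tanh
            gW1 = mus.T @ dh * 2.0 / n + self.wd * self.W1
            gb1 = 2.0 * dh.mean(axis=0)
            self.W1 -= self.lr * gW1
            self.b1 -= self.lr * gb1
            self.W2 -= self.lr * gW2
            self.b2 -= self.lr * gb2

def smoothed_target(f_x, d_old, alpha):
    """Smoothed archive-update target t_A: keep the old value unless the
    new objective exceeds it, then blend with learning rate alpha."""
    return d_old if f_x <= d_old else (1.0 - alpha) * d_old + alpha * f_x

def improvement(model, f_x, mu):
    """Improvement signal Delta(x) = f(x) - D(mu(x); psi)."""
    return f_x - model(mu[None, :])[0]
```

In a full DMS round, the emitter would rank sampled solutions by `improvement`, accumulate `(mu, smoothed_target(...))` pairs (plus empty-cell pairs with target `f_min`), and call `fit` after the round ends.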

Benchmarks indicate that DMS outperforms CMA-MAE, DDS, and MAP-Elites baselines with respect to QD Score and Coverage in both classic and high-dimensional (image-based) tasks. For instance, in 10D Linear Projection (Sphere), DMS achieves 89% coverage vs. CMA-MAE’s 7% (Tjanaka et al., 3 Jan 2026).

3. Learning Gain Values and Discount Factors in Ranking Systems

A structurally analogous methodology appears in document ranking via DCG (Discounted Cumulative Gain), where the specific choice of gain values $g_i$ and discounts $d_j$ substantially influences ranking evaluation. DMS for DCG casts the metric as a linear utility function and learns its parameters via convex quadratic programming from user-elicited pairwise preferences.

Each ranking $\pi$ is encoded as a $K \cdot L$ binary vector $s(\pi)$, with utility function $u(\pi) = w^T s(\pi)$ and weights $w_{j,i} = d_j g_i$ for position $j$ and grade $i$. Preferences $\pi_a \succ \pi_b$ are translated to constraints $w^T(s(\pi_a) - s(\pi_b)) \geq 1 - \xi_{ab}$ with slack variables $\xi$ and regularization. Position- and grade-wise monotonicity is enforced. The solution is obtained via standard convex QP solvers.
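The preference-learning step can be illustrated as follows. This sketch substitutes plain hinge-loss subgradient descent (the same objective family as RankSVM) for the paper's convex QP formulation, omits the monotonicity constraints, and uses made-up sizes K and L plus assumed ground-truth weights to simulate elicited preferences.

```python
import numpy as np

K, L = 5, 3          # K positions, L relevance grades (illustrative sizes)
rng = np.random.default_rng(1)

def encode(ranking):
    """Encode a ranking (grade of the doc at each of K positions) as the
    K*L binary vector s(pi) used in the utility u(pi) = w^T s(pi)."""
    s = np.zeros((K, L))
    s[np.arange(K), ranking] = 1.0
    return s.ravel()

# Ground-truth DCG-style weights w_{j,i} = d_j * g_i used only to
# simulate elicited preferences (values are assumptions for the demo).
d_true = 1.0 / np.log2(np.arange(2, K + 2))   # log discount by position
g_true = np.array([0.0, 1.0, 3.0])            # gain per relevance grade
w_true = np.outer(d_true, g_true).ravel()

# Simulate pairwise preferences pi_a > pi_b from the true utility.
pairs = []
while len(pairs) < 200:
    a, b = rng.integers(0, L, K), rng.integers(0, L, K)
    if w_true @ encode(a) > w_true @ encode(b):
        pairs.append(encode(a) - encode(b))
diffs = np.array(pairs)

# Hinge-loss subgradient descent on max(0, 1 - w^T (s_a - s_b)) plus
# L2 regularization: a lightweight stand-in for the QP solver.
w, lam, lr = np.zeros(K * L), 1e-3, 0.05
for _ in range(2000):
    active = (diffs @ w) < 1.0                # violated/active margins
    grad = -diffs[active].sum(axis=0) / len(diffs) + lam * w
    w -= lr * grad

# Fraction of elicited preferences the learned utility reproduces.
agreement = np.mean(diffs @ w > 0)
```

A production version would instead hand the margin constraints, slacks, and monotonicity constraints to an off-the-shelf convex QP solver, as in the paper.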

Simulations confirm that as the number of preference pairs increases (20 to 200), the estimated parameters align closely with ground truth, and test-set precision approaches 95% (Zhou et al., 2012). Singular value decomposition enables separation and recovery of gains and discounts from the learned weight matrix.
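The SVD recovery step exploits the rank-1 structure w_{j,i} = d_j * g_i: the leading singular vectors of the learned weight matrix are proportional to the discount and gain vectors, up to a shared scale and a joint sign flip. A minimal sketch with illustrative (not paper-supplied) values:

```python
import numpy as np

# A DCG weight matrix with the rank-1 structure W[j, i] = d_j * g_i
# (the d and g values here are illustrative assumptions).
d = 1.0 / np.log2(np.arange(2, 7))      # discounts for 5 positions
g = np.array([0.0, 1.0, 3.0, 7.0])      # gains for 4 grades
W = np.outer(d, g)

# For a rank-1 matrix, the top singular triple factors it exactly:
# W = sigma_1 * u_1 v_1^T, so u_1 and v_1 recover d and g up to scale.
U, S, Vt = np.linalg.svd(W)
d_hat = U[:, 0] * np.sqrt(S[0])
g_hat = Vt[0] * np.sqrt(S[0])
if d_hat[0] < 0:                        # resolve the joint sign ambiguity
    d_hat, g_hat = -d_hat, -g_hat
```

Because only the product d_j * g_i is identified, the split of the overall scale between `d_hat` and `g_hat` is a convention (here, an even square-root split).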

4. Discount Function Modeling in Social Choice and Consumption Growth

Discount Model Search terminology extends to analytic modeling of time-dependent discount rates in economic contexts. The logistic consumption growth model captures a decelerating growth curve constrained by planetary resource limits:
$$\frac{dC}{dt} = g_0 C(t) \left[1 - C(t)/C_{\max}\right]$$
with $C_{\max}$ the carrying capacity. Incorporating stochastic growth-rate fluctuations via a zero-mean process $\xi(t)$, the resulting term structure for the social discount rate $r(t)$, under isoelastic utility, is

$$r(t) \approx \delta + \eta \left[ \frac{g_0 e^{-g_0 t}}{1+(1/a - 1)e^{-g_0 t}} - \frac{\sigma^2 \tau}{2} \left.\frac{\partial^3}{\partial x^3} \ln \bigl[a + (1-a) e^{-x}\bigr]\right|_{x=g_0 t} \right]$$

where $\delta$ and $\eta$ control impatience and inequality aversion, respectively. This model parametrizes feedback effects and "precautionary" corrections, yielding a declining long-run social discount rate dominated asymptotically by $\delta$ (pure time preference), which has implications for long-horizon valuation and policy (Gluzberg et al., 2018).
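The logistic dynamics and the deterministic part of this term structure are straightforward to evaluate numerically. The sketch below uses illustrative parameter values, reads a as the initial consumption-to-capacity ratio (an assumption about the source's notation), and omits the stochastic precautionary correction.

```python
import numpy as np

# Illustrative parameters; values are not calibrated to the paper.
g0, C0, Cmax = 0.02, 1.0, 10.0   # growth rate, initial and max consumption
delta, eta = 0.01, 2.0           # pure time preference, inequality aversion
a = C0 / Cmax                    # assumed: initial consumption share of capacity

def C(t):
    """Closed-form solution of the logistic ODE dC/dt = g0*C*(1 - C/Cmax)."""
    return Cmax / (1.0 + (Cmax / C0 - 1.0) * np.exp(-g0 * t))

def r_leading(t):
    """Deterministic term of the discount-rate term structure,
    r(t) ~ delta + eta * g0*exp(-g0 t) / (1 + (1/a - 1)*exp(-g0 t));
    the precautionary sigma^2 correction is omitted in this sketch."""
    e = np.exp(-g0 * t)
    return delta + eta * g0 * e / (1.0 + (1.0 / a - 1.0) * e)
```

As t grows, the bracketed growth term decays exponentially, so `r_leading` declines monotonically toward `delta`, matching the asymptotic dominance of pure time preference noted above.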

5. Theoretical and Computational Properties

DMS frameworks improve upon earlier histogram or parameter-fixed approaches according to several criteria:

  • Resolution of Distortion: Continuous DMS models avoid cell-based value collisions and loss of improvement signal in high-dimensional archives (Tjanaka et al., 3 Jan 2026).
  • Memory and Scalability: Neural or linear models scale with the number of model (hyper)parameters rather than exponentially with the measure or discount dimension.
  • Computational Complexity: In QD, dominant costs are inherited from emitter sampling and archive assignment; model fitting adds 10–30% wall-clock time in typical scenarios. Ranking-based DMS is compatible with efficient SVM/RankSVM solvers; complexity is polynomial in the number of constraints and variables (Zhou et al., 2012).
  • Robustness: Empirical studies show graceful degradation under noise, stability under synthetic/real preference variation, and statistically validated improvements over baselines (Tjanaka et al., 3 Jan 2026, Zhou et al., 2012).

6. Applications, Extensions, and Limitations

DMS methods in QD have enabled new domains, notably QDDM, in which high-dimensional datasets (e.g., images) define the measure space, bypassing the need for hand-engineered low-dimensional descriptors. In information retrieval, DMS enables direct alignment of evaluation metrics with user preferences rather than ad hoc discount/gain selection. In social discounting, it provides a compact, theoretically grounded three-parameter kernel capturing planetary-limit effects.

Limitations include increased training time for nontrivial discount models, potential noise in improvement ranking (especially where objective accuracy is paramount), and the need for generalization when underlying constraints or user utility functions differ from model assumptions.

Future work proposed for DMS encompasses advanced architectures (CNNs, Transformers), alternative loss and regularization schemes (e.g., smoothness penalties), large-scale nearest-neighbor methods for archive search, and integration into differentiable optimization pipelines (Tjanaka et al., 3 Jan 2026). In preference modeling, active learning and nonlinear or kernelized utility representations remain open directions (Zhou et al., 2012).

