Minimum-Ratio Estimator

Updated 15 November 2025

Minimum-ratio estimator is a method for estimating ratio parameters (e.g., prevalence, relative risk) with minimax optimality and controlled mean-square error.
It employs score functions, optimization techniques, and auxiliary information to address challenges like prior-probability shift and finite-population inference.
Extensions include applications in relative risk estimation and regression-ratio frameworks, ensuring unbiasedness and efficiency even under stringent error constraints.

A minimum-ratio estimator is a statistical construction designed to estimate a population parameter expressed as a ratio (e.g., prevalence, relative risk, odds ratio), with properties of minimax optimality, control over mean-square error, and—in some cases—guaranteed efficiency relative to the Cramér–Rao lower bound. These estimators appear in contexts such as quantification under prior-probability shift, finite-population inference with auxiliary variables, and design-based estimation of relative risk with controlled precision and resource allocation. The methodology is unified by the use of ratio forms and optimization over risk or mean-square error subject to specified constraints.

1. Ratio Estimators Under Prior-Probability Shift

When only a labeled source sample and an unlabeled target sample are available, quantification under prior-probability shift seeks estimation of the target prevalence $\pi_t = P_{\mathrm{target}}(Y=1)$ . The prior-shift assumption, $P_{\mathrm{train}}(X|Y) \equiv P_{\mathrm{target}}(X|Y)$ , entails that the mixture distribution of features in the target is a convex combination of class-conditional densities $f_0(x),f_1(x)$ with mixture weights $\pi_t$ .

The ratio estimator is constructed as: $\hat \pi_{UR} = \frac{\mu_t - \mu_0^s}{\mu_1^s - \mu_0^s}, \qquad \mu_0^s = \frac{1}{n_{s,0}} \sum_{i:Y_i=0} g(X_i),\;\; \mu_1^s = \frac{1}{n_{s,1}} \sum_{i:Y_i=1} g(X_i),\;\;\mu_t = \frac{1}{n_t} \sum_{j=1}^{n_t} g(X_j)$ for a scoring function $g$ satisfying $\Delta = \mathbb{E}[g(X)|Y=1]-\mathbb{E}[g(X)|Y=0] \ne 0$ . This estimator achieves approximate minimax optimality in $\ell_2$ -risk over separable parameter classes, with the worst-case risk scaling as $O(\max(n_s^{-1}, n_t^{-1}))$ , matching the established lower bound. Trimmed forms $\hat \pi_R = \min\{1,\max\{0,\hat \pi_{UR}\}\}$ are standard.

With additional knowledge of class-conditional densities, the estimator can be recast using importance weights $w(x)=f_1(x)/f_0(x)$ : $\pi_t = \frac{\frac{1}{n_t}\sum_{j} w(X_j) - 1}{\theta^{-1} - 1},\;\;\theta = \mathbb{E}[w(X)|Y=1]^{-1}$ Several classical approaches (adjusted count, EM prior-shift correction) are nested in this framework (Vaz et al., 2018).

2. Minimum-Ratio Estimation With Controlled Relative Error

In two-sample binary experiments, the minimum-ratio estimator constructs the estimate of relative risk $R=p_1/p_2$ ensuring the relative mean-square error (RMSE) is bounded by a target $\varepsilon^2$ for all $p_1,p_2\in(0,1)$ , and average sample sizes are in a user-specified ratio $\rho$ . This is achieved by a two-stage inverse binomial sampling protocol:

Stage I (pilot): For each population $i=1,2$ , sample until $r_i$ successes; record number of draws $M_i$ . Compute preliminary ratio $X = (M_2 - \tfrac12)/(M_1 - \tfrac12)$ .
Stage II (main): Determine $(s_1, s_2)$ to satisfy

$e(s_1,s_2) = \frac{1}{s_1-2} + \frac{1}{s_2} + \frac{1}{(s_1-2)s_2} = \varepsilon^2,\;\; \frac{r_1 + s_1}{r_2 + s_2} = \rho\;$

and sample until $s_i$ further successes (per population). The minimum-ratio (RR) estimator is then

$\hat R = \frac{s_1-1}{s_2} \frac{N_2}{N_1-1}$

where $N_i$ is the number of draws to reach $s_i$ successes in population $i$ .

This estimator is unbiased, and achieves $\mathbb{E}[(\hat R - R)^2]/R^2 < \varepsilon^2$ . For small $\varepsilon$ , its efficiency approaches the Cramér–Rao lower bound. Batch (group) sampling operates identically, with cost adjustments for incomplete batch use (Mendo, 6 Mar 2025).

3. Ratio-Product Estimators With Auxiliary Information

For finite populations, the two-parameter ratio-product-ratio estimator uses both sample and population information from auxiliary variable $x_i$ to estimate the mean of paper variable $y_i$ : $\hat Y_{\alpha,\beta} = \alpha \left[\frac{(1-\beta)\bar x + \beta X}{\beta \bar x + (1-\beta) X}\right]\bar y + (1-\alpha)\left[\frac{\beta \bar x + (1-\beta) X}{(1-\beta)\bar x + \beta X}\right]\bar y$ Here $\alpha,\beta\in\mathbb{R}$ are tuning parameters; $\bar y, \bar x$ denote sample means, $X$ the known population mean of $x_i$ . Specializations yield the sample mean, classical ratio, and product estimators for certain $(\alpha,\beta)$ .

A first-order expansion yields bias and MSE expressions in terms of population variances $S_Y^2$ , $S_X^2$ , and correlation $\rho$ . The MSE is minimized along the curve $(1-2\alpha)(1-2\beta) = \rho$ , yielding the minimum-MSE estimator: $\min_{\alpha,\beta} \operatorname{MSE}_1(\hat Y_{\alpha,\beta}) = \frac{1-f}{n} S_Y^2 (1-\rho^2)$ For strongly correlated auxiliary information, empirical results show the minimum-ratio estimator significantly outperforms traditional approaches (Chami et al., 2012).

4. Extensions and Robustness

Multiple generalizations are available within the minimum-ratio estimator framework:

Combined estimator: When limited labeled target data are available, the ratio estimator can be linearly combined with plug-in estimators, using weights determined by their MSEs. This blend is provably optimal in MSE for the convex mixture (Vaz et al., 2018).
Regression-ratio estimator: For covariate-dependent prevalences $\pi_t(z)$ , the ratio estimator extends under extra conditional independence, permitting estimation via nonparametric regression fits of $E[g(X)|S=0, Z=z]$ . Consistency and convergence rates are characterized under mild regularity (Vaz et al., 2018).
Group sampling: In two-sample experiments, minimum-ratio sampling by group or batch is handled by simulating incremental sampling, and the performance guarantees persist with negligible overshoot variation (Mendo, 6 Mar 2025).

Crucially, weaker, empirically testable variants of the core assumptions (e.g., weak prior shift: $g(X) \perp S | Y$ ) suffice for consistency and confidence interval construction. Verification procedures based on the convex hull relationship of CDFs support practical deployment.

5. Optimality and Performance

For quantification, the minimax lower bound for mean squared error cannot improve upon $O(\max(n_s^{-1}, n_t^{-1}))$ , as established by Bayes and Le Cam type arguments. The ratio estimator matches this rate, and hence is approximately minimax. For relative risk, the minimum-ratio estimator is unbiased and has relative mean-square error strictly bounded by $\varepsilon^2$ everywhere; for small error requirements, its efficiency (variance relative to the Cramér–Rao lower bound) converges to unity (Mendo, 6 Mar 2025).

For auxiliary-based population mean estimation, the minimum-MSE estimator achieves the classical regression-type optimum, with the empirical MSE and coverage of confidence intervals demonstrating substantial relative efficiency gains over sample mean, ratio, or product estimators (Chami et al., 2012).

6. Algorithmic Construction and Implementation

The minimum-ratio estimator methodology is characterized by systematic algebraic derivations determining parameters—either tuning $(\alpha,\beta)$ , or sequential stopping rules $(r_i, s_i)$ —to meet targeted precision. For relative-risk, the construction proceeds via:

Solving for pilot-stage sizes ( $r_1, r_2$ ) from curvature conditions.
Computing $(s_1, s_2)$ to exactly achieve the mean-square error and allocation targets.
Executing sampling and estimator calculations accordingly.

For quantification, the workflow is:

Compute class means of the score function $g$ over labeled and target samples.
Form the ratio of means estimator (possibly trimmed).
Combine with plug-in estimators as needed or incorporate regression for prevalence as function of covariates.

The explicit forms support straightforward algorithmic implementation, including for large-scale or streaming contexts, and the underlying variance control properties remain valid under a wide range of scenarios due to their non-reliance on asymptotics or strong distributional assumptions.

7. Applications and Empirical Performance

Minimum-ratio estimators are applied in diverse contexts, such as:

Quantification of prevalence in target populations under distribution shift (e.g., sentiment analysis domain adaptation) (Vaz et al., 2018).
Design-based survey estimation with auxiliary variables, especially when high correlation allows for dramatic MSE reduction (Chami et al., 2012).
Clinical and epidemiological studies requiring strict control on the error in estimation of relative risk, regardless of underlying proportions (Mendo, 6 Mar 2025).

Empirical validation demonstrates substantial improvements in MSE and coverage, robustness to misspecification of underlying distributions (given testable assumptions), and scalability to high-throughput sampling schemes.

The development and analysis of minimum-ratio estimators unify minimax, unbiasedness, and efficiency properties across quantification, finite-population estimation, and two-sample problems. Their construction enables both rigorous statistical guarantees and practical deployability in heterogeneous data regimes.

PDF Markdown Chat (Pro)

References (3)

Quantification under prior probability shift: the ratio estimator and its extensions (2018)

Estimation of relative risk, odds ratio and their logarithms with guaranteed accuracy and controlled sample size ratio (2025)

A two parameter ratio-product-ratio estimator using auxiliary information (2012)

Follow Topic

Get notified by email when new papers are published related to Minimum-ratio Estimator.