
Sigmoid-FTRL: Adaptive ATE Estimation

Updated 26 November 2025
  • Sigmoid-FTRL is an adaptive online design strategy that minimizes Neyman regret for average treatment effect estimation using AIPW estimators.
  • It decomposes a nonconvex variance minimization problem into two convex learning tasks solved via FTRL updates over treatment probabilities and linear predictors.
  • The method achieves asymptotic optimality and enables valid inference through consistently conservative variance estimation and adaptive ridge regression.

Sigmoid-FTRL is an adaptive online experimental design strategy for minimizing variance (Neyman regret) in the estimation of average treatment effects using Augmented Inverse Probability Weighting (AIPW) estimators, explicitly within the design-based potential outcomes framework where both outcomes and covariates are deterministic. The method unifies online convex optimization and adaptive Neyman allocation via a decomposition of a nonconvex variance-minimization problem into two convex online learning problems, efficiently addressed through Follow-the-Regularized-Leader (FTRL) updates over both treatment probabilities and linear predictors. Sigmoid-FTRL establishes asymptotic optimality, supports consistently conservative variance estimation, and enables construction of valid confidence intervals under broad regularity conditions (Chen et al., 25 Nov 2025).

1. Design-Based Setting and Problem Formulation

Consider $T$ observed units indexed by $t = 1, \ldots, T$, each with a covariate vector $x_t \in \mathbb{R}^d$ bounded in norm ($\|x_t\| \leq R$) and deterministic potential outcomes $y_t(1), y_t(0) \in \mathbb{R}$. The goal is estimation of the average treatment effect (ATE),

$$\tau := \frac{1}{T}\sum_{t=1}^T \left[y_t(1) - y_t(0)\right]$$

using only randomized assignment. At each round $t$, the procedure selects:

  • Assignment probability $p_t \in (0,1)$ as a function of the history $\mathcal{F}_{t-1}$
  • Linear predictor coefficients $\beta_t(1), \beta_t(0) \in \mathbb{R}^d$

Treatment $Z_t \sim \mathrm{Bernoulli}(p_t)$ is randomized, generating the observed outcome $Y_t = Z_t y_t(1) + (1-Z_t)y_t(0)$. For each arm $k \in \{0,1\}$, online ridge regression is used to fit

$$\beta_t(k) \approx \arg\min_\beta \sum_{s<t} 1[Z_s = k]\,\frac{(Y_s - x_s^\top \beta)^2}{\mathbb{P}(Z_s = k)} + n_t^{-1}\|\beta\|^2$$

with $n_t$ an adaptive regularization parameter.

The adaptive AIPW estimator is

$$\hat{\tau} = \frac{1}{T} \sum_{t=1}^T \left\{ x_t^\top[\beta_t(1) - \beta_t(0)] + \frac{1[Z_t=1](Y_t - x_t^\top \beta_t(1))}{p_t} - \frac{1[Z_t=0](Y_t - x_t^\top \beta_t(0))}{1-p_t} \right\}$$

which is unbiased, and whose variance (and thus regret relative to the oracle design) admits closed-form analysis (Chen et al., 25 Nov 2025).
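
The round-by-round structure of this estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the callbacks choose_p and fit_beta stand in for the Sigmoid-FTRL probability and predictor updates described below, and the inputs x, y1, y0 are the fixed design-based covariates and potential outcomes.

```python
import numpy as np

def adaptive_aipw(x, y1, y0, choose_p, fit_beta, rng=None):
    """Run one adaptive experiment and return the AIPW estimate of the ATE.

    x: (T, d) covariates; y1, y0: (T,) potential outcomes (fixed, design-based).
    choose_p(history) -> p_t in (0, 1); fit_beta(history) -> (beta1, beta0).
    """
    rng = rng or np.random.default_rng(0)
    T = len(y1)
    history, terms = [], []
    for t in range(T):
        p_t = choose_p(history)                 # assignment probability from F_{t-1}
        beta1, beta0 = fit_beta(history)        # linear predictors from F_{t-1}
        z_t = rng.random() < p_t                # Z_t ~ Bernoulli(p_t)
        y_obs = y1[t] if z_t else y0[t]         # observed outcome Y_t
        m1, m0 = x[t] @ beta1, x[t] @ beta0     # model predictions per arm
        term = m1 - m0
        if z_t:
            term += (y_obs - m1) / p_t          # IPW-corrected residual, arm 1
        else:
            term -= (y_obs - m0) / (1.0 - p_t)  # IPW-corrected residual, arm 0
        terms.append(term)
        history.append((x[t], z_t, y_obs, p_t))
    return float(np.mean(terms))
```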

2. Neyman Regret and Oracle Design Benchmark

The “oracle” nonadaptive design fixes both the linear predictors $\beta^*(k)$ (by armwise OLS on all $T$ units) and the probability $p^*$, minimizing the expected variance:

$$p^* = \left(1 + \frac{\sigma_0}{\sigma_1}\right)^{-1}$$

where $\sigma_k^2$ is the residual variance for potential outcomes under arm $k$.

The benchmark variance is

$$V^* = \frac{2(1 + \rho)\,\sigma_1\sigma_0}{T}, \quad \rho = \operatorname{corr}(\text{residuals})$$

The Neyman regret of any adaptive policy $\Pi$ is

$$\operatorname{Reg}_{\text{Neyman}}(\Pi) = T\,\operatorname{Var}_\Pi(\hat{\tau}) - T\,V^*$$

highlighting the additional variance incurred by adaptation relative to the nonadaptive oracle (Chen et al., 25 Nov 2025).
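
As a small numeric sketch of these oracle quantities (the residual scales and correlation below are illustrative, not values from the paper):

```python
def oracle_design(sigma1, sigma0, rho, T):
    """Oracle Neyman probability and benchmark variance for given residual scales."""
    p_star = 1.0 / (1.0 + sigma0 / sigma1)             # p* = (1 + sigma0/sigma1)^{-1}
    v_star = 2.0 * (1.0 + rho) * sigma1 * sigma0 / T   # V* = 2(1+rho) sigma1 sigma0 / T
    return p_star, v_star

def neyman_regret(var_policy, v_star, T):
    """Reg_Neyman = T * Var_Pi(tau_hat) - T * V*."""
    return T * var_policy - T * v_star

p_star, v_star = oracle_design(sigma1=2.0, sigma0=1.0, rho=0.5, T=10_000)
print(p_star)  # 0.666...: the noisier arm (sigma1 = 2) is sampled more often
print(v_star)  # 2 * 1.5 * 2 * 1 / 10000 = 6e-4
```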

3. Algorithmic Formulation: Decomposition and Convexification

Direct minimization of Neyman regret is nonconvex in the triple $(p, \beta(1), \beta(0))$. Sigmoid-FTRL circumvents this by decomposing the regret into two convex sequences:

  • Probability Regret: For fixed predictors,

$$f_t(p) = \frac{(y_t(1)-x_t^\top\beta_t(1))^2(1-p) + (y_t(0)-x_t^\top\beta_t(0))^2\,p}{p(1-p)}$$

and

$$R_{\mathrm{prob}} = \mathbb{E}\left[\sum_{t=1}^T \left(f_t(p_t) - f_t(p^*)\right)\right]$$

with $p \mapsto f_t(p)$ convex on $(0,1)$.

  • Prediction Regret: For fixed $p$,

$$\ell_t(\beta(1), \beta(0)) = \frac{(y_t(1)-x_t^\top\beta(1))^2}{\sigma_1} + \frac{(y_t(0)-x_t^\top\beta(0))^2}{\sigma_0}$$

and

$$R_{\mathrm{pred}} = \mathbb{E}\left[\sum_{t=1}^T \left(\ell_t(\beta_t(1), \beta_t(0)) - \ell_t(\beta^*(1), \beta^*(0))\right)\right]$$

which is jointly convex in $(\beta(1), \beta(0))$.

Lemma 3.3 asserts that

$$\operatorname{Reg}_{\text{Neyman}} = R_{\mathrm{prob}} + R_{\mathrm{pred}}$$

so the two components can be optimized separately as convex online learning problems (Chen et al., 25 Nov 2025).
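
A minimal sketch of the two per-round convex losses, with the squared residuals and residual scales supplied as inputs (the variable names are this sketch's own):

```python
import numpy as np

def f_t(p, r1_sq, r0_sq):
    """Probability loss f_t(p) for fixed predictors.

    r1_sq = (y_t(1) - x_t^T beta_t(1))^2, r0_sq = (y_t(0) - x_t^T beta_t(0))^2.
    Algebraically equals r1_sq / p + r0_sq / (1 - p), hence convex on (0, 1).
    """
    return (r1_sq * (1.0 - p) + r0_sq * p) / (p * (1.0 - p))

def ell_t(x_t, y1_t, y0_t, beta1, beta0, sigma1, sigma0):
    """Prediction loss ell_t, jointly convex in (beta(1), beta(0))."""
    return ((y1_t - x_t @ beta1) ** 2 / sigma1
            + (y0_t - x_t @ beta0) ** 2 / sigma0)

# f_t blows up near the boundary and has a unique interior minimizer:
ps = np.linspace(0.05, 0.95, 5)
print([round(f_t(p, r1_sq=4.0, r0_sq=1.0), 2) for p in ps])
```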

4. Sigmoid-FTRL Mechanism

The algorithm maintains a parameter $u_t \in \mathbb{R}$ such that $p_t = \phi(u_t)$ for a differentiable sigmoid $\phi : \mathbb{R} \rightarrow (0,1)$ with the following properties: monotonicity, $\phi(-u) + \phi(u) = 1$, and specific convexity and derivative-decay conditions. Examples include $\phi(u) = \tfrac{1}{2} + \arctan(u)/\pi$ or $\phi(u) = u/(1+|u|)$.
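
A quick check of the first listed sigmoid and its symmetry property (a small sketch, not part of the algorithm itself):

```python
import numpy as np

def phi_arctan(u):
    """phi(u) = 1/2 + arctan(u)/pi: monotone, maps R onto (0, 1)."""
    return 0.5 + np.arctan(u) / np.pi

u = np.linspace(-10.0, 10.0, 9)
assert np.allclose(phi_arctan(-u) + phi_arctan(u), 1.0)   # phi(-u) + phi(u) = 1
print(phi_arctan(np.array([-5.0, 0.0, 5.0])))             # ~[0.063, 0.5, 0.937]
```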

For probability updates:

  • Define $h_t(u) := f_t(\phi(u))$.
  • Use FTRL with regularizer $r(u) = u^2 + |u|^3$:

$$u_t = \arg\min_{u \in \mathbb{R}} \left\{\sum_{s<t} \hat{h}_s(u) + n_t r(u)\right\}$$

where $\hat{h}_s(u)$ is an IPW estimator of $h_s(u)$.
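
The following sketch illustrates the one-dimensional FTRL step under the simplifying assumption that the cumulative IPW loss estimates reduce to two coefficients, A_hat for the treated arm and B_hat for the control arm, so that $\sum_{s<t}\hat{h}_s(u) \approx \hat{A}/\phi(u) + \hat{B}/(1-\phi(u))$; the paper's exact estimator $\hat{h}_s$ may differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def phi(u):
    """Sigmoid link phi(u) = 1/2 + arctan(u)/pi."""
    return 0.5 + np.arctan(u) / np.pi

def ftrl_probability_update(A_hat, B_hat, n_t):
    """FTRL update for u_t with regularizer r(u) = u^2 + |u|^3.

    A_hat, B_hat: cumulative IPW estimates of the treated / control
    squared-residual sums (an assumed reduction, see the lead-in).
    """
    def objective(u):
        p = phi(u)
        return A_hat / p + B_hat / (1.0 - p) + n_t * (u**2 + abs(u)**3)

    res = minimize_scalar(objective, bounds=(-50.0, 50.0), method="bounded")
    u_t = res.x
    return u_t, phi(u_t)

u_t, p_t = ftrl_probability_update(A_hat=40.0, B_hat=10.0, n_t=1.0)
print(round(p_t, 3))  # > 0.5: the noisier (treated) arm is sampled more often
```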

For linear predictor updates:

  • For each arm, solve

$$\beta_t(k) = \arg\min_{\beta} \left\{ \sum_{s<t} 1[Z_s = k]\, \frac{(Y_s - x_s^\top \beta)^2}{\mathbb{P}(Z_s = k)} + n_t^{-1} \|\beta\|^2 \right\}$$

Regularization is adaptive: $n_t = 1/(\sqrt{T}\, R_t^2)$, with $R_t = \max_{s \leq t} \|x_s\|$.
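
A sketch of the armwise weighted ridge fit with this adaptive regularization; it recomputes the closed-form solution from the stored history each round, whereas an incremental update could be used instead.

```python
import numpy as np

def ridge_update(history, arm, T, d):
    """Weighted ridge fit for one arm, following the displayed objective.

    history: list of (x_s, z_s, y_s, p_s) tuples observed so far.
    Weights are the IPW factors 1 / P(Z_s = arm); the penalty is
    n_t^{-1} * ||beta||^2 with n_t = 1 / (sqrt(T) * R_t^2), R_t = max_s ||x_s||.
    """
    R_t = max((np.linalg.norm(x) for x, _, _, _ in history), default=1.0)
    n_t = 1.0 / (np.sqrt(T) * R_t**2)
    XtWX = np.eye(d) / n_t          # ridge term n_t^{-1} * I
    XtWy = np.zeros(d)
    for x_s, z_s, y_s, p_s in history:
        if int(z_s) != arm:
            continue
        w = 1.0 / (p_s if arm == 1 else (1.0 - p_s))   # IPW weight 1/P(Z_s = arm)
        XtWX += w * np.outer(x_s, x_s)
        XtWy += w * y_s * x_s
    return np.linalg.solve(XtWX, XtWy)
```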

Sequential steps (summarized):

  • Prediction update: ridge regression by arm, $O(d^3)$ per step
  • Probability update: 1D convex minimization in $u$, $O(\log(1/\epsilon))$ per update
  • Residuals: estimated armwise via IPW sums, $O(Td)$ overall

No hyperparameter tuning is required; all regularization is data-adaptive (Chen et al., 25 Nov 2025).
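
Putting the pieces together, one round of the procedure looks roughly as follows. This sketch reuses ridge_update, ftrl_probability_update, and phi from the snippets above, and the reduction of the IPW loss history to A_hat and B_hat is again an assumption of the sketch rather than the paper's exact bookkeeping.

```python
import numpy as np

def sigmoid_ftrl_round(x_t, y1_t, y0_t, history, T, d, rng):
    """One round: update predictors and p_t, randomize Z_t, record the outcome."""
    # 1. Prediction update: armwise weighted ridge regression.
    beta1 = ridge_update(history, arm=1, T=T, d=d)
    beta0 = ridge_update(history, arm=0, T=T, d=d)

    # 2. Probability update: reduce the IPW loss history to two coefficients,
    #    then solve the regularized one-dimensional problem in u.
    R_t = max([np.linalg.norm(x) for x, _, _, _ in history] + [np.linalg.norm(x_t)])
    n_t = 1.0 / (np.sqrt(T) * R_t**2)
    A_hat = sum((y - x @ beta1) ** 2 / p for x, z, y, p in history if z == 1)
    B_hat = sum((y - x @ beta0) ** 2 / (1.0 - p) for x, z, y, p in history if z == 0)
    _, p_t = ftrl_probability_update(max(A_hat, 1e-8), max(B_hat, 1e-8), n_t)

    # 3. Randomize treatment and record the observed outcome.
    z_t = int(rng.random() < p_t)
    y_obs = y1_t if z_t else y0_t
    history.append((x_t, z_t, y_obs, p_t))
    return p_t, beta1, beta0
```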

5. Theoretical Guarantees

Convergence of Neyman Regret

Sigmoid-FTRL achieves

$$\operatorname{Reg}_{\text{Neyman}} = O(T^{-1/2} R^2)$$

assuming bounded moments and well-conditioned Gram matrices after an initial $\sqrt{T}$ samples, for any $\phi$ as above. This matches the lower bound:

$$\operatorname{Reg}_{\text{Neyman}} \geq \Omega(T^{-1/2})$$

meaning that no algorithm can improve on the $T^{-1/2}$ rate under analogous regularity assumptions (demonstrated via a noisy two-armed construction) (Chen et al., 25 Nov 2025).

Distributional Asymptotics and Inference

Under non-superefficiency ($\rho > -1$):

$$\sqrt{T}(\hat{\tau} - \tau) \overset{d}{\to} \mathcal{N}(0, V^*)$$

facilitating Wald-type inference.

Consistent Conservative Variance Estimator

A variance bound estimator,

$$\hat{V}_B = \frac{4\sqrt{\hat{A}(1)\hat{A}(0)}}{T}$$

with $\hat{A}(k)$ defined by armwise IPW residuals, is consistent:

$$\mathbb{E}[\hat{V}_B] \to V_{\text{Bound}} = \frac{4\sigma_1 \sigma_0}{T}, \qquad \operatorname{Var}(\hat{V}_B) = O(T^{-5/6} R^{4/3})$$

Both the variance estimator and the estimator $\hat{\tau}$ enable construction of asymptotically accurate $(1-\alpha)$ Wald-type confidence intervals,

$$\mathrm{CI}_{1-\alpha} = \left[\hat{\tau} \pm z_{1-\alpha/2} \sqrt{\hat{V}_B}\right]$$

with coverage tending to $1-\alpha$ as $T \to \infty$ (Chen et al., 25 Nov 2025).
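
The inferential step can be sketched as follows. The concrete form used for $\hat{A}(k)$ here, an IPW mean of squared residuals per arm, is an assumption of this sketch (the text only states that $\hat{A}(k)$ is built from armwise IPW residuals); the interval itself follows the displayed formula.

```python
import numpy as np
from scipy.stats import norm

def conservative_ci(history, tau_hat, beta1, beta0, alpha=0.05):
    """Wald-type CI using the conservative bound V_hat_B = 4 sqrt(A1 A0) / T.

    history: list of (x_s, z_s, y_s, p_s) tuples; A(k) is taken to be the IPW
    mean of squared residuals under arm k (assumed form, for illustration).
    """
    T = len(history)
    A1 = np.mean([int(z == 1) * (y - x @ beta1) ** 2 / p
                  for x, z, y, p in history])
    A0 = np.mean([int(z == 0) * (y - x @ beta0) ** 2 / (1.0 - p)
                  for x, z, y, p in history])
    v_hat_b = 4.0 * np.sqrt(A1 * A0) / T            # conservative variance bound
    half_width = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(v_hat_b)
    return tau_hat - half_width, tau_hat + half_width
```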

6. Implementation Guidelines

  • Sigmoid choice: Recommended $\phi(u)$ includes either $\arctan(u)/\pi + 0.5$ or $u/(1+|u|)$.
  • Adaptive regularization: Use $n_t = 1/(\sqrt{T}\, R_t^2)$, where $R_t$ tracks the maximum covariate norm to date; scaling with a known bound $R$ is possible.
  • Complexity: Total algorithmic run time is $O(d^3 + Td)$.
  • No additional tuning required: There are no separate step-size or clipping parameters beyond the inherent adaptivity and regularization.

A plausible implication is that Sigmoid-FTRL offers a turn-key approach for optimal assignment in design-based adaptive experiments using AIPW estimators.

7. Broader Context and Implications

Sigmoid-FTRL extends the literature connecting Neyman allocation and online convex optimization (OCO) beyond the Horvitz-Thompson estimator, addressing nonconvexity via convex decomposition and FTRL dynamics. It establishes sharp upper and lower regret bounds and supports practical confidence interval construction for deterministic potential outcomes, which is especially relevant for design-based inference in randomized controlled trials and sequential experimentation. The method’s adaptivity and lack of tuning requirements suggest applicability in practical online experiment pipelines without additional complexity (Chen et al., 25 Nov 2025).
