
Sigmoid-FTRL: Adaptive ATE Estimation

Updated 26 November 2025
  • Sigmoid-FTRL is an adaptive online design strategy that minimizes Neyman regret for average treatment effect estimation using AIPW estimators.
  • It decomposes a nonconvex variance minimization problem into two convex learning tasks solved via FTRL updates over treatment probabilities and linear predictors.
  • The method achieves asymptotic optimality and enables valid inference through consistently conservative variance estimation and adaptive ridge regression.

Sigmoid-FTRL is an adaptive online experimental design strategy for minimizing variance (Neyman regret) in the estimation of average treatment effects using Augmented Inverse Probability Weighting (AIPW) estimators, explicitly within the design-based potential outcomes framework where both outcomes and covariates are deterministic. The method unifies online convex optimization and adaptive Neyman allocation via a decomposition of a nonconvex variance-minimization problem into two convex online learning problems, efficiently addressed through Follow-the-Regularized-Leader (FTRL) updates over both treatment probabilities and linear predictors. Sigmoid-FTRL establishes asymptotic optimality, supports consistently conservative variance estimation, and enables construction of valid confidence intervals under broad regularity conditions (Chen et al., 25 Nov 2025).

1. Design-Based Setting and Problem Formulation

Consider $T$ observed units indexed by $t = 1, \ldots, T$, each with a covariate vector $x_t \in \mathbb{R}^d$ bounded in norm ($\|x_t\| \leq R$) and deterministic potential outcomes $y_t(1), y_t(0) \in \mathbb{R}$. The goal is estimation of the average treatment effect (ATE),

$$\tau := \frac{1}{T}\sum_{t=1}^T \left[y_t(1) - y_t(0)\right]$$

using only randomized assignment. At each round $t$, the procedure selects:

  • Assignment probability $p_t \in (0,1)$ as a function of the history $\mathcal{F}_{t-1}$
  • Linear predictor coefficients $\beta_t(1), \beta_t(0) \in \mathbb{R}^d$

Treatment $Z_t \sim \mathrm{Bernoulli}(p_t)$ is randomized, generating the observed outcome $Y_t = Z_t y_t(1) + (1-Z_t)y_t(0)$. For each arm $k \in \{0,1\}$, online ridge regression is used to fit

$$\beta_t(k) \approx \arg\min_\beta \sum_{s<t} 1[Z_s = k]\,\frac{(Y_s - x_s^\top \beta)^2}{\mathbb{P}(Z_s = k)} + n_t^{-1}\|\beta\|^2$$

with $n_t$ an adaptive regularization parameter.

The adaptive AIPW estimator is

$$\hat{\tau} = \frac{1}{T} \sum_{t=1}^T \left\{ x_t^\top[\beta_t(1) - \beta_t(0)] + \frac{1[Z_t=1](Y_t - x_t^\top \beta_t(1))}{p_t} - \frac{1[Z_t=0](Y_t - x_t^\top \beta_t(0))}{1-p_t} \right\}$$

which is unbiased, and whose variance (and thus regret relative to the oracle design) admits closed-form analysis (Chen et al., 25 Nov 2025).
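
The round-by-round structure of this estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the callbacks choose_p and fit_beta stand in for the Sigmoid-FTRL probability and predictor updates described below, and the inputs x, y1, y0 are the fixed design-based covariates and potential outcomes.

```python
import numpy as np

def adaptive_aipw(x, y1, y0, choose_p, fit_beta, rng=None):
    """Run one adaptive experiment and return the AIPW estimate of the ATE.

    x: (T, d) covariates; y1, y0: (T,) potential outcomes (fixed, design-based).
    choose_p(history) -> p_t in (0, 1); fit_beta(history) -> (beta1, beta0).
    """
    rng = rng or np.random.default_rng(0)
    T = len(y1)
    history, terms = [], []
    for t in range(T):
        p_t = choose_p(history)                 # assignment probability from F_{t-1}
        beta1, beta0 = fit_beta(history)        # linear predictors from F_{t-1}
        z_t = rng.random() < p_t                # Z_t ~ Bernoulli(p_t)
        y_obs = y1[t] if z_t else y0[t]         # observed outcome Y_t
        m1, m0 = x[t] @ beta1, x[t] @ beta0     # model predictions per arm
        term = m1 - m0
        if z_t:
            term += (y_obs - m1) / p_t          # IPW-corrected residual, arm 1
        else:
            term -= (y_obs - m0) / (1.0 - p_t)  # IPW-corrected residual, arm 0
        terms.append(term)
        history.append((x[t], z_t, y_obs, p_t))
    return float(np.mean(terms))
```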

2. Neyman Regret and Oracle Design Benchmark

The “oracle” nonadaptive design fixes both the linear predictors $\beta^*(k)$ (by armwise OLS on all $T$ units) and the probability $p^*$, minimizing the expected variance:

$$p^* = \left(1 + \frac{\sigma_0}{\sigma_1}\right)^{-1}$$

where $\sigma_k^2$ is the residual variance for potential outcomes under arm $k$.

The benchmark variance is

$$V^* = \frac{2(1 + \rho)\,\sigma_1\sigma_0}{T}, \quad \rho = \operatorname{corr}(\text{residuals})$$

The Neyman regret of any adaptive policy $\Pi$ is

$$\operatorname{Reg}_{\text{Neyman}}(\Pi) = T\,\operatorname{Var}_\Pi(\hat{\tau}) - T\,V^*$$

highlighting the additional variance incurred by adaptation relative to the nonadaptive oracle (Chen et al., 25 Nov 2025).
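
As a small numeric sketch of these oracle quantities (the residual scales and correlation below are illustrative, not values from the paper):

```python
def oracle_design(sigma1, sigma0, rho, T):
    """Oracle Neyman probability and benchmark variance for given residual scales."""
    p_star = 1.0 / (1.0 + sigma0 / sigma1)             # p* = (1 + sigma0/sigma1)^{-1}
    v_star = 2.0 * (1.0 + rho) * sigma1 * sigma0 / T   # V* = 2(1+rho) sigma1 sigma0 / T
    return p_star, v_star

def neyman_regret(var_policy, v_star, T):
    """Reg_Neyman = T * Var_Pi(tau_hat) - T * V*."""
    return T * var_policy - T * v_star

p_star, v_star = oracle_design(sigma1=2.0, sigma0=1.0, rho=0.5, T=10_000)
print(p_star)  # 0.666...: the noisier arm (sigma1 = 2) is sampled more often
print(v_star)  # 2 * 1.5 * 2 * 1 / 10000 = 6e-4
```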

3. Algorithmic Formulation: Decomposition and Convexification

Direct minimization of Neyman regret is nonconvex in the triple $(p, \beta(1), \beta(0))$. Sigmoid-FTRL circumvents this by decomposing the regret into two convex sequences:

  • Probability Regret: For fixed predictors,

$$f_t(p) = \frac{(y_t(1)-x_t^\top\beta_t(1))^2(1-p) + (y_t(0)-x_t^\top\beta_t(0))^2\,p}{p(1-p)}$$

and

$$R_{\mathrm{prob}} = \mathbb{E}\left[\sum_{t=1}^T \left(f_t(p_t) - f_t(p^*)\right)\right]$$

with $p \mapsto f_t(p)$ convex on $(0,1)$.

  • Prediction Regret: For fixed $p$,

$$\ell_t(\beta(1), \beta(0)) = \frac{(y_t(1)-x_t^\top\beta(1))^2}{\sigma_1} + \frac{(y_t(0)-x_t^\top\beta(0))^2}{\sigma_0}$$

and

$$R_{\mathrm{pred}} = \mathbb{E}\left[\sum_{t=1}^T \left(\ell_t(\beta_t(1), \beta_t(0)) - \ell_t(\beta^*(1), \beta^*(0))\right)\right]$$

which is jointly convex in $(\beta(1), \beta(0))$.

Lemma 3.3 asserts that

$$\operatorname{Reg}_{\text{Neyman}} = R_{\mathrm{prob}} + R_{\mathrm{pred}}$$

so the two components can be optimized separately as convex online learning problems (Chen et al., 25 Nov 2025).
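
A minimal sketch of the two per-round convex losses, with the squared residuals and residual scales supplied as inputs (the variable names are this sketch's own):

```python
import numpy as np

def f_t(p, r1_sq, r0_sq):
    """Probability loss f_t(p) for fixed predictors.

    r1_sq = (y_t(1) - x_t^T beta_t(1))^2, r0_sq = (y_t(0) - x_t^T beta_t(0))^2.
    Algebraically equals r1_sq / p + r0_sq / (1 - p), hence convex on (0, 1).
    """
    return (r1_sq * (1.0 - p) + r0_sq * p) / (p * (1.0 - p))

def ell_t(x_t, y1_t, y0_t, beta1, beta0, sigma1, sigma0):
    """Prediction loss ell_t, jointly convex in (beta(1), beta(0))."""
    return ((y1_t - x_t @ beta1) ** 2 / sigma1
            + (y0_t - x_t @ beta0) ** 2 / sigma0)

# f_t blows up near the boundary and has a unique interior minimizer:
ps = np.linspace(0.05, 0.95, 5)
print([round(f_t(p, r1_sq=4.0, r0_sq=1.0), 2) for p in ps])
```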

4. Sigmoid-FTRL Mechanism

The algorithm maintains a parameter $u_t \in \mathbb{R}$ such that $p_t = \phi(u_t)$ for a differentiable sigmoid $\phi : \mathbb{R} \rightarrow (0,1)$ with the following properties: monotonicity, $\phi(-u) + \phi(u) = 1$, and specific convexity and derivative-decay conditions. Examples include $\phi(u) = \tfrac{1}{2} + \arctan(u)/\pi$ or $\phi(u) = u/(1+|u|)$.
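
A quick check of the first listed sigmoid and its symmetry property (a small sketch, not part of the algorithm itself):

```python
import numpy as np

def phi_arctan(u):
    """phi(u) = 1/2 + arctan(u)/pi: monotone, maps R onto (0, 1)."""
    return 0.5 + np.arctan(u) / np.pi

u = np.linspace(-10.0, 10.0, 9)
assert np.allclose(phi_arctan(-u) + phi_arctan(u), 1.0)   # phi(-u) + phi(u) = 1
print(phi_arctan(np.array([-5.0, 0.0, 5.0])))             # ~[0.063, 0.5, 0.937]
```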

For probability updates:

  • Define $h_t(u) := f_t(\phi(u))$.
  • Use FTRL with regularizer $r(u) = u^2 + |u|^3$:

$$u_t = \arg\min_{u \in \mathbb{R}} \left\{\sum_{s<t} \hat{h}_s(u) + n_t r(u)\right\}$$

where $\hat{h}_s(u)$ is an IPW estimator of $h_s(u)$.
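
The following sketch illustrates the one-dimensional FTRL step under the simplifying assumption that the cumulative IPW loss estimates reduce to two coefficients, A_hat for the treated arm and B_hat for the control arm, so that $\sum_{s<t}\hat{h}_s(u) \approx \hat{A}/\phi(u) + \hat{B}/(1-\phi(u))$; the paper's exact estimator $\hat{h}_s$ may differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def phi(u):
    """Sigmoid link phi(u) = 1/2 + arctan(u)/pi."""
    return 0.5 + np.arctan(u) / np.pi

def ftrl_probability_update(A_hat, B_hat, n_t):
    """FTRL update for u_t with regularizer r(u) = u^2 + |u|^3.

    A_hat, B_hat: cumulative IPW estimates of the treated / control
    squared-residual sums (an assumed reduction, see the lead-in).
    """
    def objective(u):
        p = phi(u)
        return A_hat / p + B_hat / (1.0 - p) + n_t * (u**2 + abs(u)**3)

    res = minimize_scalar(objective, bounds=(-50.0, 50.0), method="bounded")
    u_t = res.x
    return u_t, phi(u_t)

u_t, p_t = ftrl_probability_update(A_hat=40.0, B_hat=10.0, n_t=1.0)
print(round(p_t, 3))  # > 0.5: the noisier (treated) arm is sampled more often
```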

For linear predictor updates:

  • For each arm, solve

$$\beta_t(k) = \arg\min_{\beta} \left\{ \sum_{s<t} 1[Z_s = k]\, \frac{(Y_s - x_s^\top \beta)^2}{\mathbb{P}(Z_s = k)} + n_t^{-1} \|\beta\|^2 \right\}$$

Regularization is adaptive: $n_t = 1/(\sqrt{T}\, R_t^2)$, with $R_t = \max_{s \leq t} \|x_s\|$.
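
A sketch of the armwise weighted ridge fit with this adaptive regularization; it recomputes the closed-form solution from the stored history each round, whereas an incremental update could be used instead.

```python
import numpy as np

def ridge_update(history, arm, T, d):
    """Weighted ridge fit for one arm, following the displayed objective.

    history: list of (x_s, z_s, y_s, p_s) tuples observed so far.
    Weights are the IPW factors 1 / P(Z_s = arm); the penalty is
    n_t^{-1} * ||beta||^2 with n_t = 1 / (sqrt(T) * R_t^2), R_t = max_s ||x_s||.
    """
    R_t = max((np.linalg.norm(x) for x, _, _, _ in history), default=1.0)
    n_t = 1.0 / (np.sqrt(T) * R_t**2)
    XtWX = np.eye(d) / n_t          # ridge term n_t^{-1} * I
    XtWy = np.zeros(d)
    for x_s, z_s, y_s, p_s in history:
        if int(z_s) != arm:
            continue
        w = 1.0 / (p_s if arm == 1 else (1.0 - p_s))   # IPW weight 1/P(Z_s = arm)
        XtWX += w * np.outer(x_s, x_s)
        XtWy += w * y_s * x_s
    return np.linalg.solve(XtWX, XtWy)
```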

Sequential steps (summarized):

  • Prediction update: ridge regression by arm, $O(d^3)$ per step
  • Probability update: 1D convex minimization in $u$, $O(\log(1/\epsilon))$ per update
  • Residuals: estimated armwise via IPW sums, $O(Td)$ overall

No hyperparameter tuning is required; all regularization is data-adaptive (Chen et al., 25 Nov 2025).
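
Putting the pieces together, one round of the procedure looks roughly as follows. This sketch reuses ridge_update, ftrl_probability_update, and phi from the snippets above, and the reduction of the IPW loss history to A_hat and B_hat is again an assumption of the sketch rather than the paper's exact bookkeeping.

```python
import numpy as np

def sigmoid_ftrl_round(x_t, y1_t, y0_t, history, T, d, rng):
    """One round: update predictors and p_t, randomize Z_t, record the outcome."""
    # 1. Prediction update: armwise weighted ridge regression.
    beta1 = ridge_update(history, arm=1, T=T, d=d)
    beta0 = ridge_update(history, arm=0, T=T, d=d)

    # 2. Probability update: reduce the IPW loss history to two coefficients,
    #    then solve the regularized one-dimensional problem in u.
    R_t = max([np.linalg.norm(x) for x, _, _, _ in history] + [np.linalg.norm(x_t)])
    n_t = 1.0 / (np.sqrt(T) * R_t**2)
    A_hat = sum((y - x @ beta1) ** 2 / p for x, z, y, p in history if z == 1)
    B_hat = sum((y - x @ beta0) ** 2 / (1.0 - p) for x, z, y, p in history if z == 0)
    _, p_t = ftrl_probability_update(max(A_hat, 1e-8), max(B_hat, 1e-8), n_t)

    # 3. Randomize treatment and record the observed outcome.
    z_t = int(rng.random() < p_t)
    y_obs = y1_t if z_t else y0_t
    history.append((x_t, z_t, y_obs, p_t))
    return p_t, beta1, beta0
```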

5. Theoretical Guarantees

Convergence of Neyman Regret

Sigmoid-FTRL achieves

$$\operatorname{Reg}_{\text{Neyman}} = O(T^{-1/2} R^2)$$

assuming bounded moments and well-conditioned Gram matrices after an initial $\sqrt{T}$ samples, for any $\phi$ as above. This matches the lower bound:

$$\operatorname{Reg}_{\text{Neyman}} \geq \Omega(T^{-1/2})$$

meaning that no algorithm can improve on the $T^{-1/2}$ rate under analogous regularity assumptions (demonstrated via a noisy two-armed construction) (Chen et al., 25 Nov 2025).

Distributional Asymptotics and Inference

Under non-superefficiency ($\rho > -1$):

$$\sqrt{T}(\hat{\tau} - \tau) \overset{d}{\to} \mathcal{N}(0, V^*)$$

facilitating Wald-type inference.

Consistent Conservative Variance Estimator

A variance bound estimator,

$$\hat{V}_B = \frac{4\sqrt{\hat{A}(1)\hat{A}(0)}}{T}$$

with $\hat{A}(k)$ defined by armwise IPW residuals, is consistent:

$$\mathbb{E}[\hat{V}_B] \to V_{\text{Bound}} = \frac{4\sigma_1 \sigma_0}{T}, \qquad \operatorname{Var}(\hat{V}_B) = O(T^{-5/6} R^{4/3})$$

Both the variance estimator and the estimator $\hat{\tau}$ enable construction of asymptotically accurate $(1-\alpha)$ Wald-type confidence intervals,

$$\mathrm{CI}_{1-\alpha} = \left[\hat{\tau} \pm z_{1-\alpha/2} \sqrt{\hat{V}_B}\right]$$

with coverage tending to $1-\alpha$ as $T \to \infty$ (Chen et al., 25 Nov 2025).
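
The inferential step can be sketched as follows. The concrete form used for $\hat{A}(k)$ here, an IPW mean of squared residuals per arm, is an assumption of this sketch (the text only states that $\hat{A}(k)$ is built from armwise IPW residuals); the interval itself follows the displayed formula.

```python
import numpy as np
from scipy.stats import norm

def conservative_ci(history, tau_hat, beta1, beta0, alpha=0.05):
    """Wald-type CI using the conservative bound V_hat_B = 4 sqrt(A1 A0) / T.

    history: list of (x_s, z_s, y_s, p_s) tuples; A(k) is taken to be the IPW
    mean of squared residuals under arm k (assumed form, for illustration).
    """
    T = len(history)
    A1 = np.mean([int(z == 1) * (y - x @ beta1) ** 2 / p
                  for x, z, y, p in history])
    A0 = np.mean([int(z == 0) * (y - x @ beta0) ** 2 / (1.0 - p)
                  for x, z, y, p in history])
    v_hat_b = 4.0 * np.sqrt(A1 * A0) / T            # conservative variance bound
    half_width = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(v_hat_b)
    return tau_hat - half_width, tau_hat + half_width
```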

6. Implementation Guidelines

  • Sigmoid choice: Recommended $\phi(u)$ includes either $\arctan(u)/\pi + 0.5$ or $u/(1+|u|)$.
  • Adaptive regularization: Use $n_t = 1/(\sqrt{T}\, R_t^2)$, where $R_t$ tracks the maximum covariate norm to date; scaling with a known bound $R$ is possible.
  • Complexity: Total algorithmic run time is $O(d^3 + Td)$.
  • No additional tuning required: There are no separate step-size or clipping parameters beyond the inherent adaptivity and regularization.

A plausible implication is that Sigmoid-FTRL offers a turn-key approach for optimal assignment in design-based adaptive experiments using AIPW estimators.

7. Broader Context and Implications

Sigmoid-FTRL extends the literature connecting Neyman allocation and online convex optimization (OCO) beyond the Horvitz-Thompson estimator, addressing nonconvexity via convex decomposition and FTRL dynamics. It establishes sharp upper and lower regret bounds and supports practical confidence interval construction for deterministic potential outcomes, which is especially relevant for design-based inference in randomized controlled trials and sequential experimentation. The method’s adaptivity and lack of tuning requirements suggest applicability in practical online experiment pipelines without additional complexity (Chen et al., 25 Nov 2025).
