Arctan Pinball Loss in Quantile Regression

Updated 13 April 2026
  • Arctan pinball loss is a smooth, strictly convex surrogate for classical quantile loss, offering non-vanishing Hessians for effective second-order optimization in tree-based algorithms.
  • It enables stable composite quantile regression by reducing quantile crossing and ensuring reliable curvature, thus enhancing prediction accuracy and interval coverage.
  • Its integration into XGBoost facilitates joint multi-quantile prediction with efficient leaf updates and robust performance across diverse datasets.

The arctan pinball loss is a smooth loss function designed as a surrogate for the classical quantile pinball loss, specifically tailored to enable second-order optimization in tree-based algorithms such as XGBoost. Classical pinball loss, widely used for quantile regression, poses limitations for algorithms utilizing second-order Taylor approximations due to its piecewise linearity and zero Hessian almost everywhere. The arctan pinball loss offers a differentiable, strictly convex alternative with a non-vanishing and analytically tractable Hessian, supporting stable and efficient composite quantile regression that mitigates quantile crossing and aligns with high-performance, multi-quantile prediction within XGBoost’s framework (Sluijterman et al., 2024).

1. Mathematical Formulation

Let τ ∈ (0,1) denote the target quantile and u = y – ŷ the residual for a true label y and prediction ŷ. With a smoothing parameter s > 0, the arctan pinball loss is defined as

L^{(\mathrm{arctan})}_{\tau,s}(u) = \Bigl(\tau - 0.5 + \frac{1}{\pi}\arctan\!\left(\frac{u}{s}\right)\Bigr)\,u + \frac{s}{\pi}

This function interpolates smoothly between the subgradients of the true quantile pinball loss and maintains positive-definite second derivatives for all u. For vector-valued quantile regression, τ may be vectorized and the loss computed per quantile.
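As an illustrative sketch (plain NumPy; the function names are ours, not from the paper), the loss can be implemented directly from the definition above and checked against the classical pinball loss, which it approaches as s → 0:

```python
import numpy as np

def arctan_pinball_loss(u, tau, s=0.1):
    """Arctan pinball loss for residuals u = y - y_hat at quantile tau."""
    u = np.asarray(u, dtype=float)
    return (tau - 0.5 + np.arctan(u / s) / np.pi) * u + s / np.pi

def pinball_loss(u, tau):
    """Classical (non-smooth) pinball loss, for comparison."""
    u = np.asarray(u, dtype=float)
    return np.maximum(tau * u, (tau - 1.0) * u)

u = np.linspace(-3.0, 3.0, 7)
# As s -> 0 the smooth surrogate converges to the classical loss.
gap = np.max(np.abs(arctan_pinball_loss(u, 0.9, s=1e-4) - pinball_loss(u, 0.9)))
```

The `s / np.pi` offset keeps the loss non-negative and matches the constant term in the definition; it does not affect the minimizer.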

The gradients and Hessians required for XGBoost’s second-order optimization are as follows:

  • Gradient (with respect to u):

g(u) = \frac{\partial L}{\partial u} = \tau - 0.5 + \frac{1}{\pi}\arctan\!\left(\frac{u}{s}\right) + \frac{u}{\pi s}\,\frac{1}{1 + (u/s)^2}

  • Hessian (second derivative with respect to u):

h(u) = \frac{\partial^2 L}{\partial u^2} = \frac{2}{\pi s}\,\bigl(1 + (u/s)^2\bigr)^{-2}

Unlike the pinball loss and its Huber or exponential smoothings, the Hessian of the arctan pinball loss decays polynomially, as O(|u|^{-4}), at the tails, ensuring informative curvature even for large residuals.
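The closed-form derivatives can be sketched and verified against finite differences (illustrative NumPy code; names are ours):

```python
import numpy as np

def arctan_grad_hess(u, tau, s=0.1):
    """Analytic gradient and Hessian of the arctan pinball loss w.r.t. u."""
    u = np.asarray(u, dtype=float)
    z = u / s
    grad = tau - 0.5 + np.arctan(z) / np.pi + z / (np.pi * (1.0 + z**2))
    hess = 2.0 / (np.pi * s) / (1.0 + z**2) ** 2
    return grad, hess

tau, s, eps = 0.7, 0.1, 1e-6
loss = lambda v: (tau - 0.5 + np.arctan(v / s) / np.pi) * v + s / np.pi

u = np.linspace(-5.0, 5.0, 11)
g, h = arctan_grad_hess(u, tau, s=s)
g_num = (loss(u + eps) - loss(u - eps)) / (2 * eps)       # central difference
h_far = arctan_grad_hess(np.array([100.0]), tau, s=s)[1]  # tail curvature
```

Even a thousand smoothing widths into the tail, `h_far` is strictly positive, which is the property the tree updates rely on.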

2. Integration into XGBoost

The arctan pinball loss is fully compatible with XGBoost’s gradient-boosted tree procedure, which requires both first and second derivatives of the loss at each step:

  • For each sample i, compute the residual u_i = y_i - \hat{y}_i.
  • Derive g_i = g(u_i) and h_i = h(u_i) via the formulas above.
  • XGBoost updates leaf weights by minimizing the second-order Taylor expansion of the loss. The optimal leaf value is given by:

w^* = -\frac{\sum_{i \in \text{leaf}} g_i}{\sum_{i \in \text{leaf}} h_i + \lambda}

where λ is the regularization parameter.
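The steps above can be sketched as a minimal NumPy routine (the helper names are ours; `split_gain` uses the standard XGBoost second-order gain formula, which the source does not spell out but which underlies its split-selection claims):

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf value w* = -sum(g) / (sum(h) + lambda)."""
    return -np.sum(g) / (np.sum(h) + lam)

def split_gain(g, h, left_mask, lam=1.0):
    """Standard XGBoost second-order split gain for a left/right partition."""
    score = lambda G, H: G**2 / (H + lam)
    gl, hl = np.sum(g[left_mask]), np.sum(h[left_mask])
    gr, hr = np.sum(g[~left_mask]), np.sum(h[~left_mask])
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))

# Illustrative residuals, scored with the arctan derivatives (tau=0.9, s=0.1).
u = np.array([-1.0, -0.2, 0.1, 0.5, 2.0])
z = u / 0.1
g = 0.9 - 0.5 + np.arctan(z) / np.pi + z / (np.pi * (1 + z**2))
h = 2.0 / (np.pi * 0.1) / (1 + z**2) ** 2
w = leaf_weight(g, h)              # single-leaf update
gain = split_gain(g, h, u < 0)     # candidate split at u = 0
```

Because every h_i is strictly positive, neither the leaf-weight denominator nor the gain scores degenerate, regardless of which samples fall in a leaf.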

For composite (multi-quantile) regression, the loss is vectorized: the “num_output_group” parameter equals the number of quantiles to be fit. In this multi-output setting, each leaf stores a vector of weights, and the tree-growing and gain computations are vectorized accordingly. This enables simultaneous prediction of multiple quantiles in a single fitted model, with shared splits and parameters (Sluijterman et al., 2024).
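A hedged sketch of such a vectorized objective follows (pure NumPy; the exact multi-output custom-objective signature varies across XGBoost versions, so only the flattened gradient/Hessian computation is shown; note the sign flip, since XGBoost differentiates with respect to the prediction ŷ while the formulas above use u = y − ŷ):

```python
import numpy as np

def multi_quantile_arctan_objective(taus, s=0.1):
    """Factory for an XGBoost-style multi-quantile custom objective.

    preds: (n_samples, n_quantiles) predictions; y: (n_samples,) targets.
    Returns flattened gradients and Hessians, one pair per sample/quantile.
    Sign flip: XGBoost needs dL/d(y_hat) = -dL/du because u = y - y_hat.
    """
    taus = np.asarray(taus, dtype=float)

    def objective(preds, y):
        u = y[:, None] - preds          # residuals, one column per quantile
        z = u / s
        grad = -(taus - 0.5 + np.arctan(z) / np.pi + z / (np.pi * (1 + z**2)))
        hess = 2.0 / (np.pi * s) / (1 + z**2) ** 2
        return grad.ravel(), hess.ravel()

    return objective

obj = multi_quantile_arctan_objective([0.1, 0.5, 0.9])
g_flat, h_flat = obj(np.zeros((4, 3)), np.array([-1.0, 0.0, 0.5, 2.0]))
```

Broadcasting τ across the columns is what makes all quantiles share one pass over the data, matching the shared-split design described above.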

3. Comparison with Other Losses

A comparative summary of the arctan pinball loss and related losses is as follows:

| Loss function | Hessian behavior | Suitability for XGBoost |
|---|---|---|
| Pinball (true) | Exactly zero almost everywhere | Not suitable (Hessian ≡ 0) |
| Huber-pinball (δ-smoothing) | Nonzero only inside the smoothing band, zero at the tails | Curvature vanishes for large residuals |
| Exponential/logistic smoothing | Exponentially decaying at the tails | Unstable second-order approximation for large residuals |
| Arctan pinball | Positive everywhere, polynomial O(u^{-4}) tail decay | Robust curvature, ideal for XGBoost |

The key distinction is that the arctan pinball loss’s polynomial Hessian tails prevent second-order information from vanishing, a critical property not shared by exponential or Huber-based smoothings. This results in more reliable optimization across all residual scales, underpinning both split selection and leaf-weight updates throughout training.
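This tail behavior can be illustrated numerically by comparing the arctan Hessian with that of a logistic-smoothed pinball loss L(u) = τu + s·log(1 + e^{−u/s}) (a comparison smoothing we assume for illustration; it is one standard exponential-type smoothing, not necessarily the one used in the paper):

```python
import numpy as np

s = 0.1

def hess_arctan(u):
    """Arctan pinball Hessian: polynomial O(u^-4) tail decay."""
    z = u / s
    return 2.0 / (np.pi * s) / (1 + z**2) ** 2

def hess_logistic(u):
    """Hessian of the logistic-smoothed pinball L(u) = tau*u + s*log(1 + exp(-u/s))."""
    sig = 1.0 / (1.0 + np.exp(-u / s))
    return sig * (1.0 - sig) / s

u = np.array([1.0, 2.0, 5.0])   # 10 to 50 smoothing widths into the tail
ratio = hess_arctan(u) / hess_logistic(u)
```

The ratio grows without bound as the residual moves into the tail: the exponential smoothing's curvature collapses to numerical zero while the arctan Hessian stays usable.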

4. Empirical Results and Quantile Crossing

Empirical evaluation demonstrates the practical advantages of arctan pinball loss in a range of regression settings:

  • On a toy 1D problem, “per-quantile” XGBoost produces approximately 20% quantile crossings between adjacent quantiles. In contrast, arctan-loss-based multi-quantile trees exhibit zero crossings and consistently aligned splits.
  • Across UCI benchmarks (Boston, Energy, Concrete, Wine, Yacht, Kin8nm), with 10 quantiles simultaneously fitted:
    • Quantile crossing rates are reduced from 11–30% (standard XGBoost) to 0.3–7% (arctan-loss XGBoost).
    • Interval coverage improves, with marginal coverage for the 90% prediction interval moving closer to 90% in 5 out of 6 datasets.
    • Average pinball losses and interval widths are sustained at comparable levels.
  • On real-world electricity substation data:
    • Quantile crossing prevalence drops from 2–13% to less than 0.5% on all substations.
    • Predictive accuracy (average pinball loss) and interval coverage remain similar, with maximum deviation in PI coverage within ±3% (Sluijterman et al., 2024).

These outcomes highlight the suitability of the arctan pinball loss for practical, robust quantile regression in machine learning ensembles.
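The crossing rates quoted above can be measured directly from a matrix of per-quantile predictions; the sketch below (illustrative, with made-up numbers) counts adjacent-quantile violations:

```python
import numpy as np

def crossing_rate(q_preds):
    """Fraction of (sample, adjacent-quantile) pairs where a lower
    quantile prediction exceeds the next higher one.

    q_preds: (n_samples, n_quantiles), columns ordered from lowest
    to highest target quantile.
    """
    q_preds = np.asarray(q_preds, dtype=float)
    crossings = np.diff(q_preds, axis=1) < 0
    return crossings.mean()

# Hypothetical predictions for three samples at quantiles 0.1 / 0.5 / 0.9:
preds = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.5, 2.0],   # 0.1-quantile above the median: a crossing
                  [0.0, 1.0, 3.0]])
rate = crossing_rate(preds)          # 1 crossing out of 6 adjacent pairs
```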

5. Practical Recommendations

For effective deployment of arctan pinball loss within XGBoost, recommendations are as follows (Sluijterman et al., 2024):

  1. Standardization: Standardize the target y before model fitting, so that a single choice of the smoothing parameter behaves consistently across datasets.
  2. Smoothing parameter: Choose s to balance fidelity and curvature; smaller s approaches the true pinball loss but shrinks the Hessian, while larger s increases bias and prediction-interval width.
  3. Tree parameters: Set XGBoost’s min_child_weight=0, since Hessian values vary widely and can be very small; lower the learning rate to 0.05 and constrain max_delta_step ≈ 0.5 to stabilize updates where the Hessian is small.
  4. Multi-output configuration: Set num_output_group to the number of quantiles for simultaneous quantile prediction, enabling shared tree structure and split optimization.
  5. Custom objective: Implement the arctan loss and its derivatives in the custom objective callback, returning the concatenated gradients g_i and Hessians h_i as required.
  6. Post-hoc calibration: Optional conformal calibration can correct any residual marginal miscoverage in the quantiles.

These practices allow for robust, efficient, and interpretable multi-quantile modeling within standard XGBoost pipelines.
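For the post-hoc calibration step, a split-conformal widening in the style of conformalized quantile regression can be sketched as follows (illustrative NumPy code on synthetic data; the source recommends conformal calibration but does not prescribe this particular procedure):

```python
import numpy as np

def cqr_adjustment(y_cal, lo_cal, hi_cal, alpha=0.1):
    """Split-conformal correction in the style of conformalized quantile
    regression: returns the score quantile q so that the widened interval
    [lo - q, hi + q] has marginal coverage >= 1 - alpha on exchangeable data.
    """
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, n) - 1]

rng = np.random.default_rng(0)
y = rng.normal(size=500)                         # synthetic calibration targets
lo, hi = np.full(500, -1.0), np.full(500, 1.0)   # hypothetical raw 90% PI bounds
q = cqr_adjustment(y, lo, hi)
cov = np.mean((y >= lo - q) & (y <= hi + q))     # coverage after widening
```

Here the raw interval deliberately undercovers a standard normal target, and the conformal correction widens it until the empirical coverage reaches the nominal level.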

6. Theoretical and Algorithmic Implications

The arctan pinball loss rectifies the principal deficiency of the classic pinball loss in second-order methods—its vanishing curvature—by providing a strictly positive, smoothly decaying Hessian. This enables:

  • Stable tree-split search (due to persistent second-order gain across the full range of residuals).
  • Integration of all quantiles in a joint (multi-output) tree, reducing computational burden and mitigating structural inconsistencies and quantile crossing.
  • Preservation of sharp, unbiased interval estimates without sacrificing algorithmic tractability.

This suggests that the arctan pinball loss may become the default smooth surrogate for quantile regression within other second-order boosted approaches beyond XGBoost, provided Hessian-based optimization is central to model training (Sluijterman et al., 2024).
