Arctan Pinball Loss in Quantile Regression
- Arctan pinball loss is a smooth, strictly convex surrogate for classical quantile loss, offering non-vanishing Hessians for effective second-order optimization in tree-based algorithms.
- It enables stable composite quantile regression by reducing quantile crossing and ensuring reliable curvature, thus enhancing prediction accuracy and interval coverage.
- Its integration into XGBoost facilitates joint multi-quantile prediction with efficient leaf updates and robust performance across diverse datasets.
The arctan pinball loss is a smooth loss function designed as a surrogate for the classical quantile pinball loss, specifically tailored to enable second-order optimization in tree-based algorithms such as XGBoost. The classical pinball loss, widely used for quantile regression, poses limitations for algorithms utilizing second-order Taylor approximations due to its piecewise linearity and zero Hessian almost everywhere. The arctan pinball loss offers a differentiable, strictly convex alternative with a non-vanishing, analytically tractable Hessian, supporting stable and efficient composite quantile regression that mitigates quantile crossing and enables high-performance multi-quantile prediction within XGBoost’s framework (Sluijterman et al., 2024).
1. Mathematical Formulation
Let τ ∈ (0,1) denote the target quantile and u = y – ŷ the residual for a true label y and prediction ŷ. With a smoothing parameter s > 0, the arctan pinball loss is defined as

$$L_{\tau,s}(u) \;=\; \left(\tau - \tfrac{1}{2}\right) u \;+\; \frac{u}{\pi}\arctan\!\left(\frac{u}{s}\right) \;+\; \frac{s}{\pi}.$$

This function interpolates smoothly between the two linear branches of the true quantile pinball loss, to which it converges as u → ±∞, and has a strictly positive second derivative for all u. For vector-valued quantile regression, τ may be vectorized and the loss computed per quantile.
The gradients and Hessians required for XGBoost’s second-order optimization are as follows:
- Gradient (with respect to u):

  $$\frac{\partial L_{\tau,s}}{\partial u} \;=\; \tau - \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{u}{s}\right) + \frac{s\,u}{\pi\,(s^2 + u^2)}$$

  XGBoost requires the derivative with respect to ŷ, which is the negative of this expression since u = y – ŷ.
- Hessian (second derivative with respect to u, identical to the second derivative with respect to ŷ):

  $$\frac{\partial^2 L_{\tau,s}}{\partial u^2} \;=\; \frac{2 s^3}{\pi\,(s^2 + u^2)^2}$$
Unlike the pinball loss and its Huber or exponential smoothings, the arctan pinball loss has a Hessian that decays only polynomially (on the order of 1/u⁴) at the tails, ensuring informative curvature even for large residuals.
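These formulas translate directly into code. The following NumPy sketch (function names are mine) implements the loss, its gradient, and its Hessian as given above:

```python
import numpy as np

def arctan_pinball_loss(u, tau, s=0.1):
    """Arctan pinball loss on residuals u = y - y_hat (vectorized)."""
    return (tau - 0.5) * u + (u / np.pi) * np.arctan(u / s) + s / np.pi

def arctan_pinball_grad(u, tau, s=0.1):
    """dL/du; XGBoost needs dL/d(y_hat) = -dL/du, since u = y - y_hat."""
    return tau - 0.5 + np.arctan(u / s) / np.pi + s * u / (np.pi * (s**2 + u**2))

def arctan_pinball_hess(u, s=0.1):
    """d2L/du2: strictly positive everywhere, decaying polynomially in the tails."""
    return 2 * s**3 / (np.pi * (s**2 + u**2) ** 2)
```

Note that the Hessian is independent of τ, and that as s → 0 the loss approaches the classical pinball loss.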
2. Integration into XGBoost
The arctan pinball loss is fully compatible with XGBoost’s gradient-boosted tree procedure, which requires both first and second derivatives of the loss at each step:
- For each sample i, compute the residual u_i = y_i – ŷ_i.
- Derive the gradient g_i and Hessian h_i via the formulas above.
- XGBoost updates leaf weights by minimizing the second-order Taylor expansion of the loss. The optimal leaf value is given by:

  $$w^{*} \;=\; -\,\frac{\sum_{i \in \text{leaf}} g_i}{\sum_{i \in \text{leaf}} h_i + \lambda},$$

  where λ is the regularization parameter.
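A toy numeric check of this leaf update, using the arctan pinball derivatives (the residuals, τ, s, and λ values are illustrative only):

```python
import numpy as np

# Toy leaf-weight update w* = -sum(g) / (sum(h) + lambda) for tau = 0.9,
# s = 0.1, with three residuals falling in one leaf (values illustrative).
tau, s, lam = 0.9, 0.1, 1.0
u = np.array([0.4, -0.2, 1.5])                      # residuals y - y_hat
g = -(tau - 0.5 + np.arctan(u / s) / np.pi
      + s * u / (np.pi * (s**2 + u**2)))            # gradient w.r.t. y_hat
h = 2 * s**3 / (np.pi * (s**2 + u**2) ** 2)         # Hessian (tau-independent)
w_star = -g.sum() / (h.sum() + lam)                 # optimal leaf value
```

Since most residuals are positive and τ = 0.9, the resulting update is positive, pulling predictions upward toward the upper quantile.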
For composite (multi-quantile) regression, the loss is vectorized: the “num_output_group” parameter equals the number of quantiles to be fit. In this multi-output setting, each leaf stores a vector of weights, and the tree-growing and gain computations are vectorized accordingly. This enables simultaneous prediction of multiple quantiles in a single fitted model, with shared splits and parameters (Sluijterman et al., 2024).
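The vectorized objective can be sketched as follows; this is a sketch only, in which the `objective(y_true, y_pred)` signature and the (n_samples, Q) prediction layout are illustrative rather than the exact XGBoost callback API:

```python
import numpy as np

def make_multiquantile_objective(taus, s=0.1):
    """Build a vectorized objective for Q quantiles fitted jointly.

    `taus` holds the Q target quantile levels; predictions are assumed
    to arrive with shape (n_samples, Q).
    """
    taus = np.asarray(taus)

    def objective(y_true, y_pred):
        u = y_true[:, None] - y_pred                      # residuals per quantile
        grad = -(taus - 0.5 + np.arctan(u / s) / np.pi
                 + s * u / (np.pi * (s**2 + u**2)))       # dL/d(y_hat)
        hess = 2 * s**3 / (np.pi * (s**2 + u**2) ** 2)    # strictly positive
        return grad, hess

    return objective
```

Because every quantile shares the same tree structure, the (n_samples, Q) gradient and Hessian arrays drive a single joint split search rather than Q independent models.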
3. Comparison with Other Losses
A comparative summary of the arctan pinball loss and related losses is as follows:
| Loss Function | Hessian Behavior | Suitability for XGBoost |
|---|---|---|
| Pinball (true) | Exactly zero almost everywhere | Not suitable (Hessian ≡ 0) |
| Huber-pinball (δ-smoothing) | Nonzero only for \|u\| < δ, zero outside | Curvature vanishes for large residuals |
| Exponential/logistic smoothing | Exponentially decaying at tails | Unstable second-order approximation for large \|u\| |
| Arctan pinball | Positive everywhere, decays polynomially to 0 | Robust curvature, ideal for XGBoost |
The key distinction is that the arctan pinball loss’s polynomial Hessian tails prevent second-order information from vanishing, a critical property not shared by exponential or Huber-based smoothings. This results in more reliable optimization across all residual scales, underpinning both split selection and leaf-weight updates throughout training.
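A small numeric check makes the tail-decay contrast concrete. Here a logistic (soft-plus) smoothing of the pinball kink, whose second derivative is σ(u/s)(1 − σ(u/s))/s, is assumed as a representative exponentially decaying alternative:

```python
import math

def arctan_hess(u, s):
    # Arctan pinball Hessian: polynomial tail decay.
    return 2 * s**3 / (math.pi * (s**2 + u**2) ** 2)

def logistic_hess(u, s):
    # Hessian of a logistic (soft-plus) smoothing of the pinball kink:
    # exponential tail decay.
    z = math.exp(-abs(u) / s)
    return z / (s * (1 + z) ** 2)

# At a moderate residual the logistic curvature has already collapsed to
# machine-level noise, while the arctan curvature remains numerically usable.
u, s = 5.0, 0.1
ratio = arctan_hess(u, s) / logistic_hess(u, s)
```

At u = 5 with s = 0.1, the arctan Hessian exceeds the logistic one by many orders of magnitude, which is exactly the property that keeps second-order gains informative for large residuals.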
4. Empirical Results and Quantile Crossing
Empirical evaluation demonstrates the practical advantages of arctan pinball loss in a range of regression settings:
- On a toy 1D problem (Figures 1 and 2), “per-quantile” XGBoost produces approximately 20% quantile crossings between adjacent quantiles. In contrast, arctan-loss-based multi-quantile trees exhibit zero crossings and consistently aligned splits.
- Across UCI benchmarks (Boston, Energy, Concrete, Wine, Yacht, Kin8nm), with 10 quantiles simultaneously fitted:
- Quantile crossing rates are reduced from 11–30% (standard XGBoost) to 0.3–7% (arctan-loss XGBoost).
- Interval coverage improves, with marginal coverage for the 90% prediction interval moving closer to 90% in 5 out of 6 datasets.
- Average pinball losses and interval widths are sustained at comparable levels.
- On real-world electricity substation data:
- Quantile crossing prevalence drops from 2–13% to less than 0.5% on all substations.
- Predictive accuracy (average pinball loss) and interval coverage remain similar, with maximum deviation in PI coverage within ±3% (Sluijterman et al., 2024).
These outcomes highlight the suitability of the arctan pinball loss for practical, robust quantile regression in machine learning ensembles.
5. Practical Recommendations
For effective deployment of arctan pinball loss within XGBoost, recommendations are as follows (Sluijterman et al., 2024):
- Standardization: Standardize the target y before model fitting; the recommended smoothing value is then robust across datasets.
- Smoothing parameter: Choose s to balance fidelity and curvature; smaller s approaches the true pinball loss but shrinks the Hessian, while larger s increases bias and prediction-interval width.
- Tree parameters: Set XGBoost’s `min_child_weight=0`, since Hessian variability is high; lower the learning rate η to 0.05 and constrain `max_delta_step ≈ 0.5` to stabilize updates where the Hessian is small.
- Multi-output configuration: Use `num_output_group` equal to the number of quantiles for simultaneous quantile prediction, enabling shared tree structure and split optimization.
- Custom objective: Implement the arctan loss and its derivatives in the `custom_objective` callback, returning the concatenated gradients and Hessians as required.
- Post-hoc calibration: Optional conformal calibration can correct any residual marginal miscoverage in the quantiles.
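As a concrete starting point, the tree-parameter recommendations above might be collected into a parameter dictionary along these lines (a sketch: the keys follow standard XGBoost parameter names, and `num_output_group` reflects the modified multi-output setup described in Section 2; `NUM_QUANTILES` is a placeholder):

```python
NUM_QUANTILES = 10  # number of jointly fitted quantiles (illustrative)

params = {
    "min_child_weight": 0,              # Hessian sums are small and variable
    "learning_rate": 0.05,              # lower eta stabilizes boosting updates
    "max_delta_step": 0.5,              # caps leaf moves where the Hessian is tiny
    "num_output_group": NUM_QUANTILES,  # one output per quantile
}
```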
These practices allow for robust, efficient, and interpretable multi-quantile modeling within standard XGBoost pipelines.
6. Theoretical and Algorithmic Implications
The arctan pinball loss rectifies the principal deficiency of the classic pinball loss in second-order methods—its vanishing curvature—by providing a strictly positive, smoothly decaying Hessian. This enables:
- Stable tree-split search (due to persistent second-order gain across the full range of residuals).
- Integration of all quantiles in a joint (multi-output) tree, reducing computational burden and mitigating structural inconsistencies and quantile crossing.
- Preservation of sharp, unbiased interval estimates without sacrificing algorithmic tractability.
This suggests that the arctan pinball loss may become the default smooth surrogate for quantile regression within other second-order boosted approaches beyond XGBoost, provided Hessian-based optimization is central to model training (Sluijterman et al., 2024).