
Deep Distribution Regression (1903.06023v1)

Published 14 Mar 2019 in stat.ML, cs.LG, and stat.ME

Abstract: Due to their flexibility and predictive performance, machine-learning based regression methods have become an important tool for predictive modeling and forecasting. However, most methods focus on estimating the conditional mean or specific quantiles of the target quantity and do not provide the full conditional distribution, which contains uncertainty information that might be crucial for decision making. In this article, we provide a general solution by transforming a conditional distribution estimation problem into a constrained multi-class classification problem, in which tools such as deep neural networks can be applied. We propose a novel joint binary cross-entropy loss function to accomplish this goal. We demonstrate its performance in various simulation studies compared with state-of-the-art competing methods. Additionally, our method shows improved accuracy in a probabilistic solar energy forecasting problem.

Citations (32)

Summary

  • The paper develops a framework transforming conditional density estimation into a classification problem to capture the full conditional distribution of a response variable.
  • A novel Joint Binary Cross Entropy (JBCE) loss is introduced for estimating the conditional cumulative distribution function (CDF), leveraging ordinal structure and enforcing monotonicity.
  • Empirical results show the deep learning approach with the JBCE loss achieves competitive performance, including roughly a 5% CRPS reduction relative to quantile regression forests, and applies successfully to probabilistic solar energy forecasting.

The paper develops a framework that transforms conditional density estimation into a multi-class (or multiple binary) classification problem in order to capture the full conditional distribution of a response variable given covariates. The approach departs from conventional regression techniques that only target point estimates or specific quantiles, thereby providing richer uncertainty quantification essential for decision making in applications such as energy forecasting.

The methodology is organized into several components:

  • Conditional Density Approximation via Partitioning:

The response space $[l, u]$ is partitioned into $m+1$ bins $T_1, \dots, T_{m+1}$ via cut-points, and the conditional density $f(y|X)$ is approximated by the piecewise constant function

$$f(y|X) \approx \sum_{i=1}^{m+1} \frac{p_i(X)}{|T_i|}\, I(y \in T_i),$$

where $p_i(X) = P(Y \in T_i \mid X)$ and $|T_i| = c_i - c_{i-1}$. This formulation allows any flexible multi-class classification method to be used for estimating the bin probabilities.
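
For concreteness, a minimal NumPy sketch of this piecewise-constant approximation (the cut-points and bin probabilities below are toy values, not from the paper):

```python
import numpy as np

# Bin edges c_0 = l, c_1, ..., c_m, c_{m+1} = u on the response space [0, 10].
edges = np.array([0.0, 2.5, 5.0, 7.5, 10.0])   # m = 3 cut-points, m+1 = 4 bins
widths = np.diff(edges)                         # |T_i| = c_i - c_{i-1}

# Bin probabilities p_i(X) for one input X, e.g. produced by a classifier.
p = np.array([0.1, 0.4, 0.3, 0.2])              # must sum to 1

def density(y):
    """Piecewise-constant density: f(y|X) ~ p_i(X) / |T_i| for y in T_i."""
    i = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, len(widths) - 1)
    return p[i] / widths[i]

print(density(3.0))   # 0.4 / 2.5 = 0.16
```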

  • Probability Estimation Using Classification Techniques:

Two strategies for estimating $p_i(X)$ are proposed:

  • Multinomial Log-likelihood: A conventional multinomial logistic formulation models the bin-membership indicators $\bm{G}_n$ with a one-hot encoding. A deep neural network generates the logits $z(X_n)$, and a softmax activation ensures that the estimates form a proper probability vector. However, this strategy does not exploit the natural ordering of the bins, and simulation studies reveal that its performance is sensitive to the choice of the number of cut-points.
  • Joint Binary Cross Entropy (JBCE) Loss: A novel loss function is introduced by converting the problem into estimating the conditional cumulative distribution function (CDF). For each cut-point $c_j$, the binary cross entropy is computed as

$$BCE(c_j) = -\sum_{n=1}^{N} \Bigl\{ I(Y_n \le c_j)\log\bigl[F(c_j; X_n)\bigr] + \bigl[1 - I(Y_n \le c_j)\bigr]\log\bigl[1 - F(c_j; X_n)\bigr] \Bigr\},$$

and summing these over all cut-points yields the JBCE loss (a code sketch follows below). This joint formulation not only utilizes the natural ordinal structure among bins but also automatically enforces the monotonicity constraint on the estimated CDF.
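
A minimal PyTorch sketch of this loss, under the assumption that the network outputs one logit per bin and the CDF at cut-point $c_j$ is the cumulative sum of the softmax bin probabilities (which makes the estimated CDF monotone by construction); variable names are illustrative, not from the paper:

```python
import torch

def jbce_loss(logits, y, cut_points):
    """Joint binary cross-entropy summed over all cut-points.

    logits:     (N, m+1) tensor, one logit per bin
    y:          (N,) tensor of observed responses
    cut_points: (m,) tensor of interior cut-points c_1 < ... < c_m
    """
    probs = torch.softmax(logits, dim=1)            # bin probabilities p_i(X_n)
    # F(c_j; X_n) = cumulative sum of bin probabilities up to cut-point j.
    cdf = torch.cumsum(probs, dim=1)[:, :-1]        # (N, m); drop F(u) = 1
    targets = (y[:, None] <= cut_points[None, :]).float()  # I(Y_n <= c_j)
    eps = 1e-7                                      # guard against log(0)
    bce = -(targets * torch.log(cdf.clamp(min=eps))
            + (1 - targets) * torch.log((1 - cdf).clamp(min=eps)))
    return bce.sum()                                # sum over j and n

# Example: N = 2 observations, m = 3 cut-points, m+1 = 4 bins.
logits = torch.randn(2, 4, requires_grad=True)
y = torch.tensor([0.3, 0.8])
cuts = torch.tensor([0.25, 0.5, 0.75])
jbce_loss(logits, y, cuts).backward()               # gradients flow end-to-end
```
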
  • Ensemble Random Partitioning for Smoothing and Robustness:

To alleviate the sensitivity to the choice and fixed locations of the cut-points, an ensemble approach is proposed. Multiple density estimators are trained on randomly generated partitions of the response space, and the final estimate is an average over all individual estimators. This strategy yields a smoother approximation of the conditional density as the number of ensemble members increases, albeit with a linearly increasing computational cost.
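
A minimal NumPy sketch of the averaging step, assuming each member's bin probabilities for a given input have already been computed (all names here are illustrative, not from the paper):

```python
import numpy as np

def random_cut_points(l, u, m, rng):
    """Draw m interior cut-points uniformly at random in (l, u)."""
    return np.sort(rng.uniform(l, u, size=m))

def ensemble_density(y_grid, bin_probs_list, cuts_list, l, u):
    """Average the piecewise-constant densities of all ensemble members.

    bin_probs_list[k]: bin probabilities p_i(x) from the k-th trained model
    cuts_list[k]:      the random cut-points that model was trained on
    """
    total = np.zeros_like(y_grid)
    for p, cuts in zip(bin_probs_list, cuts_list):
        edges = np.concatenate(([l], cuts, [u]))
        widths = np.diff(edges)
        idx = np.clip(np.searchsorted(edges, y_grid, side="right") - 1,
                      0, len(widths) - 1)
        total += p[idx] / widths[idx]
    return total / len(bin_probs_list)   # smoother as the ensemble grows

rng = np.random.default_rng(0)
partitions = [random_cut_points(0.0, 1.0, m=9, rng=rng) for _ in range(10)]
```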

  • Theoretical Consistency:

It is shown that if the classification-based estimator satisfies $Bias(\hat{\pi}_k(X)) = o(1/K)$ and $Var(\hat{\pi}_k(X)) = o(1/K^2)$ for each bin, then, under standard smoothness conditions on the true density $f(y|X)$ (e.g., bounded first and second derivatives), the histogram-type estimator converges to the true density. An extensive discussion, including a worked example with binary logistic regression under appropriate rate conditions such as $K \log(K)/n \to 0$, supports these claims.
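
To see where these rates come from, consider the standard error decomposition for a histogram-type estimator (a sketch of the usual argument, not quoted from the paper): for $y \in T_k$,

$$\hat{f}(y|X) - f(y|X) = \frac{\hat{\pi}_k(X) - \pi_k(X)}{|T_k|} + \left(\frac{\pi_k(X)}{|T_k|} - f(y|X)\right).$$

With $K$ bins of width $O(1/K)$, the bias and variance conditions force the first (estimation-error) term to vanish even after division by the bin width, while the bounded derivatives of $f(y|X)$ make the second (discretization-error) term $O(1/K)$.
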
  • Empirical Evaluations and Numerical Results:

The methodology is evaluated in simulation studies under a variety of data-generating processes, including linear and non-linear models with normal, mixture, and skewed error distributions. Performance is assessed based on scoring rules such as the Continuous Ranked Probability Score (CRPS) and Average Quantile Loss (AQTL). Notably, the deep neural network model trained with the JBCE loss outperforms both a logistic regression baseline and a quantile regression forest (QRF) approach, demonstrating about a 5% reduction in CRPS and AQTL relative to QRF. Furthermore, performance improvements are stable as the number of partitions increases when using the JBCE loss, highlighting reduced sensitivity to hyperparameter tuning.
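
For concreteness, the CRPS of a predictive CDF $F$ against an observation $y$ is $\int \bigl(F(t) - I(t \ge y)\bigr)^2 \, dt$; a minimal NumPy approximation on a grid (illustrative, not the paper's evaluation code):

```python
import numpy as np

def crps(grid, cdf_vals, y_obs):
    """Approximate CRPS = integral of (F(t) - 1{t >= y_obs})^2 dt, trapezoid rule."""
    sq_err = (cdf_vals - (grid >= y_obs).astype(float)) ** 2
    return np.sum(0.5 * (sq_err[:-1] + sq_err[1:]) * np.diff(grid))

# Example: uniform predictive distribution on [0, 1], observation at 0.5.
grid = np.linspace(0.0, 1.0, 1001)
print(crps(grid, grid, 0.5))   # ~ 1/12, the CRPS of Uniform(0,1) vs. y = 0.5
```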

  • Application to Probabilistic Solar Energy Forecasting:

The approach is applied to solar energy forecasting data from a recent Global Energy Forecasting Competition. The data comprises solar output measurements along with multiple weather forecast variables. The method is adapted to include additional features such as indicator variables and cyclical transformations (sine and cosine of the day of the year) to capture the seasonal variation. The proposed JBCE ensemble model not only produces competitive probabilistic forecasts, as evidenced by the superior scoring metrics, but also yields interpretable conditional densities. For instance, on sunny days the predicted density is concentrated with slight negative skewness due to capacity limits, whereas on days with variable cloud cover the density exhibits wider spreads. Detailed plots of the estimated conditional densities illustrate these varying patterns across different weather regimes.
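
The cyclical transformation mentioned above maps the day of year onto the unit circle so that late December and early January end up close in feature space; a minimal sketch (the period value is an illustrative choice):

```python
import numpy as np

def cyclical_day_features(day_of_year, period=365.25):
    """Encode day of year as (sin, cos) so the feature wraps smoothly across years."""
    angle = 2.0 * np.pi * day_of_year / period
    return np.sin(angle), np.cos(angle)

print(cyclical_day_features(365))   # nearly identical to cyclical_day_features(1)
```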

  • Concluding Remarks:

The paper substantiates that by leveraging state-of-the-art neural network architectures and a novel joint binary cross entropy loss, it is possible to obtain fully probabilistic forecasts that are both accurate and robust. This framework is broadly applicable to various regression and forecasting tasks where capturing the uncertainty in predictions is as critical as the point estimates. The theoretical guarantees and empirical evidence presented provide strong support for the use of classification-based methods in conditional density estimation tasks, especially in high-dimensional and complex settings.

Collectively, the paper contributes a flexible, model-agnostic framework that bridges machine learning classification methods with probabilistic forecasting through an innovative loss formulation, ensemble smoothing, and rigorous theoretical underpinnings.