- The paper introduces CHDQR, which couples regression-as-classification with adaptive, dynamic prototypes to improve prediction interval accuracy.
- It avoids the exponential growth in prototype count that afflicts fixed-bin schemes by adaptively concentrating prototypes, and hence density estimates, in non-convex, high-density regions.
- Empirical results demonstrate CHDQR's capability to produce tight, robust coverage in both low- and high-dimensional, multimodal tasks with minimal computational overhead.
Motivation for Adaptive High-Density Quantile Regression
Quantile regression and conformalized quantile regression (CQR) are the standard methodologies for constructing prediction intervals and sets with rigorous coverage guarantees. However, convexity constraints and inflexibility in capturing highly non-convex, multimodal, or high-dimensional data distributions severely limit their effectiveness, especially in safety-critical and complex ML applications. Recent work reframing regression as classification, such as the Regression-as-Classification (RasC) approach, offers improved adaptability to non-convex structures but is inherently restricted by the curse of dimensionality and dependence on static quantization bins. Fixed-binning schemes are inadequate for dense or intricately structured data.
The paper introduces Conformalized High-Density Quantile Regression (CHDQR), a rigorous framework that extends RasC by coupling it with an adaptive, dynamically evolving set of prototypes. This dynamism allows for effective spatial partitioning and density modeling, particularly in non-convex high-density regions, and is integrated with a conformalization pipeline for valid finite-sample coverage. The proposed scheme directly addresses the principal limitations of quantile regression, RasC, and related density estimation approaches.
Figure 1: Comparison of (a) conventional quantile regression (broad intervals), (b) static bin RasC, and (c) CHDQR with adaptive prototype precision in high-density regions.
Methodological Foundation
The core construct of CHDQR leverages a regression-as-classification paradigm, where the output space is partitioned by a set of prototypes representing Voronoi regions. Model outputs, P̂(y ∈ R_i | x), are defined in terms of prototype region volume and local density, estimated via a log-density neural predictor. The probability for each region is proportional to the product of its density and volume, with normalization ensured via a softmax structure. Unlike prior approaches, CHDQR continuously refines both the number and positions of prototypes, using a composite objective comprising cross-entropy, quantization error, and a prototype repulsion term to avoid degeneracy.
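A minimal sketch of this construction, forming per-region probabilities from a log-density output and the region volumes; the function name and exact parameterization are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def region_probs(log_density, volumes):
    """Probability of each Voronoi region, proportional to
    (local density) x (region volume), normalized via softmax.

    log_density: per-region log-density predicted by the network (assumed shape [K])
    volumes:     per-region Voronoi cell volume |R_i| (assumed shape [K])
    """
    logits = log_density + np.log(volumes)   # log(density_i * |R_i|)
    logits -= logits.max()                   # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy check: under a uniform density, probabilities are proportional to volume.
p = region_probs(np.zeros(3), np.array([1.0, 2.0, 1.0]))
```

Because the softmax operates on log(density) + log(volume), doubling a cell's volume at fixed density doubles its (unnormalized) probability, which is the behavior the region-probability definition above requires.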
The prototype set adapts through the training process with two principal mechanisms: prototypes with low assignment frequency are pruned, whereas those in high-density regions are split via noise-perturbed duplication. This dynamic adjustment produces sharp, spatially efficient covering sets, reducing quantization error and memory requirements for high-dimensional targets.
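One such adaptation step could be sketched as follows; the pruning threshold, split fraction, and noise scale are placeholder values, not the paper's schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapt_prototypes(protos, counts, prune_frac=0.01, split_frac=0.2, noise=1e-2):
    """One adaptation step (illustrative thresholds):
    - prune prototypes whose assignment frequency falls below prune_frac
    - split the most-assigned prototypes via noise-perturbed duplication
    """
    freq = counts / counts.sum()
    keep = freq >= prune_frac                # drop rarely-used prototypes
    protos, freq = protos[keep], freq[keep]
    n_split = max(1, int(split_frac * len(protos)))
    hot = np.argsort(freq)[-n_split:]        # indices of the busiest prototypes
    children = protos[hot] + noise * rng.standard_normal(protos[hot].shape)
    return np.concatenate([protos, children], axis=0)

protos = rng.standard_normal((10, 2))
counts = np.array([0, 0, 5, 9, 1, 40, 3, 2, 7, 33], dtype=float)
new = adapt_prototypes(protos, counts)       # 2 pruned, 1 split -> 9 prototypes
```

Pruning and splitting act in opposite directions, so the active prototype count tracks where probability mass actually concentrates rather than growing with the ambient grid.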
CHDQR's conformalization phase employs the nonconformity score S(y_cal, x_cal), which accumulates the predicted probabilities for high-density regions covering each calibration instance. Coverage calibration is then performed via the empirical quantile of these scores. At prediction time, the test point's associated density estimates and cumulative region probabilities determine the smallest, densest union of regions satisfying the (1−α) coverage threshold. This construction aligns with high-density region estimation theory and ensures finite-sample marginal validity under standard exchangeability assumptions.
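The calibration-and-prediction pipeline can be sketched as below; for simplicity this version ranks regions by probability mass rather than by density per unit volume, and all names are illustrative rather than the paper's:

```python
import numpy as np

def nonconformity(probs, region_of_y):
    """Score for a calibration point: total mass of the regions ranked at
    least as high as the one containing y (smaller = y in a denser region)."""
    order = np.argsort(probs)[::-1]          # regions, highest mass first
    csum = np.cumsum(probs[order])
    rank = int(np.where(order == region_of_y)[0][0])
    return csum[rank]

def calibrate(scores, alpha):
    """Conformal empirical quantile of calibration scores at level (1-alpha)."""
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(probs, tau):
    """Smallest union of highest-probability regions with mass >= tau."""
    order = np.argsort(probs)[::-1]
    csum = np.cumsum(probs[order])
    k = int(np.searchsorted(csum, tau)) + 1
    return order[:k]

probs = np.array([0.1, 0.5, 0.25, 0.15])
sel = prediction_set(probs, 0.6)             # the two highest-mass regions
```

At test time, `calibrate` supplies the threshold tau, and `prediction_set` greedily adds regions in decreasing order until the calibrated mass is reached, which is what yields the smallest, densest union described above.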
Empirical Results in Low and High Dimensions
Extensive evaluation across synthetic and real-world datasets demonstrates several key findings:
- CHDQR and CHDQR-Dynamic consistently achieve the stipulated coverage rates (e.g., 0.90, 0.50, 0.10) with significantly reduced normalized prediction region width (PINAW), especially in complex, multimodal, or non-convex settings.
- In one-dimensional and toy multimodal tasks, CHDQR-Dynamic produces much tighter prediction regions by adaptively refining the set of active prototypes and focusing quantization in dense regions.
- On higher-dimensional tasks (e.g., 2D Gaussian mixtures), fixed-bin RasC and CQR become computationally infeasible (O(n^d) bins required for n bins per axis in d dimensions), whereas CHDQR-Dynamic matches or exceeds their coverage and region efficiency with orders of magnitude fewer prototypes.
- The method displays strong robustness to outliers—adding synthetic outliers causes only minor increases in prediction set size, owing to CHDQR’s capacity for local adaptation.
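The bin-count gap behind the scaling finding above can be made concrete (the numbers are illustrative, not taken from the paper's experiments):

```python
# A fixed grid with n bins per axis needs n**d cells in d dimensions,
# while an adaptive scheme keeps only the prototypes needed to tile the
# data's high-density support.
def fixed_grid_bins(n_per_axis: int, d: int) -> int:
    return n_per_axis ** d

bins_2d = fixed_grid_bins(100, 2)   # 10_000 cells
bins_5d = fixed_grid_bins(100, 5)   # 10_000_000_000 cells
```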
Figure 2: Visualization of CHDQR-Dynamic quantile coverage on the Uncond1d dataset, highlighting compact region prediction for dense modes.
Figure 3: 50% coverage visualization on Unconditional1d, showing precise selection of highest-density regions.
Figure 4: 10% coverage visualization, with extremely narrow region assignment focusing strictly on modes.

Figure 5: Visualization of 90% coverage level quantile regions; CHDQR-Dynamic creates minimal-width intervals covering modes with less redundancy than static approaches.
Figure 6: CHDQR-Dynamic applied on Uncond2d with 0.6% outliers and 90% coverage, illustrating robust adaptation and efficient region partitioning, including outlier incorporation.
Theoretical and Practical Implications
CHDQR represents a substantial development in the rigorous estimation of prediction sets:
- It breaks the exponential scaling in prototype count characteristic of fixed-bin methods, making high-dimensional conformal quantile regression computationally feasible.
- By dynamically allocating quantization complexity to dense data regions, it achieves coverage with fewer, more efficiently placed prototypes, optimizing both accuracy and memory footprint.
- The framework generalizes to multi-target, high-dimensional regression, circumventing the restrictiveness of convexity and grid-based binning.
From a practical standpoint, CHDQR’s ability to construct sharply delimited, data-adaptive prediction regions—coupled with provable coverage guarantees—has significant implications for decision-support systems in safety-critical domains, anomaly detection, and resource allocation under uncertainty. Its design allows flexibility for integration with modern deep learning architectures as the log-density estimator, opening further applicability to structured and unstructured data.
Limitations and Directions for Future Work
While CHDQR addresses key scalability and coverage challenges, several open problems remain:
- Voronoi tessellation and exact volume calculations currently restrict practical usage to ≤3 dimensions; relaxing the reliance on explicit area computation (e.g., via soft estimation) would permit direct extension to arbitrarily high dimensions.
- The calibration-prediction gap and local conditional coverage (beyond marginal) are not explicitly addressed.
- Sensitivity to hyperparameter choices for prototype adaptation schedules and loss weights could be mitigated with more automated or data-driven strategies.
- A generalization to structured outputs (e.g., sequences, images) will require tailored prototype geometry/metric selection.
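On the volume-computation limitation above, one direction for relaxing exact tessellation is a Monte Carlo estimate of Voronoi cell volumes; the sketch below assumes a known bounding box and uses illustrative names throughout:

```python
import numpy as np

def mc_voronoi_volumes(protos, low, high, n_samples=100_000, rng=None):
    """Monte Carlo estimate of Voronoi cell volumes inside a bounding box,
    avoiding explicit tessellation: sample uniformly, assign each sample to
    its nearest prototype, and scale the hit fractions by the box volume."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = protos.shape[1]
    pts = rng.uniform(low, high, size=(n_samples, d))
    # squared distance from every sample to every prototype
    d2 = ((pts[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    box_vol = np.prod(np.asarray(high) - np.asarray(low))
    counts = np.bincount(assign, minlength=len(protos))
    return counts / n_samples * box_vol

# Two prototypes at -1 and +1 in 1D split the box [-2, 2] into
# two cells of volume 2 each.
vols = mc_voronoi_volumes(np.array([[-1.0], [1.0]]), -2.0, 2.0)
```

The estimate's error shrinks as O(1/sqrt(n_samples)) regardless of dimension, which is what makes this kind of soft volume estimation a candidate for extending the method beyond three dimensions.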
Conclusion
CHDQR advances conformal quantile regression by fusing dynamic prototype allocation with density-based region estimation, enabling accurate, scalable, and robust construction of high-density prediction sets in diverse regression settings. Its empirical performance across coverage regimes, dimensions, and under outlier perturbation, coupled with substantial computational advantages, underscores its utility as a foundation for rigorous uncertainty quantification on complex data. Future extensions in high-dimensional region estimation and conditional guarantees are poised to enhance its theoretical and applicative scope.