Optimal dependence on k in Gaussian approximation rates for k-PNN random forests

Determine the optimal dependence on the terminal-node parameter k in the multivariate Gaussian approximation bounds for the k-potential nearest neighbor (k-PNN) random-forest estimator under Poisson sampling. Specifically, either prove that the current k^τ dependence in the error bounds is unavoidable by establishing matching lower bounds, or develop methods that improve the k-dependence beyond what is achieved using region-based stabilization and Stein's method.

Background

The paper’s bounds for multivariate Gaussian approximation of k-PNN-based random forest estimators contain a factor k^τ. This factor arises from the Poisson tail bounds that control the region of stabilization, and it reflects the growing dependence among scores as k increases.

The authors show that the k^τ term cannot be improved using their current technique (region-based stabilization with Stein's method), and they also provide lower bounds for related quantities that suggest tightness in a specific sense. However, whether this k-dependence is truly optimal for the normal approximation error, or can be improved by different techniques, remains unresolved.

References

"Hence, the $k\tau$ term cannot be further improved using the current proof technique (i.e., using region-based stabilization and Stein's method). Resolving this question of optimal $k$ dependency, either by demonstrating that the order of $k$ is necessary or by improving the $k$ dependency is thus an important open question."

Multivariate Gaussian Approximation for Random Forest via Region-based Stabilization (2403.09960 - Shi et al., 15 Mar 2024) in Remark [Dependence on k], Section 3 (Main results)