A Critical Examination of Enhancing MART with Dropouts: The Introduction of DART
This paper presents an innovative approach to improving Multiple Additive Regression Trees (MART) by borrowing the dropout technique commonly used in deep neural networks, resulting in the DART algorithm. MART, an ensemble of boosted regression trees, performs well across a wide range of prediction tasks. However, it suffers from over-specialization: trees added in later iterations affect the predictions of only a few data points while contributing little for the rest, which hurts generalization. The traditional remedy, shrinkage, which scales down every new tree's contribution by a constant learning rate, is only partially successful.
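To make the shrinkage remedy concrete, the following is a minimal sketch of MART-style gradient boosting for squared loss, in which each new tree is fitted to the current residuals and its contribution is scaled by a constant learning rate. The helper names, hyperparameters, and use of scikit-learn trees are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of MART-style gradient boosting with shrinkage (squared loss).
# Names and hyperparameters are illustrative, not taken from the paper.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_mart(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """X, y: numpy arrays. Each tree fits the residual of the current
    prediction; shrinkage scales every new tree's contribution."""
    base = y.mean()
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                      # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += learning_rate * tree.predict(X)  # shrinkage step
        trees.append(tree)
    return base, trees

def predict_mart(base, trees, X, learning_rate=0.1):
    return base + learning_rate * sum(t.predict(X) for t in trees)
```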
Introduction of DART
The proposed DART algorithm integrates dropout into the MART framework, addressing over-specialization by muting entire trees rather than individual units, as dropout does in neural networks. In each boosting iteration, a random subset of the existing trees is temporarily dropped, the new tree is fitted to the residual of the remaining ensemble, and the new and dropped trees are then rescaled so the overall prediction stays on the same scale. The authors argue that this controlled exclusion of whole trees lets DART retain MART's iterative pursuit of prediction accuracy while gaining robustness and resilience to overfitting.
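As a rough illustration of that procedure, here is a sketch of a single DART-style boosting round for squared loss, following the paper's "tree" normalization (dropped trees are rescaled by k/(k+1) and the new tree enters with weight 1/(k+1), where k is the number of dropped trees). The function name, drop probability, and use of scikit-learn trees are assumptions made for illustration, not the authors' implementation.

```python
# Sketch of one DART-style boosting round (squared loss). Illustrative only;
# the drop probability and tree settings are placeholder choices.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def dart_round(X, y, trees, weights, p_drop=0.1, max_depth=3, rng=None):
    """One round: mute a random subset of trees, fit the new tree to the
    residual of the remaining ensemble, then renormalize the weights."""
    rng = rng or np.random.default_rng()

    # 1. Pick the trees to drop for this round (at least one, if any exist).
    drop = [i for i in range(len(trees)) if rng.random() < p_drop]
    if trees and not drop:
        drop = [int(rng.integers(len(trees)))]
    keep = [i for i in range(len(trees)) if i not in drop]

    # 2. Fit a new tree to the residual of the *reduced* ensemble.
    pred = np.zeros(len(y))
    for i in keep:
        pred += weights[i] * trees[i].predict(X)
    new_tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, y - pred)

    # 3. Renormalize: dropped trees shrink by k/(k+1); the new tree enters
    #    with weight 1/(k+1), so the prediction scale is preserved.
    k = len(drop)
    for i in drop:
        weights[i] *= k / (k + 1)
    trees.append(new_tree)
    weights.append(1.0 / (k + 1))
    return trees, weights
```

Repeating such a round builds the ensemble, whose prediction is the weighted sum of all trees' outputs.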
Empirical Evaluation
The empirical investigations across ranking, regression, and classification tasks substantiate the efficacy of DART. For ranking tasks, DART demonstrated superior performance compared to MART on the MSLR-WEB10K dataset, achieving a statistically significant increase in NDCG scores. This demonstrates the algorithm's potential utility in web-ranking applications, where traditional MART had been highly successful.
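For readers unfamiliar with the ranking metric, the sketch below computes NDCG@k in its standard exponential-gain form; it is a generic helper for context, not code or data from the paper.

```python
# Standard NDCG@k helper (exponential gain, log2 discount).
import numpy as np

def ndcg_at_k(relevance_in_ranked_order, k=10):
    """relevance_in_ranked_order: graded relevance labels, listed in the
    order the model ranked the documents for one query."""
    rel = np.asarray(relevance_in_ranked_order, dtype=float)
    gains = 2.0 ** rel - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = np.sum((gains * discounts)[:k])
    ideal = np.sum((np.sort(gains)[::-1] * discounts)[:k])
    return dcg / ideal if ideal > 0 else 0.0

# Example: a query where the model places a highly relevant document third.
print(ndcg_at_k([1, 0, 3, 2, 0], k=3))
```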
For regression tasks using the CT slices dataset, DART consistently improved over MART and Random Forests (RF) across varying ensemble sizes, as measured by L2 error. Importantly, the paper highlights that DART maintains more evenly distributed contributions from trees in the ensemble, a notable improvement over MART’s diminishing contributions from later trees.
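The "evenness of contributions" claim can be probed with a simple diagnostic: measure how much each successive tree changes the ensemble's prediction. The sketch below does this for scikit-learn's MART-like GradientBoostingRegressor on synthetic data; the dataset, model, and hyperparameters are placeholders, and the paper's own measurement may differ in detail.

```python
# Diagnostic: average absolute change each successive tree makes to the
# prediction. For plain MART, later trees typically contribute far less.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3).fit(X, y)

staged = np.array(list(model.staged_predict(X)))         # (n_trees, n_samples)
contribution = np.abs(np.diff(staged, axis=0)).mean(axis=1)
print("first 10 trees:", contribution[:10].round(2))
print("last 10 trees: ", contribution[-10:].round(2))    # typically much smaller
```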
In classification tasks, DART was evaluated on a face-detection dataset, where it performed comparably to MART and even slightly surpassed it in some configurations, while RF lagged behind both. The results underscore the advantage of dropouts in reducing the ensemble's dependence on its earliest trees, promoting a more even contribution across the ensemble.
Implications and Future Directions
The introduction of DART marks a significant step in addressing over-specialization in ensemble learning, thereby enhancing model accuracy and robustness. The findings support further exploration into applying dropouts across other ensemble algorithms like AdaBoost, potentially offering insights into improving model diversification without sacrificing individual learner efficacy.
Additionally, the controlled diversity introduced by DART could prove beneficial for learning tasks with dynamic or evolving data targets, suggesting applicability in real-time updates or drift-sensitive environments. Methodological improvements, such as tailoring dropout rates or exploring alternate normalization techniques, could further optimize DART's performance.
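As a practical note, mainstream gradient-boosting libraries already expose such knobs; for instance, XGBoost's "dart" booster lets one tune the drop rate and switch between normalization schemes. The parameter values below are placeholders for illustration, not recommendations from the paper.

```python
# Configuring DART-style dropout in XGBoost; values are placeholders.
import xgboost as xgb

model = xgb.XGBRegressor(
    booster="dart",
    rate_drop=0.1,          # probability of dropping each existing tree
    skip_drop=0.25,         # probability of skipping dropout for a round
    sample_type="uniform",
    normalize_type="tree",  # or "forest" for an alternative weighting
    n_estimators=300,
)
# model.fit(X_train, y_train)  # X_train / y_train are placeholders
```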
In summary, the paper provides a thorough and convincing argument for adopting dropouts within the MART framework, demonstrating through robust empirical evidence that DART addresses critical weaknesses in traditional boosting techniques. As the landscape of AI and machine learning continues to evolve, DART positions itself as a promising candidate for robust, efficient ensemble learning.