
Deep Forest (1702.08835v4)

Published 28 Feb 2017 in cs.LG and stat.ML

Abstract: Current deep learning models are mostly built upon neural networks, i.e., multiple layers of parameterized differentiable nonlinear modules that can be trained by backpropagation. In this paper, we explore the possibility of building deep models based on non-differentiable modules. We conjecture that the mystery behind the success of deep neural networks owes much to three characteristics, i.e., layer-by-layer processing, in-model feature transformation and sufficient model complexity. We propose the gcForest approach, which generates deep forest holding these characteristics. This is a decision tree ensemble approach, with much fewer hyper-parameters than deep neural networks, and its model complexity can be automatically determined in a data-dependent way. Experiments show that its performance is quite robust to hyper-parameter settings, such that in most cases, even across different data from different domains, it is able to get excellent performance by using the same default setting. This study opens the door of deep learning based on non-differentiable modules, and exhibits the possibility of constructing deep models without using backpropagation.

Authors (2)
  1. Zhi-Hua Zhou (126 papers)
  2. Ji Feng (72 papers)
Citations (958)

Summary

  • The paper demonstrates that deep learning can be achieved using non-differentiable decision tree ensembles arranged in a cascade structure.
  • gcForest employs a layer-by-layer feature transformation with adaptive model complexity, reducing the need for extensive hyper-parameter tuning.
  • Robust experiments on datasets like MNIST and ORL show that gcForest delivers competitive performance compared to conventional deep neural networks.

An Essay on "Deep Forest"

The paper "Deep Forest" by Zhi-Hua Zhou and Ji Feng presents an innovative machine learning approach, gcForest (multi-Grained Cascade Forest), which investigates the potential of deep learning with non-differentiable modules. By leveraging ensemble methods, particularly decision tree ensembles, the authors propose a deep learning model that does not rely on gradient-based training such as backpropagation.

Key Characteristics and Motivations

The motivating premise of the paper is to address the question: "Can deep learning be realized with non-differentiable modules?" The authors conjecture that the efficacy of deep neural networks (DNNs) derives from three essential characteristics: layer-by-layer processing, in-model feature transformation, and sufficient model complexity. They argue that these properties can be replicated using a cascade structure of non-differentiable modules, specifically random forests and completely-random tree forests, thus circumventing the need for neural networks and backpropagation.

The gcForest Approach

gcForest constructs a cascade of decision tree ensembles, which allows for layer-by-layer processing. It incorporates two types of forests at each level of the cascade: random forests and completely-random tree forests. The decision outputs from the forests are used to augment the feature vector, and this augmented feature vector is passed to the next level in the cascade, enabling in-model feature transformation.
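The class-vector augmentation at one cascade level can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's `ExtraTreesClassifier` as a stand-in for the paper's completely-random tree forests, and out-of-fold predictions (as the paper recommends, via k-fold cross-validation) to avoid overfitting the augmented features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_level_features(X, y, random_state=0):
    """One cascade level: each forest produces a class-probability vector
    (estimated out-of-fold), which is concatenated onto the original
    features to form the input of the next level."""
    forests = [
        RandomForestClassifier(n_estimators=50, random_state=random_state),
        # Stand-in for the paper's completely-random tree forests:
        ExtraTreesClassifier(n_estimators=50, random_state=random_state),
    ]
    class_vectors = [
        cross_val_predict(f, X, y, cv=3, method="predict_proba")
        for f in forests
    ]
    return np.hstack([X] + class_vectors)

# Toy data: 200 samples, 10 features, 3 classes.
X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)
X_aug = cascade_level_features(X, y)
print(X_aug.shape)  # 10 original features + 2 forests x 3 class probabilities
```

Stacking levels of this transformation yields the layer-by-layer processing the paper identifies as essential, with each level re-representing the data.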

A significant advantage highlighted in the paper is that gcForest has far fewer hyper-parameters than DNNs. Additionally, its model complexity is determined adaptively: cascade levels are added only while cross-validation performance improves, which makes it robust even with small-scale training data. Together, the adaptive complexity and the smaller hyper-parameter count make it an appealing alternative to DNNs, which often require extensive hyper-parameter tuning and large volumes of data.
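The data-dependent growth can be sketched as a simple loop that stops adding cascade levels once validation performance no longer improves. Again this is a hedged sketch, not the reference implementation; a single random forest per level stands in for the full level ensemble.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

# Toy data: 300 samples, 12 features, 2 classes.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

best_score, features = -np.inf, X
for level in range(10):  # hard cap; growth normally stops much earlier
    forest = RandomForestClassifier(n_estimators=50, random_state=level)
    score = cross_val_score(forest, features, y, cv=3).mean()
    if score <= best_score:
        break  # no cross-validation improvement: stop growing the cascade
    best_score = score
    # Augment the features with out-of-fold class probabilities
    # and continue to the next level.
    proba = cross_val_predict(forest, features, y, cv=3,
                              method="predict_proba")
    features = np.hstack([features, proba])

print(f"cascade depth chosen automatically, CV accuracy {best_score:.2f}")
```

This termination rule is what lets the model's depth adapt to the data instead of being fixed in advance as in most DNN architectures.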

Experimental Evaluation

The paper presents robust experimental results across a variety of tasks, including image categorization on MNIST, face recognition on ORL, music classification on GTZAN, hand movement recognition using sEMG data, and sentiment classification on IMDB. In most cases, gcForest achieves performance competitive with well-tuned DNNs.

For instance, on the MNIST dataset, gcForest achieves a test accuracy of 99.26%, outperforming a re-implementation of LeNet-5 and the Deep Belief Net. On the ORL face recognition dataset, gcForest exceeds the performance of Convolutional Neural Networks (CNNs) across different train/test splits. Remarkably, on the IMDB sentiment classification and GTZAN music classification tasks, gcForest demonstrates superior performance compared to MLPs and logistic regression.

Implications and Future Directions

The implications of this research are both practical and theoretical. From a practical standpoint, gcForest offers a viable alternative to DNNs, especially in scenarios where DNNs are not superior or where computational resources and data quantity are constrained. Its robustness to hyper-parameters and its ability to automatically determine model complexity make gcForest highly user-friendly and suitable for various applications without extensive hyper-parameter tuning.

Theoretically, this paper broadens the scope of deep learning by illustrating that non-differentiable modules can be utilized effectively. This opens new avenues for research into other forms of non-NN style deep learning models that can exploit the vast array of non-differentiable learning modules developed within the machine learning community.

Speculations on Future Developments

Future developments could focus on enhancing the feature re-representation process of gcForest, as currently only simple class vectors are used. Exploring richer forms of feature augmentation could lead to even better performance, particularly on high-dimensional data. Leveraging new computational devices or distributed computing implementations to speed up training and reduce memory consumption would also be worthwhile, potentially enabling larger and more complex models of gcForest.

Another intriguing direction could involve integrating active learning and semi-supervised learning strategies. The nature of completely-random tree forests, which do not initially require labeled data, aligns well with these methods and could further expand the utility of gcForest in practical applications.

Conclusion

The "Deep Forest" paper by Zhou and Feng is a seminal work that successfully demonstrates deep learning based on non-differentiable modules. By exploiting decision tree ensembles, the authors present gcForest, a powerful yet user-friendly alternative to neural network-based models. The paper not only contributes a novel methodology but also opens up new research directions in the field of deep learning, encouraging the exploration of ensemble methods and non-differentiable modules for building deep models.
