Random Forest Classifier

Updated 27 October 2025
  • Random Forest Classifier is an ensemble-based algorithm that aggregates multiple decision trees, using bootstrap sampling and random feature selection for robust classification.
  • It minimizes variance and overfitting through decorrelated decision trees, ensuring scalable performance and improved generalization on diverse datasets.
  • Advanced techniques like RF-SOM use RF-derived dissimilarity to enhance visualization and accuracy, linking attribute combinations to clear decision boundaries.

A Random Forest Classifier is an ensemble-based supervised learning algorithm that aggregates the predictions of multiple decision trees to perform classification tasks. Each decision tree in the ensemble is trained on a random bootstrap sample of the data, and at each node split, a randomly selected subset of features is evaluated. The final prediction is determined by majority voting among all trees in the forest. The Random Forest paradigm is recognized for its robust generalization performance, scalability, and empirical resistance to overfitting, owing to the randomness in both data and feature selection for constructing each tree.

1. Fundamental Mechanics of the Random Forest Classifier

Each Random Forest (RF) consists of an ensemble of $T$ independently trained decision trees. For each tree, the training data is sampled with replacement (bootstrap sampling), and at every split, a random subset of $m$ features (typically $m = \sqrt{M}$ for $M$ total features) is considered for splitting. Let $x$ be a sample; the output of the RF is

$$y = \operatorname{mode}\big(T_1(x), T_2(x), \ldots, T_T(x)\big)$$

where $T_i(x)$ denotes the prediction of the $i^{th}$ tree. The aggregation strategy decorrelates the trees, resulting in reduced variance relative to a single decision tree.
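
As a concrete illustration of these mechanics, the following sketch trains such an ensemble with scikit-learn and reproduces the majority vote explicitly; the library choice, the Iris data set, and all parameter values are assumptions made for illustration, not prescriptions from the source.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# T = 100 trees, each fitted on a bootstrap sample; at every split only
# sqrt(M) of the M features are candidates, which decorrelates the trees.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0)
rf.fit(X_train, y_train)

# Explicit majority vote y = mode(T_1(x), ..., T_T(x)); scikit-learn's own
# predict() averages per-tree class probabilities, which usually agrees.
per_tree = np.stack([t.predict(X_test) for t in rf.estimators_])   # (T, n_test)
votes = np.array([np.bincount(col.astype(int)).argmax() for col in per_tree.T])
print("majority-vote accuracy:", (rf.classes_[votes] == y_test).mean())
```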

The RF further allows the computation of a “proximity matrix” $\mathrm{Prox}(i, j)$, where

$$\mathrm{Dis}(i, j) = 1 - \mathrm{Prox}(i, j)$$

and $\mathrm{Prox}(i, j)$ quantifies the frequency with which samples $i$ and $j$ co-occur in the same leaf across trees, normalized by $T$ (the total number of trees).
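
From a fitted forest, these proximity and dissimilarity matrices can be obtained by comparing the leaf index each tree assigns to every sample. The sketch below continues the previous snippet (it reuses the fitted forest `rf` and the array `X_train`); the helper name is illustrative.

```python
import numpy as np

def rf_dissimilarity(rf, X):
    """Dis(i, j) = 1 - Prox(i, j), where Prox(i, j) is the fraction of the
    T trees in which samples i and j end up in the same leaf."""
    leaves = rf.apply(X)                                   # (N, T): leaf index of each sample per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]   # (N, N, T) co-occurrence indicator
    prox = same_leaf.mean(axis=2)                          # normalize by T
    return 1.0 - prox

dis = rf_dissimilarity(rf, X_train)
print(dis.shape, dis.diagonal()[:3])                       # diagonal is 0: Dis(i, i) = 0
```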

2. Interpretation and Visualization Challenges

While individual trees in a Random Forest can be visualized and interpreted, the ensemble as a whole functions as a complex, high-variance, low-bias “black box” whose global decision boundaries defy straightforward interpretation. Classical approaches for visualizing the relationships learned by an RF, notably Multidimensional Scaling (MDS) of the proximity matrix, provide 2D embeddings by preserving pairwise dissimilarities; however, these embeddings suffer from $\mathcal{O}(N^2)$ memory requirements and provide only a distributional representation of samples, obscuring the explicit mapping from original features to visual clusters.

Limitations of MDS:

  • High memory complexity $\mathcal{O}(N^2)$ for $N$ samples.
  • Projections obscure the association between attribute combinations and regions on the map.
  • Point clouds in 2D often do not reveal crisp decision boundaries linked to specific features.
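
For comparison, this classical MDS route can be sketched by feeding the precomputed RF dissimilarity matrix into scikit-learn's MDS; the snippet continues the earlier ones and makes the $\mathcal{O}(N^2)$ cost tangible, since the full $N \times N$ matrix must be held in memory.

```python
from sklearn.manifold import MDS

dis = rf_dissimilarity(rf, X_train)            # full N x N dissimilarity matrix
embedding = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(dis)
print(embedding.shape)                         # (N, 2) point cloud for plotting
```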

3. Self-Organising Maps for RF Visualization (RF-SOM)

The paper introduces Self-Organising Maps (SOM) as an alternative visualization and analysis method for Random Forests. Standard SOMs are neural models that project high-dimensional data into a 2D lattice, typically using Euclidean distance to identify the Best Matching Unit (BMU). The key innovation is to substitute the Euclidean distance with RF-derived dissimilarities, so BMU selection and map training are determined by the RF proximity matrix.

Salient features of the RF-SOM:

  • Memory complexity reduced to $\mathcal{O}(L^2)$ for $L$ neurons.
  • The mapping relates explicit attribute combinations to distinct neurons, addressing the key limitation of MDS.
  • The model supports classification: after SOM training, neurons receive class labels; test samples are projected and classified based on the winning neuron.
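
The distance substitution at the core of RF-SOM can be expressed as a small helper that joins a sample to the neuron weight matrix, evaluates the RF dissimilarity over the union, and returns the closest neuron. The function name and shapes below are illustrative, and the full training procedure is spelled out in the next section.

```python
import numpy as np

def rf_bmu(rf, W, x):
    """Best Matching Unit under the RF dissimilarity: form H = W ∪ {x},
    compute Dis_H with the forest, and pick the neuron closest to x.
    W is an (L, M) matrix holding one weight vector per neuron."""
    H = np.vstack([x[None, :], W])         # row 0 is the sample, rows 1..L are neurons
    dis = rf_dissimilarity(rf, H)          # (L+1, L+1) dissimilarity over the union
    return int(np.argmin(dis[0, 1:]))      # index of the winning neuron
```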

4. Detailed Algorithmic Description

For each training sample $x_j$ during an epoch:

  1. Union Construction: Form a data set $H = W \cup \{x_j\}$, where $W$ is the matrix of neuron weights.
  2. Dissimilarity Computation: Compute the RF dissimilarity matrix $\mathrm{Dis}_H$ over $H$.
  3. BMU Identification: Find neuron $v$ such that $v = \arg\min_{h:\, h \ne j} \mathrm{Dis}_H(j, h)$.
  4. Neuron Update: Update all neurons using the standard SOM learning rule:

$$W_{pq}(t+1) = W_{pq}(t) + \eta \, h^{(i)}_{pq} \, (x_j - W_{pq}(t))$$

where $\eta$ is the learning coefficient and $h^{(i)}_{pq}$ is a decaying neighborhood function.

For test samples, the process is analogous: compute RF dissimilarities to all neurons, assign to the BMU, and use the neuron’s label for prediction.
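
A minimal end-to-end sketch of this procedure, reusing `rf`, `rf_dissimilarity`, and `rf_bmu` from the earlier snippets, is given below; the grid size, the learning-rate and neighborhood schedules, and the Gaussian form of $h^{(i)}_{pq}$ are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def train_rf_som(rf, X, grid=(6, 6), epochs=20, eta0=0.5, sigma0=2.0, seed=0):
    """Train an RF-SOM: for every sample, find the BMU via RF dissimilarity
    (steps 1-3) and apply the standard SOM update (step 4)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    L, M = rows * cols, X.shape[1]
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(L, M))   # init inside data range
    for t in range(epochs):
        eta = eta0 * np.exp(-t / epochs)        # decaying learning coefficient
        sigma = sigma0 * np.exp(-t / epochs)    # decaying neighborhood radius
        for x in X[rng.permutation(len(X))]:
            v = rf_bmu(rf, W, x)                # H = W ∪ {x_j}, Dis_H, argmin
            d2 = ((pos - pos[v]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood around the BMU
            W += eta * h[:, None] * (x - W)     # W(t+1) = W(t) + eta * h * (x_j - W(t))
    return W

def label_neurons(rf, W, X, y):
    """Give each neuron the majority class of the training samples it wins."""
    wins = np.array([rf_bmu(rf, W, x) for x in X])
    labels = np.full(len(W), -1)                # -1 marks neurons that win no sample
    for v in np.unique(wins):
        labels[v] = np.bincount(y[wins == v]).argmax()
    return labels

def predict_rf_som(rf, W, labels, X):
    """Classify each test sample by the label of its winning neuron."""
    return np.array([labels[rf_bmu(rf, W, x)] for x in X])

W = train_rf_som(rf, X_train)
neuron_labels = label_neurons(rf, W, X_train, y_train)
pred = predict_rf_som(rf, W, neuron_labels, X_test)
print("RF-SOM test accuracy:", (pred == y_test).mean())
```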

5. Empirical Performance and Comparative Analysis

Evaluation on UCI datasets demonstrates that:

  • Both RF-MDS and RF-SOM (using the RF proximity matrix) produce robust embeddings and clusterings that reflect high-dimensional inter-class structure more effectively than traditional MDS or SOM with Euclidean distance.
  • RF-SOM achieves significant gains in classification accuracy relative to standard SOM—most notably, an increase exceeding 12% on the ‘Sonar’ dataset.
  • In scenarios where the native RF excels relative to SOM, these advantages propagate into corresponding increases in RF-SOM performance.
  • For well-separated datasets (e.g., ‘Wine’, ‘Iris’), both methods offer comparable results, though RF-SOM can have increased output variance.

Table: Accuracy Comparison (Excerpted)

Dataset      Standard SOM   RF-SOM       Improvement
Sonar        lower          higher       >12% (largest observed gain)
Glass        lower          higher       significant
Ionosphere   lower          higher       significant
Pima         lower          higher       significant
Wine         comparable     comparable   negligible
Iris         comparable     comparable   negligible

Improvements are observed particularly on datasets with pronounced class overlap where the standard SOM is suboptimal, highlighting RF-induced dissimilarity as a key enabler of complex decision boundary approximation.

6. Implications and Applications

Embedding RF-derived dissimilarity into neural models offers three essential advantages beyond visualization:

  • Interpretability: RF-SOM maps provide direct insights into how attributes cluster or define decision surfaces in the data. Clusters on the map reflect the nuanced structure imposed by the RF, thus facilitating a better understanding of complex, nonlinear interclass relationships.
  • Performance: Using RF dissimilarity for training SOMs consistently improves classification accuracy, especially on noisy or complex datasets. RF proximities are less sensitive to outliers and monotonic data transformations, and their integration yields superior resilience compared to classical Euclidean metrics.
  • Scalability: The method requires far less memory than MDS ($\mathcal{O}(L^2)$ for $L$ neurons rather than $\mathcal{O}(N^2)$ for $N$ samples), enabling scaling to larger datasets.

Applications include—but are not limited to—medical imaging (e.g., dementia or tumor classification), fraud detection, and broader exploratory data mining, particularly in domains where both classification accuracy and model transparency are required.

7. Conclusion

The integration of Random Forest proximity-based dissimilarity with Self-Organising Map learning (RF-SOM) provides a principled approach to both visualizing and enhancing classification decisions derived from RF ensembles. RF-SOM overcomes significant scalability and interpretability limitations of MDS, produces explicit mappings between data attributes and 2D representations, and achieves improved accuracy in classification tasks. This approach thus constitutes a robust methodological advance for practitioners requiring scalable, interpretable, and high-accuracy ensemble models for high-dimensional data exploration and classification (Płoński et al., 2014).
