Balanced Gain Ratio (BGR) in Decision Trees

Updated 20 March 2026
  • Balanced Gain Ratio (BGR) is an impurity-based gain metric that adjusts the traditional gain function to correct bias in decision tree splits.
  • The score is the information gain divided by (1 + split information), preserving the gain ratio's correction for high-arity splits while avoiding its bias toward unbalanced, low-predictive splits.
  • Empirical evaluations demonstrate that BGR produces shallower, more interpretable trees and improves model induction speed and accuracy across various benchmark datasets.

The Balanced Gain Ratio (BGR) is an alternative impurity-based gain function for decision tree induction that addresses split selection biases observed in earlier metrics such as information gain and Quinlan's information gain ratio in C4.5. BGR is specifically designed to preserve the corrective properties of the gain ratio regarding high-cardinality categorical splits, while simultaneously mitigating the tendency of the gain ratio to favor extremely unbalanced, low-predictive-value splits. Its implementation yields more balanced decision trees—both in terms of depth and branch distribution—without reintroducing bias towards features with numerous distinct values (Leroux et al., 2018).

1. Formal Definition and Motivation

Let $C$ denote a training sample set of size $n$ with $K$ classes. A candidate split $S$ divides $C$ into $J$ child nodes $C^1, \ldots, C^J$. For each candidate split, the impurity reduction gain $G(S, C)$ (typically based on entropy) and the split information

$$\text{SplitInformation}(S, C) = -\sum_{j=1}^J p^j \log p^j,$$

with $p^j = |C^j|/n$, are computed. Quinlan's traditional gain ratio is given by

$$\text{GainRatio}(S,C) = \frac{G(S,C)}{\text{SplitInformation}(S,C)}.$$

The Balanced Gain Ratio is formulated as

$$\Gamma(S,C) = \frac{G(S,C)}{1 + \text{SplitInformation}(S,C)}.$$

The addition of $1$ to the denominator is designed to eliminate the artificial inflation of the score for splits where split information is near zero (characteristic of highly unbalanced splits). This adjustment maintains the bias correction for high-arity (many-way) splits but prevents favoring splits that isolate tiny, pure subsets—a known issue with Quinlan's original ratio.

2. Mathematical Derivation

For a node $C^j$, the impurity using entropy is

$$I(C^j) = -\sum_{k=1}^K p_k^j \log p_k^j,$$

where $p_k^j$ is the empirical frequency of class $k$ in $C^j$. The standard information gain is

$$G(S, C) = I(C) - \sum_{j=1}^J \frac{|C^j|}{n} I(C^j).$$

The split information is

$$SI(S, C) = -\sum_{j=1}^J p^j \log p^j.$$

Thus, the Balanced Gain Ratio is

$$\Gamma(S,C) = \frac{G(S,C)}{1 + SI(S,C)}.$$

If $SI$ is small (highly uneven splits), the denominator approaches $1$, so the ratio closely reflects the raw information gain. For large $SI$ (many-way splits), the behavior is similar to the original gain ratio, strongly penalizing high-cardinality splits.
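The quantities above can be computed directly from label counts. The following is a minimal Python sketch under the definitions in this section; the function names are illustrative, not from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """I(C): Shannon entropy (natural log) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def split_information(partitions):
    """SI(S, C) = -sum_j p^j log p^j, with p^j the fraction of samples in C^j."""
    n = sum(len(p) for p in partitions)
    return -sum((len(p) / n) * math.log(len(p) / n) for p in partitions)

def information_gain(labels, partitions):
    """G(S, C) = I(C) - sum_j |C^j|/n * I(C^j)."""
    n = len(labels)
    return entropy(labels) - sum((len(p) / n) * entropy(p) for p in partitions)

def balanced_gain_ratio(labels, partitions):
    """Gamma(S, C) = G(S, C) / (1 + SI(S, C))."""
    return information_gain(labels, partitions) / (1 + split_information(partitions))

# A perfect two-way split of a balanced binary sample:
parent = ["a"] * 4 + ["b"] * 4
parts = [["a"] * 4, ["b"] * 4]
# Here G = SI = log 2, so Gamma = log 2 / (1 + log 2)
print(balanced_gain_ratio(parent, parts))
```

For this perfect balanced split, the traditional gain ratio would score exactly $1$, while BGR scores $\log 2 / (1 + \log 2) \approx 0.41$; the rankings among candidate splits, not the absolute values, drive induction.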

3. Bias Corrections in Gain Functions

Standard Information Gain

The standard impurity reduction $G(S,C)$ has a known bias toward splits on high-cardinality features because maximizing purity is trivial when each instance can be isolated. In the limit, a feature with one unique value per instance yields maximal gain, even absent predictive value.

Quinlan’s Gain Ratio

Quinlan’s ratio penalizes high-cardinality splits:

$$\text{GainRatio}(S,C) = \frac{G(S,C)}{SI(S,C)}$$

Here, $SI$ grows with the number of partitions, discouraging many-way splits. However, when $SI$ is very small (characteristic of splits producing one large and several tiny partitions), the denominator deflates, causing the gain ratio to be artificially large and biasing tree growth toward these unbalanced splits.

Balanced Gain Ratio

By redefining the denominator as $1 + SI$, BGR suppresses this bias. For large $SI$, the additive $1$ is negligible and BGR approximates the original gain ratio. As $SI \to 0$, BGR approaches $G$, precluding artifactual boosting of splits that simply isolate noise points.
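The deflation effect is easy to check numerically. In the sketch below (the sample sizes are illustrative, not from the paper), a split that peels a single pure sample off a balanced 100-sample node receives a Quinlan gain ratio more than an order of magnitude above its raw gain, while BGR stays just below the gain:

```python
import math

def H(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

parent_H = H([0.5, 0.5])                 # node with 50 class-a, 50 class-b samples

# Unbalanced split: one pure singleton vs. the 99 remaining samples.
weights = [1 / 100, 99 / 100]
child_H = [H([1.0]), H([49 / 99, 50 / 99])]

gain = parent_H - sum(w * h for w, h in zip(weights, child_H))
si = H(weights)               # split information = entropy of the branch sizes
gain_ratio = gain / si        # Quinlan: tiny SI deflates the denominator
bgr = gain / (1 + si)         # BGR: score stays close to the raw gain

print(f"gain={gain:.4f}  SI={si:.4f}  GR={gain_ratio:.4f}  BGR={bgr:.4f}")
```

Here the gain is about $0.007$ but $SI \approx 0.056$, so the gain ratio inflates to roughly $0.13$; BGR returns about $0.007$, in line with the split's actual predictive value.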

4. Algorithmic Implementation

BGR integrates into top-down, greedy induction schemes such as C4.5 by substituting the gain function for split selection. The following pseudocode outlines the integration:

function BuildTree(nodeSamples C):
  if stoppingCriterion(C) then
    return Leaf(label = majorityClass(C))
  bestScore ← −∞
  bestSplit ← null
  for each feature f:
    for each candidate split S on f:
      partition C into {C^1, …, C^J} according to S
      compute p^j = |C^j| / |C| for j = 1…J
      compute impurity I(C) and I(C^j) for all j
      G   ← I(C) − Σ_{j=1}^J p^j · I(C^j)          # standard gain
      SI  ← −Σ_{j=1}^J p^j · log(p^j)              # split information
      BGR ← G / (1 + SI)                           # balanced gain ratio
      if BGR > bestScore:
        bestScore ← BGR
        bestSplit ← S
  if bestSplit is null then
    return Leaf(label = majorityClass(C))
  create internal node with test bestSplit
  for each child partition C^j of bestSplit:
    attach BuildTree(C^j) as the j-th branch
  return the internal node
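As a concrete illustration, the pseudocode can be rendered as a minimal runnable Python sketch for multi-way splits on categorical features. The names (`build_tree`, `bgr_score`) and the data layout are illustrative assumptions; threshold discovery for numeric features and pruning are omitted:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def bgr_score(labels, parts):
    """Balanced Gain Ratio: G / (1 + SI) for one candidate partition."""
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    si = -sum(len(p) / n * math.log(len(p) / n) for p in parts)
    return gain / (1 + si)

def build_tree(rows, labels, features, min_gain=1e-9):
    """Greedy top-down induction with one multi-way split per categorical feature."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]     # leaf: majority class
    best_score, best = min_gain, None
    for f in features:
        groups = {}                                     # feature value -> samples
        for r, y in zip(rows, labels):
            groups.setdefault(r[f], []).append((r, y))
        if len(groups) < 2:
            continue                                    # no real split on f
        score = bgr_score(labels, [[y for _, y in g] for g in groups.values()])
        if score > best_score:
            best_score, best = score, (f, groups)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    f, groups = best
    rest = [g for g in features if g != f]
    return (f, {v: build_tree([r for r, _ in grp], [y for _, y in grp], rest)
                for v, grp in groups.items()})

# Toy data: 'color' perfectly predicts the label, 'size' is uninformative.
rows = [{"color": "red", "size": "big"}, {"color": "red", "size": "small"},
        {"color": "blue", "size": "big"}, {"color": "blue", "size": "small"}]
labels = ["ripe", "ripe", "raw", "raw"]
print(build_tree(rows, labels, ["color", "size"]))
```

On the toy data the root test is `color`, since the uninformative `size` split has zero gain and therefore a zero BGR score.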

All other induction, threshold discovery, and pruning strategies in conventional C4.5 are preserved (Leroux et al., 2018).

5. Theoretical Properties

  • Monotonicity in Split Information: The denominator $1 + SI$ is strictly increasing in $SI$, ensuring that for constant gain, higher split information always reduces or preserves the BGR score.
  • Limiting Behavior:
    • As $SI \to \infty$, $\Gamma \sim G/SI \to 0$, strongly discouraging high-arity splits.
    • As $SI \to 0$, $\Gamma \to G$, removing the split-information penalty and thus not artificially inflating scores for extremely unbalanced splits.
  • Effect of Partition Arity: For $J$ branches, the maximal $SI$ is $\log J$, attained by equal-size partitions. Large $J$ implies strong denominator growth and penalization.
  • Dependence on Class Distribution: Through $G$, BGR remains sensitive to the class mixtures in the partitions, while applying a tempered correction related to the balance of the split.
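These properties can be verified numerically. A short sketch, using an arbitrary fixed gain of $0.5$ for illustration:

```python
import math

def bgr(gain, si):
    """Gamma = G / (1 + SI), viewed as a function of gain and split information."""
    return gain / (1 + si)

G = 0.5  # arbitrary fixed gain, for illustration only

# Monotonicity: for constant gain, the score never increases with SI.
scores = [bgr(G, si) for si in (0.0, 0.5, 1.0, 2.0)]
assert scores == sorted(scores, reverse=True)

# Arity limit: J equal-size branches give the maximal SI = log J,
# so the penalty grows without bound as J increases.
for J in (2, 4, 16, 256):
    print(J, round(bgr(G, math.log(J)), 4))

# SI -> 0 recovers the raw gain (no inflation of unbalanced splits).
assert abs(bgr(G, 0.0) - G) < 1e-12
```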

6. Empirical Evaluation

BGR and Quinlan’s gain ratio were evaluated on ten benchmark UCI datasets using C4.5-style induction, pessimistic error pruning, and stratified cross-validation (ten repetitions of 5-fold). The key results on average accuracy are summarized below:

Dataset          GainRatio (%)   BGR (%)   Δ = BGR − GainRatio (%)
Glass                64.02        65.42        +1.40
BUPA Liver           60.58        66.67        +6.09
Heart                78.15        77.78        −0.37
Balance Scale        73.92        77.60        +3.68
Survival             73.86        72.87        −0.99
PIMA                 72.79        75.26        +2.47
Wine                 94.38        94.38        +0.00
Bank Marketing       89.87        90.23        +0.36
Income               83.92        84.26        +0.34
Letter               86.85        87.40        +0.55

Notably, in large datasets (e.g., Letter), BGR produced trees with approximately 20 levels compared to ≈100 levels for the gain ratio, with a proportional speed-up in induction time. A plausible implication is that BGR's balanced bias leads to shallower, more interpretable models while maintaining or improving predictive accuracy.

7. Guidelines for Adoption

  • Integration: Deployment in C4.5-style systems requires only substituting $SI$ with $1 + SI$ in the denominator of the gain function.
  • Applicability: BGR is especially advantageous in high-dimensional settings and with features possessing many categorical levels. In small or shallow-tree scenarios, performance closely matches that of the gain ratio (typically within 1%).
  • Parametric Stability: No additional hyperparameters are introduced; the $+1$ correction is robust across diverse datasets.
  • Extendibility: The $1+SI$ denominator adjustment generalizes to impurity-based gain metrics that use split information penalization, including those applied to multi-way numeric splits (Leroux et al., 2018).
References (1)
