Gain Ratio

Gain ratio usually refers to the information gain ratio: the ratio of an attribute's information gain to its split information. The gain ratio is one of the common attribute selection measures. The other two are information gain and the Gini index.

The gain ratio of an attribute R on a data set D is computed as GainRatio(R) = Gain(R) / SplitInfo_R(D).
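In the notation commonly used for C4.5-style decision trees (assumed here, since the page itself only names GainRatio(R) and SplitInfo_R(D)), the gain ratio and its two terms can be written out as below. D is the training set with m classes, p_i is the fraction of samples in class i, and attribute R partitions D into k subsets D_1, ..., D_k, one per distinct value of R.

```latex
\begin{aligned}
\mathrm{Info}(D) &= -\sum_{i=1}^{m} p_i \log_2 p_i \\
\mathrm{Info}_R(D) &= \sum_{j=1}^{k} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \\
\mathrm{Gain}(R) &= \mathrm{Info}(D) - \mathrm{Info}_R(D) \\
\mathrm{SplitInfo}_R(D) &= -\sum_{j=1}^{k} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|} \\
\mathrm{GainRatio}(R) &= \frac{\mathrm{Gain}(R)}{\mathrm{SplitInfo}_R(D)}
\end{aligned}
```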

Usually, the attribute with the largest gain ratio is chosen as the best splitting attribute. If a single attribute has many distinct values, SplitInfo_R(D) becomes larger, which makes GainRatio(R) smaller. However, the gain ratio also has a drawback. If SplitInfo_R(D) is 0, the ratio is undefined, and when SplitInfo_R(D) approaches 0, the GainRatio(R) value becomes unreliable. One remedy is to smooth the denominator by adding the average of all split information values to it, so that GainRatio(R) = Gain(R) / (mean(SplitInfo) + SplitInfo_R(D)).
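As a minimal sketch of the calculation described above, the following Python computes the plain and the smoothed gain ratio for a few attribute columns. The function names, the `smooth` flag, the choice to report NaN for a zero denominator, and the toy data are all assumptions made for illustration; they do not come from any particular library.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a sequence of hashable values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def info_gain(values, labels):
    """Gain(R) = Info(D) - Info_R(D) for one attribute column."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def split_info(values):
    """SplitInfo_R(D): entropy of the attribute's own value distribution."""
    return entropy(values)


def gain_ratios(columns, labels, smooth=False):
    """Return {attribute name: gain ratio}.

    With smooth=True, the mean split information over all attributes
    is added to every denominator.
    """
    splits = {name: split_info(col) for name, col in columns.items()}
    offset = sum(splits.values()) / len(splits) if smooth else 0.0
    ratios = {}
    for name, col in columns.items():
        denom = splits[name] + offset
        # A zero denominator leaves the ratio undefined; report NaN (assumed convention).
        ratios[name] = info_gain(col, labels) / denom if denom > 0 else math.nan
    return ratios


# Toy data, made up for illustration.
labels = ["yes", "yes", "no", "no"]
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain"],  # separates the classes
    "id": ["a", "b", "c", "d"],                     # unique value per row
    "constant": ["x", "x", "x", "x"],               # single value, SplitInfo = 0
}

plain = gain_ratios(columns, labels)
smoothed = gain_ratios(columns, labels, smooth=True)
```

On this toy data, the many-valued `id` column gets a lower gain ratio than `outlook` even though both separate the classes perfectly, and the `constant` column, whose unsmoothed ratio is undefined, gets a well-defined value once the denominator is smoothed.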
