Content deleted Content added
Alter: pmc. Add: doi-access, s2cid, authors 1-1. Removed proxy/dead URL that duplicated identifier. Removed parameters. Some additions/deletions were parameter name changes. | Use this tool. Report bugs. | #UCB_Gadget |
→Gini impurity: rephrase to remove repetitions |
||
Line 166:
===Gini impurity===
'''Gini impurity''', '''Gini's diversity index''',<ref>{{cite web |title=Growing Decision Trees |url=https://www.mathworks.com/help/stats/growing-decision-trees.html |website=MathWorks |publisher=MathWorks}}</ref> or [[Diversity index#Gini–Simpson index|Gini-Simpson Index]] in biodiversity research, is used by the CART (classification and regression tree) algorithm for classification trees, Gini impurity (named after Italian mathematician [[Corrado Gini]]) is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset.
For a set of items with <math>J</math> classes and relative frequencies <math>p_i</math>, <math>i \in \{1, 2, ...,J\}</math>, the probability of choosing an item with label <math>i</math> is <math>p_i</math>, and the probability of miscategorizing that item is <math>\sum_{k \ne i} p_k = 1-p_i</math>. The Gini impurity is computed by summing pairwise products of these probabilities for each class label:
The Gini impurity is also an information theoretic measure and corresponds to [[Tsallis Entropy]] with deformation coefficient <math>q=2</math>, which in physics is associated with the lack of information in out-of-equilibrium, non-extensive, dissipative and quantum systems. For the limit <math>q\to 1</math> one recovers the usual Boltzmann-Gibbs or Shannon entropy. In this sense, the Gini impurity is nothing but a variation of the usual entropy measure for decision trees.▼
:<math>\operatorname{I}_G(p) = \sum_{i=1}^J \left( p_i \sum_{k\neq i} p_k \right)
Line 176 ⟶ 174:
= \sum_{i=1}^J (p_i - p_i^2)
= \sum_{i=1}^J p_i - \sum_{i=1}^J p_i^2
= 1 - \sum^J_{i=1} p_i^2. </math>
▲The Gini impurity is also an information theoretic measure and corresponds to [[Tsallis Entropy]] with deformation coefficient <math>q=2</math>, which in physics is associated with the lack of information in out-of-equilibrium, non-extensive, dissipative and quantum systems. For the limit <math>q\to 1</math> one recovers the usual Boltzmann-Gibbs or Shannon entropy. In this sense, the Gini impurity is nothing but a variation of the usual entropy measure for decision trees.
===Information gain===
|