*'''Exponentiated Gradient Exploration for Active Learning''':<ref name="Bouneffouf(2016)" /> a sequential algorithm named exponentiated gradient (EG)-active that can improve any active learning algorithm by combining it with an optimal random exploration.
*'''Uncertainty sampling''': label those points for which the current model is least certain as to what the correct output should be.
*'''Entropy sampling''': the entropy of the model's predicted class distribution is computed for each unlabeled sample, and the sample with the highest entropy is selected as the next query (see the code sketch after this list).<ref name="Faria2022">{{cite journal |last1=Faria |first1=Bruno |last2=Perdigão |first2=Dylan |last3=Brás |first3=Joana |last4=Macedo |first4=Luis |title=The Joint Role of Batch Size and Query Strategy in Active Learning-Based Prediction – A Case Study in the Heart Attack Domain |journal=Progress in Artificial Intelligence |date=2022 |pages=464–475 |doi=10.1007/978-3-031-16474-3_38}}</ref>
*'''Random sampling''': the next query is selected uniformly at random from the unlabeled pool, often serving as a baseline for other strategies.<ref name="Faria2022" />
*'''Margin sampling''': the difference between the two highest predicted class probabilities (the margin) is computed for each sample, and the sample with the smallest margin is selected as the next query.<ref name="Faria2022" />
*'''Least confident sampling''': only the highest predicted class probability is considered for each sample, and the sample whose top prediction has the lowest probability is selected as the next query.<ref name="Faria2022" />
*'''Query by committee''': a variety of models are trained on the current labeled data and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most (a vote-entropy sketch follows this list).
*'''Querying from diverse subspaces or partitions''':<ref name="shubhomoydas_github"/> When the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original [[feature (machine learning)|feature space]]. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling.
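The following is a minimal sketch of how the entropy, margin, least-confident and random selection rules above can be implemented in a pool-based setting. It assumes the model outputs class probabilities for the unlabeled pool as a NumPy array <code>probs</code> of shape <code>(n_samples, n_classes)</code>; the function names are illustrative and not taken from the cited works.

<syntaxhighlight lang="python">
import numpy as np

def entropy_score(probs):
    """Entropy of each predicted class distribution (higher = more uncertain)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def margin_score(probs):
    """Difference between the two highest class probabilities (lower = more uncertain)."""
    part = np.partition(probs, -2, axis=1)   # last two columns hold the two largest values
    return part[:, -1] - part[:, -2]

def least_confident_score(probs):
    """Highest class probability for each sample (lower = more uncertain)."""
    return probs.max(axis=1)

def select_query(probs, strategy="entropy"):
    """Return the index of the unlabeled sample to query next (illustrative helper)."""
    if strategy == "entropy":
        return int(np.argmax(entropy_score(probs)))          # highest entropy
    if strategy == "margin":
        return int(np.argmin(margin_score(probs)))           # smallest margin
    if strategy == "least_confident":
        return int(np.argmin(least_confident_score(probs)))  # lowest top probability
    if strategy == "random":
        return int(np.random.randint(len(probs)))            # uniform random baseline
    raise ValueError(f"unknown strategy: {strategy}")
</syntaxhighlight>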
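For query by committee, a common disagreement measure is vote entropy. The sketch below assumes each committee member has already produced a hard label prediction for every sample in the unlabeled pool; the array layout and function name are assumptions for illustration, not a specific published implementation.

<syntaxhighlight lang="python">
import numpy as np

def query_by_committee(committee_predictions):
    """Select the pool index with the highest vote entropy among committee members.

    committee_predictions: array of shape (n_members, n_samples) holding each
    member's predicted class label for every unlabeled sample.
    """
    n_members, n_samples = committee_predictions.shape
    vote_entropies = np.empty(n_samples)
    for i in range(n_samples):
        # Fraction of committee votes received by each class for sample i
        _, counts = np.unique(committee_predictions[:, i], return_counts=True)
        fractions = counts / n_members
        # Vote entropy is zero when the committee is unanimous and
        # largest when the votes are evenly split across classes.
        vote_entropies[i] = -np.sum(fractions * np.log(fractions))
    return int(np.argmax(vote_entropies))
</syntaxhighlight>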