Local case-control sampling: Difference between revisions

Created page with 'In machine learning, local case-control sampling is an algorithm used to reduce the complexity of training a logistic regression classifier. The algorithm re...'
 
fixed grammar/formatting, added a category
In [[machine learning]], '''local case-control sampling''' is an [[algorithm]] used to reduce the complexity of training a [[logistic regression]] classifier. It does so by selecting a small subsample of the original dataset for training. The algorithm assumes the availability of a (possibly unreliable) pilot estimate of the parameters. It then performs a single pass over the entire dataset, using the pilot estimate to identify the most "surprising" samples, that is, those whose labels the pilot predicts poorly. In practice, the pilot may come from prior knowledge or from training on a subsample of the dataset. The algorithm is most effective when the underlying dataset is imbalanced. It exploits the structure of conditionally imbalanced datasets more efficiently than alternative methods, such as [[Logistic_regression#Case-control_sampling|case-control sampling]] and weighted case-control sampling.
 
 
<ref>{{cite journal|last1=Fithian|first1=William|last2=Hastie|first2=Trevor|title=Local case-control sampling: Efficient subsampling in imbalanced data sets|journal=The Annals of Statistics|date=2014|volume=42|issue=5|pages=1693–1724|url=http://arxiv.org/abs/1306.3706}}</ref>
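
The single pass described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the acceptance rule from the cited paper (keep a sample with label y with probability |y − p̃(x)|, where p̃(x) is the pilot's predicted probability, then add the pilot estimate back to the coefficients fitted on the subsample). The function names, the Newton solver, the uniform pilot subsample, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Unregularised logistic regression via Newton's method (illustrative solver)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p)
        # Tiny ridge term only keeps the linear solve numerically stable.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def local_case_control(X, y, theta_pilot, rng):
    """One pass over (X, y): subsample the 'surprising' points, fit, add the pilot back."""
    p_pilot = sigmoid(X @ theta_pilot)
    accept_prob = np.abs(y - p_pilot)          # large when the pilot is wrong about y
    keep = rng.random(len(y)) < accept_prob    # independent accept/reject per sample
    theta_sub = fit_logistic(X[keep], y[keep])
    return theta_sub + theta_pilot, keep

# Synthetic imbalanced data (assumed for illustration): intercept -4, slope 1.
rng = np.random.default_rng(0)
n = 50_000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
theta_true = np.array([-4.0, 1.0])
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

# Pilot: fit on a small uniform subsample, one of the options the text mentions.
pilot_idx = rng.choice(n, size=5_000, replace=False)
theta_pilot = fit_logistic(X[pilot_idx], y[pilot_idx])

theta_hat, keep = local_case_control(X, y, theta_pilot, rng)
print("subsample size:", int(keep.sum()), "of", n)
print("estimate:", theta_hat)
```

In this sketch the retained subsample is a small fraction of the data, because most points are majority-class examples that the pilot already predicts confidently and are therefore rarely accepted.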
 
[[Category:Machine learning]]