Revision as of 22:48, 12 June 2015 by Gzluyongxi (no edit summary)
Revision as of 00:02, 13 June 2015 by 76.167.136.94 (no edit summary)
Line 5:
}}
In [[machine learning]], '''local case-control sampling'''<ref name="LCC">{{cite journal|last1=Fithian|first1=William|last2=Hastie|first2=Trevor|title=Local case-control sampling: Efficient subsampling in imbalanced data sets|journal=The Annals of Statistics|date=2014|volume=42|issue=5|page=1693-1724|ref=http://arxiv.org/abs/1306.3706}}</ref> is an [[algorithm]] used to reduce the complexity of training a [[logistic regression]] classifier. It does so by selecting a small subsample of the original dataset for training. The algorithm assumes the availability of a (possibly unreliable) pilot estimate of the parameters. It then performs a single pass over the entire dataset, using the pilot estimate to identify the most "surprising" samples. In practice, the pilot may come from prior knowledge or from training on a subsample of the dataset. The algorithm is most effective when the underlying dataset is imbalanced. It exploits the structure of conditionally imbalanced datasets more efficiently than alternative methods, such as [[Logistic_regression#Case-control_sampling|case-control sampling]] and weighted case-control sampling.

== Imbalanced datasets ==

Line 32:
The algorithm has the following properties. When the pilot is [[Consistency (statistics)|consistent]], the estimate obtained from the local case-control sample is consistent even under [[Specification (regression)|model misspecification]]. If the model is correct, the algorithm has exactly twice the asymptotic variance of logistic regression on the full data set. For a larger sample size with <math> c>1 </math>, the factor 2 improves to <math> 1+\frac{1}{c} </math>.
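The single pass described above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: it assumes the acceptance rule from the cited Fithian and Hastie paper, in which a sample is kept with probability equal to the absolute difference between its label and the pilot's predicted probability, and the pilot is added back to the coefficients fitted on the subsample. The function names (`acceptance_probability`, `local_case_control_subsample`, `corrected_estimate`) are hypothetical, and the logistic regression fit on the subsample is left to whatever solver is at hand.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def acceptance_probability(x, y, pilot):
    """Return |y - p_pilot(x)|: how 'surprising' label y is under the pilot.

    Assumption: this is the acceptance rule of the cited paper, with the
    pilot given as a plain list of coefficients (hypothetical interface).
    """
    p = sigmoid(sum(w * xi for w, xi in zip(pilot, x)))
    return abs(y - p)


def local_case_control_subsample(X, y, pilot, rng):
    """Single pass over the data; keep row i with probability |y_i - p_pilot(x_i)|."""
    kept = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if rng.random() < acceptance_probability(xi, yi, pilot):
            kept.append(i)
    return kept


def corrected_estimate(theta_subsample, pilot):
    """Add the pilot back to the coefficients fitted on the subsample."""
    return [t + p for t, p in zip(theta_subsample, pilot)]


if __name__ == "__main__":
    rng = random.Random(0)
    pilot = [-3.0, 1.0]  # hypothetical pilot: intercept and one slope
    # Synthetic imbalanced data: first feature is a constant intercept term.
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(10000)]
    y = [1 if rng.random() < sigmoid(-3.0 + xi[1]) else 0 for xi in X]
    kept = local_case_control_subsample(X, y, pilot, rng)
    print(len(kept), "of", len(X), "rows kept")
```

Samples the pilot already predicts well are rarely kept, so on imbalanced data most of the majority class is discarded while nearly all of the rare class survives. After fitting logistic regression on the kept rows, `corrected_estimate` would be applied to the fitted coefficients.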
== References ==
{{Reflist|colwidth=30em|refs=
<ref>{{cite journal|last1=Fithian|first1=William|last2=Hastie|first2=Trevor|title=Local case-control sampling: Efficient subsampling in imbalanced data sets|journal=The Annals of Statistics|date=2014|volume=42|issue=5|page=1693-1724|ref=http://arxiv.org/abs/1306.3706}}</ref>
<ref name="LCC">{{cite journal|last1=Fithian|first1=William|last2=Hastie|first2=Trevor|title=Local case-control sampling: Efficient subsampling in imbalanced data sets|journal=The Annals of Statistics|date=2014|volume=42|issue=5|page=1693-1724|ref=http://arxiv.org/abs/1306.3706}}</ref>
}}

[[Category:Machine learning]]

Local case-control sampling: Difference between revisions