Revision as of 20:24, 12 June 2015 by KSFT (fixed grammar/formatting, added a category). Revision as of 20:25, 12 June 2015 by KSFT (added maintenance tags).
Line 1: {{ Multiple issues\| {{Orphan}} {{Jargon}} }} In [[machine learning]], '''local case-control sampling''' is an [[algorithm]] that reduces the cost of training a [[logistic regression]] classifier by fitting it to a small subsample of the original dataset. It assumes that a pilot estimate of the parameters, possibly an unreliable one, is available. It then makes a single pass over the entire dataset, using the pilot estimate to identify the most "surprising" samples, which it preferentially retains for training. In practice, the pilot may come from prior knowledge or from training on a subsample of the dataset. The algorithm is most effective when the underlying dataset is imbalanced. It exploits the structure of conditionally imbalanced datasets more efficiently than alternative methods such as [[Logistic_regression#Case-control_sampling\|case-control sampling]] and weighted case-control sampling.
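The single selection pass can be illustrated with a minimal sketch. It assumes the acceptance rule usually given for this method, in which a sample with label y is kept with probability |y − p̃(x)|, where p̃(x) is the pilot model's predicted probability; the passage above does not spell this rule out. The function name `local_case_control_subsample`, the synthetic data, and the pilot values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def local_case_control_subsample(X, y, pilot_theta, rng):
    """Keep sample i with probability |y_i - p_pilot(x_i)| (assumed rule).

    Samples that the pilot model predicts poorly ("surprising" ones)
    are kept with high probability; well-predicted ones are mostly dropped.
    """
    p_pilot = sigmoid(X @ pilot_theta)       # pilot's predicted P(y = 1 | x)
    accept_prob = np.abs(y - p_pilot)        # degree of "surprise" per sample
    keep = rng.random(len(y)) < accept_prob  # independent accept/reject draws
    return X[keep], y[keep], accept_prob

# Hypothetical imbalanced dataset: intercept column plus one feature.
rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_theta = np.array([-4.0, 1.0])    # strongly negative intercept: rare positives
y = (rng.random(n) < sigmoid(X @ true_theta)).astype(float)

pilot_theta = np.array([-3.5, 0.8])   # deliberately imperfect pilot estimate
X_sub, y_sub, accept_prob = local_case_control_subsample(X, y, pilot_theta, rng)
print(len(y_sub), "of", n, "samples kept")
```

The sketch covers only the subsampling step. Fitting the logistic regression on `X_sub`, `y_sub` and any subsequent adjustment of the fitted parameters are not shown.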

Local case-control sampling: Difference between revisions