In [[machine learning]], '''local case-control sampling''' <ref name="LCC">{{cite journal|last1=Fithian|first1=William|last2=Hastie|first2=Trevor|title=Local case-control sampling: Efficient subsampling in imbalanced data sets|journal=The Annals of Statistics|date=2014|volume=42|issue=5|
== Imbalanced datasets ==
# The output model is <math> \hat{\theta} = (\hat{\alpha}, \hat{\beta}) </math>, where <math>\hat{\alpha} \leftarrow \hat{\alpha}_S + \tilde{\alpha} </math> and <math>\hat{\beta} \leftarrow \hat{\beta}_S + \tilde{\beta} </math>.
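The subsampling, fitting, and correction steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single scalar feature, a logistic model fitted by Newton's method, and the acceptance probability <math> |y - \tilde{p}(x)| </math> described by Fithian and Hastie, where <math> \tilde{p}(x) </math> is the pilot's predicted probability. The function names (<code>sigmoid</code>, <code>fit_logistic</code>, <code>local_case_control</code>, <code>simulate</code>) are hypothetical and chosen only for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    # Newton's method for P(y=1|x) = sigmoid(alpha + beta * x), scalar x.
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(alpha + beta * x)
            v = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # degenerate subsample; keep the current estimate
        # Solve the 2x2 Newton system H * delta = g in closed form.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def local_case_control(xs, ys, pilot, rng):
    # pilot = (alpha_tilde, beta_tilde); assumed to be given.
    a_t, b_t = pilot
    sub_x, sub_y = [], []
    for x, y in zip(xs, ys):
        # Keep points that "surprise" the pilot: accept with |y - p_tilde(x)|.
        if rng.random() < abs(y - sigmoid(a_t + b_t * x)):
            sub_x.append(x)
            sub_y.append(y)
    # Fit on the subsample, then add the pilot back (the final step above).
    a_s, b_s = fit_logistic(sub_x, sub_y)
    return (a_s + a_t, b_s + b_t), len(sub_x)


def simulate(n, alpha, beta, rng):
    # Hypothetical helper: draw imbalanced data from a known logistic model.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(alpha + beta * x) else 0 for x in xs]
    return xs, ys
```

For instance, calling <code>local_case_control(xs, ys, (-3.5, 1.0), random.Random(0))</code> on data drawn with <code>simulate(20000, -4.0, 1.5, ...)</code> keeps only a small fraction of the points, and the corrected estimate should land near the generating parameters even though the pilot is somewhat off.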
The algorithm can be understood as selecting samples that surprise the pilot model. Intuitively, these samples are closer to the [[
=== Obtaining the pilot model ===
In practice, when a pilot model is naturally available, the algorithm can be applied directly to reduce the complexity of training. When no natural pilot exists, an estimate obtained from a subsample selected by another sampling technique can be used instead. In the original paper describing the algorithm, the authors propose using weighted case-control sampling with half of the assigned sampling budget. For example, if the objective is to use a subsample of size <math> N=1000 </math>, first estimate a model <math>\tilde{\theta} </math> using <math> N_h = 500 </math> samples from weighted case-control sampling, then collect another <math> N_h = 500 </math> samples using local case-control sampling.
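The first stage of this two-stage procedure can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's implementation: it assumes a single scalar feature, class-dependent acceptance probabilities chosen so that each class contributes about half of the pilot budget in expectation, and a logistic fit weighted by the inverse acceptance probabilities. The names <code>fit_weighted_logistic</code> and <code>weighted_case_control_pilot</code> are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(xs, ys, ws, iters=50):
    # Newton's method for a weighted logistic log-likelihood, scalar x.
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = sigmoid(alpha + beta * x)
            r = w * (y - p)
            v = w * p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # degenerate subsample; keep the current estimate
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def weighted_case_control_pilot(xs, ys, n_pilot, rng):
    # Accept each class with a probability that yields about n_pilot / 2
    # points per class in expectation, then reweight by 1 / probability.
    n1 = sum(ys)
    n0 = len(ys) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    a1 = min(1.0, 0.5 * n_pilot / n1)
    a0 = min(1.0, 0.5 * n_pilot / n0)
    sx, sy, sw = [], [], []
    for x, y in zip(xs, ys):
        a = a1 if y == 1 else a0
        if rng.random() < a:
            sx.append(x)
            sy.append(y)
            sw.append(1.0 / a)
    return fit_weighted_logistic(sx, sy, sw), len(sx)
```

With a total budget of <math> N=1000 </math>, one would call <code>weighted_case_control_pilot(xs, ys, 500, rng)</code> to obtain <math>\tilde{\theta} </math> and then spend the remaining half of the budget on local case-control sampling with that pilot.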
=== Larger or smaller sample size ===
== Properties ==
The algorithm has the following properties. When the pilot is [[Consistency (statistics)|consistent]], the estimates using the samples from local case-control sampling are consistent even under [[
== References ==
{{Reflist
[[Category:Machine learning]]
[[Category: