Local case-control sampling

# The output model is <math> \hat{\theta} = (\hat{\alpha}, \hat{\beta}) </math>, where <math>\hat{\alpha} \leftarrow \hat{\alpha}_S + \tilde{\alpha} </math> and <math>\hat{\beta} \leftarrow \hat{\beta}_S + \tilde{\beta} </math>.
 
The algorithm can be understood as selecting samples that surprise the pilot model. Intuitively, these samples lie closer to the [[Decision boundary|decision boundary]] of the classifier and are thus more informative.
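
The sampling and correction steps above can be summarized in a short sketch (Python with NumPy and scikit-learn; the function name <code>local_case_control</code> and its signature are illustrative, not from the original paper):

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_case_control(X, y, alpha_tilde, beta_tilde, seed=0):
    """Sketch of one pass of local case-control sampling.

    X : (n, d) feature matrix, y : (n,) array of 0/1 labels,
    (alpha_tilde, beta_tilde) : intercept and coefficients of the pilot model.
    """
    rng = np.random.default_rng(seed)
    # Pilot predictions; a point is accepted with probability
    # |y_i - p~(x_i)|, i.e. in proportion to how much it surprises the pilot.
    p_tilde = sigmoid(alpha_tilde + X @ beta_tilde)
    accept = rng.random(len(y)) < np.abs(y - p_tilde)
    # Fit an (effectively) unpenalized logistic regression on the subsample.
    clf = LogisticRegression(C=1e10).fit(X[accept], y[accept])
    # Post-hoc correction: add the pilot back to the subsample estimates,
    # alpha_hat = alpha_S + alpha_tilde, beta_hat = beta_S + beta_tilde.
    alpha_hat = clf.intercept_[0] + alpha_tilde
    beta_hat = clf.coef_[0] + beta_tilde
    return alpha_hat, beta_hat
</syntaxhighlight>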
 
=== Obtaining the pilot model ===
In practice, when a pilot model is naturally available, the algorithm can be applied directly to reduce the complexity of training. When no natural pilot exists, an estimate based on a subsample selected through another sampling technique can be used instead. In the original paper describing the algorithm, the authors propose to use weighted case-control sampling with half of the assigned sampling budget. For example, if the objective is to use a subsample of size <math> N = 1000 </math>, first estimate a pilot model <math>\tilde{\theta} </math> using <math> N_h = 500 </math> samples from weighted case-control sampling, then collect another <math> N_h = 500 </math> samples using local case-control sampling.
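
A sketch of this two-stage recipe is given below, under the assumption that weighted case-control sampling here means drawing a class-balanced subsample and fitting with inverse-probability weights; the helper name <code>weighted_cc_pilot</code> is illustrative:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_cc_pilot(X, y, n_pilot, seed=0):
    """Pilot via weighted case-control sampling: draw n_pilot points, half
    from each class, and reweight by inverse sampling probabilities so that
    both the intercept and the slopes remain consistent."""
    rng = np.random.default_rng(seed)
    ones, zeros = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    k = n_pilot // 2
    idx = np.concatenate([rng.choice(ones, size=k, replace=False),
                          rng.choice(zeros, size=k, replace=False)])
    # Inverse-probability weights: class y is sampled with probability k / n_y.
    w = np.where(y[idx] == 1, len(ones) / k, len(zeros) / k)
    clf = LogisticRegression(C=1e10).fit(X[idx], y[idx], sample_weight=w)
    return clf.intercept_[0], clf.coef_[0]

# Splitting a budget of N = 1000: a pilot from 500 weighted case-control
# samples, then local case-control sampling (sketched above) with that pilot.
# alpha_t, beta_t = weighted_cc_pilot(X, y, n_pilot=500)
# alpha_hat, beta_hat = local_case_control(X, y, alpha_t, beta_t)
</syntaxhighlight>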
 
=== Larger or smaller sample size ===