Local case-control sampling: Difference between revisions

 
== Algorithm outline ==
In logistic regression, given the model <math> \theta = (\alpha, \beta) </math>, the prediction is made according to <math> \mathbb{P}(Y=1\mid X; \theta) = \tilde{p}_{\theta}(x) = \frac{\exp(\alpha+\beta^T x)}{1+\exp(\alpha+\beta^T x)} </math>. The local case-control sampling algorithm assumes the availability of a pilot model <math>\tilde{\theta} = (\tilde{\alpha}, \tilde{\beta}) </math>. Given the pilot model, the algorithm performs a single pass over the entire dataset to select the subset of samples to include in training the logistic regression model. For a sample <math> (x,y) </math>, define the acceptance probability as <math> a(x,y) = |y-\tilde{p}_{\tilde{\theta}}(x)| </math>. The algorithm proceeds as follows:
 
# Generate independent <math> z_i \sim \text{Bernoulli}(a(x_i,y_i)) </math> for <math> i \in \{1, \ldots, N\} </math>.
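The acceptance step above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name <code>lcc_subsample</code> and the use of NumPy are choices made here, not part of the source, and only the sampling pass is shown (not the subsequent model fit).

```python
import numpy as np

def sigmoid(t):
    """Logistic function exp(t) / (1 + exp(t)), written in its stable form."""
    return 1.0 / (1.0 + np.exp(-t))

def lcc_subsample(X, y, alpha_pilot, beta_pilot, rng=None):
    """Select a subsample via local case-control sampling.

    Acceptance probability for each sample: a(x, y) = |y - p_pilot(x)|,
    where p_pilot is the pilot model's predicted probability of y = 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_pilot = sigmoid(alpha_pilot + X @ beta_pilot)   # pilot predictions
    a = np.abs(y - p_pilot)                           # acceptance probabilities
    z = rng.random(len(y)) < a                        # z_i ~ Bernoulli(a(x_i, y_i))
    return X[z], y[z]
```

Samples that the pilot model already predicts confidently and correctly have small <math>a(x,y)</math> and are rarely kept, while misclassified or uncertain samples are retained with high probability.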
Line 21:
 
=== Larger or smaller sample size ===
It is possible to control the expected sample size by multiplying the acceptance probability by a constant <math> c </math>: pick <math> c>1 </math> for a larger sample or <math> c<1 </math> for a smaller one, and adjust the acceptance probability to <math> \min(ca(x_i, y_i), 1) </math>. When an exact sample size is required, a convenient alternative is to uniformly downsample from a larger subsample selected by local case-control sampling.
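The adjusted acceptance probability can be computed as in the following sketch (the helper name <code>adjusted_acceptance</code> is hypothetical, not from the source):

```python
import numpy as np

def adjusted_acceptance(a, c):
    """Scale acceptance probabilities a(x_i, y_i) by c, capped at 1."""
    return np.minimum(c * np.asarray(a), 1.0)
```

Note that for <math> c<1 </math> the cap is never active, since the original probabilities already lie in <math> [0,1] </math>.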
 
== Properties ==