* ''Marginal Imbalance''. A dataset is marginally imbalanced if one class is rare compared to the other class. In other words, <math> \mathbb{P}(Y=1) \approx 0 </math>.
* ''Conditional Imbalance''. A dataset is conditionally imbalanced when it is easy to predict the correct labels in most cases. For example, if <math> X \in \{0,1\} </math>, the dataset is conditionally imbalanced if <math> \mathbb{P}(Y=1|X=0) \approx 0 </math> and <math> \mathbb{P}(Y=1|X=1) \approx 1 </math>.
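The distinction can be made concrete with a small simulation. The sketch below is illustrative only (the specific probability values are arbitrary choices, not from the original text): it draws a dataset that is conditionally imbalanced but not marginally imbalanced.

<syntaxhighlight lang="python">
# Illustrative sketch of a conditionally imbalanced dataset: the label is
# almost fully determined by X, so P(Y=1|X=0) is near 0 and P(Y=1|X=1) is
# near 1, while the marginal P(Y=1) stays close to 0.5.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(0, 2, size=n)                # X uniform on {0, 1}
p_y_given_x = np.where(x == 1, 0.99, 0.01)    # P(Y=1|X=1)=0.99, P(Y=1|X=0)=0.01
y = rng.random(n) < p_y_given_x

print(y.mean())                               # marginal P(Y=1) ~ 0.5
print(y[x == 0].mean(), y[x == 1].mean())     # conditional rates near 0 and 1
</syntaxhighlight>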
== Local case-control sampling ==
In logistic regression, given the model <math> M = (\alpha, \beta) </math>, the prediction is made according to <math> \mathbb{P}(Y=1|X; M) = \tilde{p}_M(x) = \frac{\exp(\alpha+\beta^T x)}{1+\exp(\alpha+\beta^T x)} </math>. The local case-control sampling algorithm assumes the availability of a pilot model <math>\tilde{M} = (\tilde{\alpha}, \tilde{\beta}) </math>. Given the pilot model, the algorithm performs a single pass over the entire dataset to select the subset of samples to include in training the logistic regression model. For a sample <math> (x,y) </math>, define the acceptance probability as <math> a(x,y) = |y-\tilde{p}_{\tilde{M}}(x)| </math>. The algorithm proceeds as follows:
# Generate independent <math> z_i \sim \text{Bernoulli}(a(x_i,y_i)) </math> for <math> i \in \{1, \cdots, N\} </math>.
# Fit a logistic regression model to the subsample <math> S = \{(x_i, y_i) : z_i =1 \} </math>, obtaining the unadjusted estimates <math> \hat{\theta}_S = (\hat{\alpha}_S, \hat{\beta}_S) </math>.
# The output model is <math> \hat{M} = (\hat{\alpha}, \hat{\beta}) </math>, where <math>\hat{\alpha} \leftarrow \hat{\alpha}_S + \tilde{\alpha} </math> and <math>\hat{\beta} \leftarrow \hat{\beta}_S + \tilde{\beta} </math>.
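The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name, the use of scikit-learn's <code>LogisticRegression</code>, and the choice of a large <code>C</code> to approximate an unpenalized fit are all illustrative choices. The pilot parameters <math>(\tilde{\alpha}, \tilde{\beta})</math> are assumed to be supplied, for example from a model fitted to a small preliminary subsample.

<syntaxhighlight lang="python">
# Minimal sketch of local case-control sampling, assuming a pilot model
# (alpha_tilde, beta_tilde) is already available. Function and variable
# names are illustrative, not taken from the original description.
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_case_control_fit(X, y, alpha_tilde, beta_tilde, seed=None):
    rng = np.random.default_rng(seed)
    # Pilot predictions p_tilde(x) under the pilot model.
    p_tilde = 1.0 / (1.0 + np.exp(-(alpha_tilde + X @ beta_tilde)))
    # Acceptance probability a(x, y) = |y - p_tilde(x)|.
    a = np.abs(y - p_tilde)
    # Step 1: independent Bernoulli(a_i) draws select the subsample.
    z = rng.random(len(y)) < a
    # Step 2: fit logistic regression on the accepted points only
    # (large C approximates plain, unregularized maximum likelihood).
    clf = LogisticRegression(C=1e6)
    clf.fit(X[z], y[z])
    alpha_s, beta_s = clf.intercept_[0], clf.coef_[0]
    # Step 3: adjust the subsample estimates by adding back the pilot parameters.
    return alpha_s + alpha_tilde, beta_s + beta_tilde
</syntaxhighlight>

In this sketch the expected size of the subsample is <math> \sum_i a(x_i, y_i) </math>, so a pilot model that already predicts most points well keeps only a small fraction of the data.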