In the case of a Dirichlet distribution, a concentration parameter of 1 results in all sets of probabilities being equally likely, i.e. in this case the Dirichlet distribution of dimension ''k'' is equivalent to a uniform distribution over a ''k''-dimensional simplex. Note that this is ''not'' the same as what happens when the concentration parameter tends towards infinity. In the former case, all resulting distributions are equally likely (the distribution over distributions is uniform). In the latter case, only near-uniform distributions are likely (the distribution over distributions is highly peaked around the uniform distribution). Meanwhile, in the limit as the concentration parameter tends towards zero, only distributions with nearly all mass concentrated on one of their components are likely (the distribution over distributions is highly peaked around the ''k'' possible [[Dirac delta distribution]]s centered on one of the components, or, in terms of the ''k''-dimensional simplex, is highly peaked at the corners of the simplex).
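The three regimes described above can be illustrated by sampling from a symmetric Dirichlet distribution with different concentration parameters. The following sketch (using NumPy; the dimension ''k'' = 5 and the particular parameter values are illustrative choices, not taken from the text) tracks the average size of the largest component of each sample: near 1/''k'' when the concentration is large (samples cluster around the uniform distribution), and near 1 when the concentration is small (samples cluster at the corners of the simplex).

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5  # dimension of the Dirichlet distribution (illustrative)

# Mean size of the largest component of a Dirichlet(alpha, ..., alpha) sample:
#   small alpha  -> near 1   (mass piles onto a single component)
#   alpha = 1    -> uniform distribution over the simplex
#   large alpha  -> near 1/k (samples hug the uniform vector (1/k, ..., 1/k))
mean_largest = {}
for alpha in (0.01, 1.0, 100.0):
    samples = rng.dirichlet(np.full(k, alpha), size=1000)
    mean_largest[alpha] = samples.max(axis=1).mean()

for alpha, m in mean_largest.items():
    print(f"alpha = {alpha:>6}: mean largest component = {m:.3f}")
```

The monotone decrease of the largest component as the concentration parameter grows is exactly the transition from the corner-peaked regime to the uniform-peaked regime described above.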
As an example of a situation in which a sparse prior (concentration parameter much less than 1) is called for, consider a [[topic model]], which is used to learn the topics discussed in a set of documents, where each "topic" is described using a [[categorical distribution]] over a vocabulary of words. A typical vocabulary might have 100,000 words, leading to a 100,000-dimensional categorical distribution. The [[prior distribution]] over each topic's word probabilities would then naturally be a symmetric Dirichlet distribution with a concentration parameter much less than 1, reflecting the expectation that any given topic concentrates its probability mass on a relatively small subset of the vocabulary.
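The effect of such a sparse prior on sampled topics can be sketched as follows (a toy illustration, not an actual topic-model implementation: the vocabulary size of 100 and the concentration value 0.01 are assumptions chosen to keep the example fast, standing in for a realistic vocabulary of ~100,000 words):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 100  # illustrative stand-in for a ~100,000-word vocabulary

def mean_top10_mass(alpha, n_samples=200):
    """Average probability mass that a sampled topic places on its
    10 most probable words, under a symmetric Dirichlet(alpha) prior."""
    topics = rng.dirichlet(np.full(vocab_size, alpha), size=n_samples)
    return np.sort(topics, axis=1)[:, -10:].sum(axis=1).mean()

sparse = mean_top10_mass(0.01)  # sparse prior: a few words dominate each topic
dense = mean_top10_mass(1.0)    # concentration 1: mass spread over the vocabulary

print(f"top-10 word mass, sparse prior (0.01): {sparse:.2f}")
print(f"top-10 word mass, concentration 1:     {dense:.2f}")
```

Under the sparse prior nearly all of a topic's probability mass falls on a handful of words, matching the intuition that each topic concerns only a small subset of the vocabulary.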
==See also==