==Sparse prior==
As an example of a case where a sparse prior (concentration parameter much less than 1) is called for, consider a [[topic model]], which is used to learn the topics discussed in a set of documents, where each "topic" is described by a [[categorical distribution]] over a vocabulary of words. A typical vocabulary might have 100,000 words, leading to a 100,000-dimensional categorical distribution. The [[prior distribution]] for the parameters of the categorical distribution would likely be a [[symmetric Dirichlet distribution]]. However, a coherent topic might assign significant probability mass to only a few hundred words, so a reasonable setting for the concentration parameter might be 0.01 or 0.001. With a larger vocabulary of around 1,000,000 words, an even smaller value, e.g. 0.0001, might be appropriate.
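The sparsity induced by a small concentration parameter can be illustrated with a short simulation (a sketch using NumPy; the vocabulary size, seed, and 99% threshold are illustrative choices, not part of any particular topic model):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 1000  # toy vocabulary; real topic models may use ~100,000 words

# Draw one topic from a symmetric Dirichlet at several concentration values
for alpha in (1.0, 0.1, 0.01):
    topic = rng.dirichlet(np.full(vocab_size, alpha))
    # Count how many words are needed to cover 99% of the probability mass
    sorted_p = np.sort(topic)[::-1]
    n_top = int(np.searchsorted(np.cumsum(sorted_p), 0.99)) + 1
    print(f"alpha={alpha}: {n_top} of {vocab_size} words hold 99% of the mass")
```

As the concentration parameter shrinks below 1, the sampled distribution concentrates its mass on ever fewer components, which is the behavior wanted for a prior over sparse topics.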
==See also==
* [[Location parameter]]
* [[Scale parameter]]
== References ==
{{reflist}}
[[Category:Statistical parameters]]