Revision as of 14:45, 2 August 2008 by Dmcarter (edit summary: corrections to English). Previous revision as of 13:42, 31 March 2008 by Kingturtle (edit summary: -{{Orphan|date=August 2006}}).
The '''factored language model''' ('''FLM''') is an extension of a conventional [[language model]]. In an FLM, each word is viewed as a vector of ''k'' factors: <math>w_i = \{f_i^1, \ldots, f_i^k\}</math>. An FLM provides the probabilistic model <math>P(f \mid f_1, \ldots, f_N)</math>, where the prediction of a factor <math>f</math> is based on <math>N</math> parents <math>\{f_1, \ldots, f_N\}</math>. For example, if <math>w</math> represents a word token and <math>t</math> represents a [[Part of speech]] tag for English, the expression <math>P(w_i \mid w_{i-2}, w_{i-1}, t_{i-1})</math> gives a model for predicting the current word token based on a traditional [[N-gram]] model as well as the [[Part of speech]] tag of the previous word.

A major advantage of factored language models is that they allow users to specify linguistic knowledge, such as the relationship between word tokens and [[Part of speech]] in English, or morphological information (stems, roots, etc.) in Arabic.

Like [[N-gram]] models, factored language models require smoothing techniques in parameter estimation. In particular, generalized back-off is used in training an FLM.
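As an illustration of the word-and-tag model above, the following Python sketch estimates the probability of a word given the two previous words and the previous part-of-speech tag from raw counts, and falls back to a smaller set of parents when a context was never observed. The toy corpus, the names <code>BACKOFF_PATH</code>, <code>train</code> and <code>prob</code>, and the order in which parents are dropped are all assumptions made for this example. It follows a single fixed back-off path and applies no discounting, so it is a simplification and not the generalized back-off mentioned above.

```python
from collections import Counter

# Each token is a bundle of factors; here: (word, part-of-speech tag).
# Toy corpus invented for this example.
corpus = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"),
]

FACTOR_INDEX = {"w": 0, "t": 1}

# Hypothetical back-off path: parent sets from most to least specific.
# Each parent is (factor name, offset relative to the predicted position).
BACKOFF_PATH = [
    (("w", -2), ("w", -1), ("t", -1)),  # w_{i-2}, w_{i-1}, t_{i-1}
    (("w", -1), ("t", -1)),             # drop w_{i-2}
    (("t", -1),),                       # keep only the previous tag
    (),                                 # no parents: unigram estimate
]


def context(history, parents):
    """Read the parent factor values out of the preceding tokens."""
    return tuple(history[offset][FACTOR_INDEX[name]] for name, offset in parents)


def train(tokens):
    """Count (context, word) pairs and contexts at every back-off level."""
    joint = [Counter() for _ in BACKOFF_PATH]
    ctx = [Counter() for _ in BACKOFF_PATH]
    for i in range(2, len(tokens)):
        history, word = tokens[i - 2:i], tokens[i][0]
        for level, parents in enumerate(BACKOFF_PATH):
            c = context(history, parents)
            joint[level][(c, word)] += 1
            ctx[level][c] += 1
    return joint, ctx


def prob(model, history, word):
    """Relative-frequency estimate from the most specific context seen in training."""
    joint, ctx = model
    for level, parents in enumerate(BACKOFF_PATH):
        c = context(history, parents)
        if ctx[level][c] > 0:
            return joint[level][(c, word)] / ctx[level][c]
    return 0.0  # only reached if the model was trained on no data


model = train(corpus)

# Context seen in training: the full parent set is used.
print(prob(model, [("the", "DET"), ("dog", "NOUN")], "barks"))
# Unseen words: the estimate falls back to the previous tag alone.
print(prob(model, [("a", "DET"), ("bird", "NOUN")], "sleeps"))
```

Because the sketch backs off only when an entire context is unseen, each context still yields a normalized distribution over words, but a word never observed after a known context receives probability zero. Avoiding such zeros is the purpose of the smoothing techniques described above.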

Factored language model: Difference between revisions