Word n-gram language model: Difference between revisions

Line 100:
''n''-gram-based searching was also used for [[plagiarism detection]].
 
== Bias–variance tradeoff ==
== Bias-versus-variance trade-off ==
{{Main|Bias–variance tradeoff}}
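To choose a value for ''n'' in an ''n''-gram model, it is necessary to find the right trade-off between the stability of the estimate and its appropriateness. A larger ''n'' conditions on more context and so fits the data more closely (lower bias), but each ''n''-gram is then observed less often, which makes the estimated probabilities less stable (higher variance). For this reason, a trigram model (i.e. one based on triplets of words) is a common choice with large training corpora (millions of words), whereas a bigram model is often used with smaller ones.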
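
The sparsity behind this trade-off can be illustrated with a minimal Python sketch. The toy corpus and the helper names <code>ngram_counts</code> and <code>singleton_fraction</code> are assumptions made for illustration only. The fraction of distinct ''n''-grams seen exactly once serves here as a rough proxy for how unreliable the counts become as ''n'' grows; it is not a formal variance estimate.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous window of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def singleton_fraction(tokens, n):
    """Return the fraction of distinct n-grams that occur exactly once."""
    counts = ngram_counts(tokens, n)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy corpus: an assumption for illustration, far smaller than real training data.
corpus = "the cat sat on the mat and the cat ate the fish on the mat".split()

for n in (1, 2, 3):
    distinct = len(ngram_counts(corpus, n))
    print(f"n={n}: {distinct} distinct n-grams, "
          f"{singleton_fraction(corpus, n):.2f} seen only once")
```

On this toy corpus the share of ''n''-grams seen only once rises as ''n'' goes from 1 to 3, which is the data-sparsity effect that pushes small corpora towards bigram models.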