→Hidden Markov models: rm external links |
→Motif finding: rm external links |
Motif finding, also known as profile analysis, is a method of locating [[sequence motif]]s in global MSAs. It serves both to produce a better MSA and to produce a scoring matrix for searching other sequences for similar motifs. A variety of methods for isolating the motifs have been developed, but all are based on identifying short, highly conserved patterns within the larger alignment and constructing a matrix, similar to a substitution matrix, that reflects the amino acid or nucleotide composition of each position in the putative motif. The alignment can then be refined using these matrices. In standard profile analysis, the matrix includes entries for each possible character as well as entries for gaps.<ref name="mount" /> Alternatively, statistical pattern-finding algorithms can identify motifs as a precursor to an MSA rather than as a derivation from one. In many cases, when the query set contains only a small number of sequences or only highly related sequences, [[pseudocount]]s are added to normalize the distribution reflected in the scoring matrix. In particular, this replaces zero-probability entries in the matrix with values that are small but nonzero.
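The construction of a profile with pseudocounts can be illustrated with a minimal sketch. It assumes a nucleotide alphabet with <code>-</code> as the gap character and a uniform pseudocount added to every cell; the function name <code>build_profile</code> and its parameters are hypothetical, chosen for illustration only, and real profile tools typically use more elaborate weighting schemes.

```python
from collections import Counter

# Assumed alphabet: four nucleotides plus a gap character, so that the
# profile carries an entry for gaps as well as for each residue.
ALPHABET = "ACGT-"


def build_profile(aligned_seqs, alphabet=ALPHABET, pseudocount=1.0):
    """Return one dict per alignment column mapping character -> probability.

    A uniform pseudocount is added to every cell, so characters never
    observed in a column receive a small nonzero probability.
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        # Only characters from the alphabet contribute to the column total.
        total = sum(counts[ch] for ch in alphabet) + pseudocount * len(alphabet)
        profile.append(
            {ch: (counts[ch] + pseudocount) / total for ch in alphabet}
        )
    return profile


profile = build_profile(["ACG", "ACG", "A-G"])
# Column 0 holds three A's: (3 + 1) / (3 + 5) = 0.5.
# C was never observed there but still gets (0 + 1) / 8 = 0.125.
print(profile[0]["A"], profile[0]["C"])
```

With only three sequences, the pseudocount visibly pulls the estimate for A in the first column from 1.0 down to 0.5; as more sequences are added, the observed counts dominate and the correction becomes small.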
Blocks analysis is a method of motif finding that restricts motifs to ungapped regions in the alignment. Blocks can be generated from an MSA, or they can be extracted from unaligned sequences using a precalculated set of common motifs previously generated from known gene families.<ref name="henikoff1991">{{cite journal | vauthors = Henikoff S, Henikoff JG | title = Automated assembly of protein blocks for database searching | journal = Nucleic Acids Res. | volume = 19 | issue = 23 | pages = 6565–72 | date = December 1991 | pmid = 1754394 | pmc = 329220 | doi = 10.1093/nar/19.23.6565 }}</ref> Block scoring generally relies on the spacing of high-frequency characters rather than on the calculation of an explicit substitution matrix.
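The restriction to ungapped regions can be sketched as a scan for runs of gap-free columns in an existing MSA. This is a simplification for illustration only, not the published block-assembly procedure: it ignores conservation and scoring, and the function name <code>ungapped_blocks</code> and the <code>min_width</code> threshold are assumptions.

```python
def ungapped_blocks(aligned_seqs, min_width=3, gap="-"):
    """Return (start, end) column ranges, end exclusive, that contain no gaps.

    Only runs of at least `min_width` consecutive gap-free columns are kept.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    blocks = []
    start = None  # first column of the current gap-free run, if any
    # Iterate one step past the last column so a trailing run is closed.
    for col in range(length + 1):
        gap_free = col < length and all(seq[col] != gap for seq in aligned_seqs)
        if gap_free:
            if start is None:
                start = col
        else:
            if start is not None and col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


msa = [
    "ACGT--ACGTA",
    "ACGTTTAC-TA",
    "ACG-TTACGTA",
]
print(ungapped_blocks(msa))               # [(0, 3)]
print(ungapped_blocks(msa, min_width=2))  # [(0, 3), (6, 8), (9, 11)]
```

Each returned range can be sliced out of every sequence to give an ungapped block, which could then be summarized column by column.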
Statistical pattern-finding has been implemented using both the [[expectation-maximization algorithm]] and the [[Gibbs sampler]]. One of the most common motif-finding tools, known as [[Multiple EM for Motif Elicitation|MEME]], uses expectation maximization and hidden Markov methods to generate motifs, which are then used as search tools by its companion program MAST; the two are distributed together as a combined suite.
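The expectation-maximization approach to motif finding in unaligned sequences can be illustrated with a deliberately simplified sketch. It is not MEME's implementation: it assumes exactly one motif occurrence per sequence, a fixed motif width, a background distribution that is estimated once and held fixed, and a single starting point taken from the first window of the first sequence. The function name <code>em_motif</code> and all parameters are hypothetical.

```python
ALPHABET = "ACGT"


def em_motif(seqs, width, n_iter=50, pseudocount=0.5):
    """Simplified EM motif search, one occurrence per sequence (assumption).

    Returns (pwm, posteriors): pwm is a list of per-position dicts of
    probabilities; posteriors[k][i] is the estimated probability that the
    motif in sequence k starts at offset i.
    """
    if any(len(s) < width for s in seqs):
        raise ValueError("every sequence must be at least `width` long")

    # Background letter frequencies, estimated once and held fixed.
    letters = "".join(seqs)
    denom = len(letters) + pseudocount * len(ALPHABET)
    bg = {a: (letters.count(a) + pseudocount) / denom for a in ALPHABET}

    # Single starting point: a softened copy of the first window.
    seed = seqs[0][:width]
    pwm = [{a: (0.7 if a == seed[j] else 0.1) for a in ALPHABET}
           for j in range(width)]

    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior over start offsets, proportional to the
        # motif-versus-background likelihood ratio of each window.
        posteriors = []
        for s in seqs:
            weights = []
            for i in range(len(s) - width + 1):
                ratio = 1.0
                for j in range(width):
                    ratio *= pwm[j][s[i + j]] / bg[s[i + j]]
                weights.append(ratio)
            total = sum(weights)
            posteriors.append([w / total for w in weights])

        # M-step: re-estimate the motif matrix from expected letter counts.
        counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
        for s, post in zip(seqs, posteriors):
            for i, p in enumerate(post):
                for j in range(width):
                    counts[j][s[i + j]] += p
        pwm = [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]

    return pwm, posteriors


seqs = ["TTACGTAAGC", "GGCACGTATT", "ACGTACCTGA"]
pwm, posteriors = em_motif(seqs, width=5)
# Report the most probable start offset in each sequence.
print([max(range(len(p)), key=p.__getitem__) for p in posteriors])
```

Because expectation maximization converges to a local optimum, the result depends on the starting point; practical tools generally try many starting points and motif widths rather than the single seed used here.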
===Non-coding multiple sequence alignment===