Multiple sequence alignment: Difference between revisions

major rewrite/expansion, destub - markov models and genetic algorithms to come
m inevitable typo, please don't refconvert (html comment)
<!-- please don't refconvert this article - I'll do it once I've finished the text ~~~~ -->
 
[[Image:RPLP0_90_ClustalW_aln.gif|right|thumb|300px|First 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with [[ClustalW]].]]
 
A '''multiple sequence alignment (MSA)''' is a [[sequence alignment]] of three or more [[biological sequence|sequence]]s, generally [[protein]], [[DNA]], or [[RNA]]. The input set of query sequences is usually assumed to have an [[evolution]]ary relationship, sharing a lineage and descending from a common ancestor. From the resulting MSA, sequence [[homology (biology)|homology]] can be inferred and [[molecular phylogeny|phylogenetic analysis]] can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment, such as the image at right, illustrate [[mutation]] events: point mutations (single [[amino acid]] or [[nucleotide]] changes) appear as differing characters in a single alignment column, and insertion or deletion mutations ([[indel]]s) appear as gaps in one or more of the aligned sequences. Multiple sequence alignment is often used to assess sequence [[conservation (genetics)|conservation]] of [[protein ___domain]]s, [[tertiary structure|tertiary]] and [[secondary structure|secondary]] structures, and even individual amino acids or nucleotides.
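
As an illustration of how these events are read off an alignment, the following Python sketch labels each column of a small alignment as conserved, as containing a substitution, or as containing a gap. The three sequences and the function name <code>classify_columns</code> are hypothetical and chosen only for illustration; the sketch assumes the gap character is "-" and does not weigh substitutions by biochemical similarity, as real conservation scores usually do.

<syntaxhighlight lang="python">
# Hypothetical toy alignment (not taken from the figure); "-" marks a gap.
alignment = [
    "MKV-LAG",
    "MKVELAG",
    "MRV-LSG",
]


def classify_columns(rows, gap="-"):
    """Label each alignment column as 'conserved', 'substitution', or 'gap'."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    labels = []
    for i in range(length):
        column = [row[i] for row in rows]
        if gap in column:
            labels.append("gap")           # an insertion or deletion in some sequence
        elif len(set(column)) == 1:
            labels.append("conserved")     # the same residue in every sequence
        else:
            labels.append("substitution")  # differing residues, e.g. a point mutation
    return labels


if __name__ == "__main__":
    for position, label in enumerate(classify_columns(alignment), start=1):
        print(position, label)
</syntaxhighlight>

For this toy input, columns 2 (K/K/R) and 6 (A/A/S) are labelled as substitutions and column 4 as a gap, since only the second sequence carries a residue there.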
 
"Multiple sequence alignment" also refers to the process of aligning such a sequence set. Because aligning three or more sequences of biologically relevant length by hand is nearly impossible, computational [[algorithm]]s are used to produce and analyze the alignments. MSAs require more sophisticated methods than [[sequence alignment|pairwise alignment]] because they are computationally harder to produce: the work done by exact [[dynamic programming]] grows with the product of the sequence lengths, and hence exponentially with the number of sequences. For this reason, most multiple sequence alignment programs use [[heuristic]] methods rather than [[global optimization]], since identifying the optimal alignment of more than a few sequences of moderate length is prohibitively expensive.
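
The growth in cost can be seen in a direct generalization of pairwise dynamic programming to N sequences. The Python sketch below is illustrative only: it assumes a sum-of-pairs objective with a linear gap penalty and arbitrary match, mismatch, and gap scores, which is one common scoring choice among several. The function names <code>exact_msa_score</code> and <code>column_score</code> are hypothetical. The sketch fills one cell per point of an N-dimensional lattice of size (L<sub>1</sub>+1)×…×(L<sub>N</sub>+1), and each cell examines up to 2<sup>N</sup>−1 predecessors, one for every non-empty subset of sequences that contributes a residue to the next column.

<syntaxhighlight lang="python">
from itertools import combinations, product

# Illustrative scores only; real aligners use substitution matrices and affine gaps.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of characters within a column."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are scored 0 under this assumed scheme
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(column):
    """Sum-of-pairs score of one alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def exact_msa_score(seqs):
    """Return (optimal sum-of-pairs score, number of lattice cells filled)."""
    n = len(seqs)
    dims = [len(s) + 1 for s in seqs]
    origin = (0,) * n
    # Each non-zero 0/1 vector marks which sequences emit a residue in the next column.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]
    best = {origin: 0}
    cells = 0
    # Lexicographic order guarantees every predecessor is computed before its successor.
    for cell in product(*(range(d) for d in dims)):
        cells += 1
        if cell == origin:
            continue
        candidates = []
        for move in moves:
            prev = tuple(c - m for c, m in zip(cell, move))
            if min(prev) < 0:
                continue  # this move would step outside the lattice
            column = [seqs[k][cell[k] - 1] if move[k] else "-" for k in range(n)]
            candidates.append(best[prev] + column_score(column))
        best[cell] = max(candidates)
    return best[tuple(d - 1 for d in dims)], cells


if __name__ == "__main__":
    score, cells = exact_msa_score(["ACGT", "AGT", "ACT"])
    print(score, cells)
</syntaxhighlight>

Three sequences of lengths 4, 3 and 3 already require 5×4×4 = 80 cells, each with up to 7 predecessors; adding sequences multiplies both factors, which is why exact alignment quickly becomes impractical.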