Multiple sequence alignment: Difference between revisions

===Non-coding multiple sequence alignment===
Non-coding DNA regions, especially [[transcription factor]] binding sites (TFBSs), tend to be conserved but are not necessarily evolutionarily related, and may have converged from non-common ancestors. Thus, the assumptions used to align protein sequences and DNA coding regions are inherently different from those that hold for TFBS sequences. Although it is meaningful to align DNA coding regions for homologous sequences using mutation operators, alignment of binding site sequences for the same transcription factor cannot rely on evolutionarily related mutation operations. Similarly, the evolutionary operator of point mutations can be used to define an edit distance for coding sequences, but this has little meaning for TFBS sequences because any sequence variation has to maintain a certain level of specificity for the binding site to function. This becomes particularly important when aligning known TFBS sequences to build supervised models that predict unknown locations of the same TFBS. Hence, multiple sequence alignment methods for TFBSs need to adjust the underlying evolutionary hypothesis and the operators used. One published example is [http://sourceforge.net/projects/msa-edna/ EDNA], which incorporates thermodynamic information from neighbouring bases and aligns the binding sites by searching for the lowest-energy alignment that conserves the specificity of the binding site.<ref name=Salama2013>{{cite journal | vauthors = Salama RA, Stekel DJ | title = A non-independent energy-based multiple sequence alignment improves prediction of transcription factor binding sites | journal = Bioinformatics | volume = 29 | issue = 21 | pages = 2699–704 | date = November 2013 | pmid = 23990411 | doi = 10.1093/bioinformatics/btt463 | doi-access = free }}</ref>
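
As a rough illustration of how aligned TFBS sequences can feed a supervised predictor, the following sketch builds a log-odds position weight matrix from pre-aligned sites and scans a longer sequence for the best-scoring window. It is a generic, column-independent model and not the energy-based method cited above; the function names, the example sites, the pseudocount, and the uniform background frequency are all assumptions chosen for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length, pre-aligned sites.

    Assumes a uniform background and treats columns as independent
    (illustrative simplifications, not taken from the cited work).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must already be aligned to equal length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for col in range(length):
        counts = Counter(s[col] for s in sites)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score(pwm, window):
    """Sum of per-column log-odds; only A, C, G and T are supported."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pwm = build_pwm(sites)
hit_score, hit_start = best_hit(pwm, "GGCGTATAATGCC")
print(hit_start, round(hit_score, 2))
```

Because the score depends only on how well each position matches the known sites, it reflects binding specificity rather than an evolutionary edit distance. Treating columns independently is a simplification that neighbour-aware approaches relax.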
 
==Optimization==