Fast statistical alignment: Difference between revisions

Diescar (talk | contribs)
Added a short description of the algorithm, parallelization capability, visualization for the results of the FSA program, and comparisons to other alignment programs. These changes were made for a university course as a project.
Diescar (talk | contribs)
Added a section about the ordering of the alignment at the end of the algorithm. These changes have been made for a university course project.
Line 10:
 
== Algorithm ==
The algorithm for aligning the input sequences has four core components.
 
=== Pair Hidden Markov Model for generating posterior probabilities ===
Line 26:
 
The sequence annealing technique determines the alignment with the minimum expected distance to the truth, which is equivalent to finding the alignment with the maximum expected accuracy. The accuracy of an alignment is measured against a “true” reference alignment and indicates the fraction of columns in which the sequences are homologous. The expected accuracy then serves as the objective function: the procedure starts from the unaligned sequences (the null alignment) and aligns characters from different columns whenever doing so increases the accuracy of the alignment.
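The idea of starting from the null alignment and accepting only accuracy-increasing steps can be illustrated with a minimal pairwise sketch. This is not FSA's implementation: the per-character scoring, the function names (`expected_accuracy`, `consistent`, `anneal`) and the toy posterior values are all assumptions made for illustration, and candidate pairs are simply tried in order of decreasing posterior probability.

```python
def expected_accuracy(pairs, post, n, m):
    """Expected fraction of correctly placed characters (assumed scoring).

    post maps (i, j) to the posterior probability that x[i] and y[j]
    are homologous; the remaining mass for a character is treated as
    the probability that it is aligned to a gap.
    """
    gap_x = [1.0 - sum(p for (i, _), p in post.items() if i == a) for a in range(n)]
    gap_y = [1.0 - sum(p for (_, j), p in post.items() if j == b) for b in range(m)]
    aligned_x = {i for i, _ in pairs}
    aligned_y = {j for _, j in pairs}
    # An aligned pair contributes for both of its characters.
    score = sum(2.0 * post.get(pair, 0.0) for pair in pairs)
    # Unaligned characters contribute their gap probability.
    score += sum(gap_x[i] for i in range(n) if i not in aligned_x)
    score += sum(gap_y[j] for j in range(m) if j not in aligned_y)
    return score / (n + m)


def consistent(pair, accepted):
    """A new pair must not reuse a character or cross an accepted pair."""
    i, j = pair
    return all((i < a and j < b) or (i > a and j > b) for a, b in accepted)


def anneal(post, n, m):
    """Greedy pairwise sketch: start from the null alignment and add
    pairs only while the expected accuracy increases."""
    accepted = []
    current = expected_accuracy(accepted, post, n, m)
    for pair in sorted(post, key=post.get, reverse=True):
        if not consistent(pair, accepted):
            continue
        candidate = expected_accuracy(accepted + [pair], post, n, m)
        if candidate > current:
            accepted.append(pair)
            current = candidate
    return sorted(accepted), current


# Hypothetical posteriors for two sequences of length 3.
toy_post = {(0, 0): 0.9, (1, 1): 0.6, (2, 2): 0.5, (1, 2): 0.3, (2, 1): 0.3}
pairs, accuracy = anneal(toy_post, 3, 3)
```

In this toy run the three diagonal pairs are accepted, and the two off-diagonal candidates are rejected because each would reuse a character that is already aligned.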
 
=== Ordering of the alignment ===
FSA aligns multiple sequences based on homology within columns rather than strictly on indels and substitutions. FSA therefore considers two alignments equivalent if, for every position along the sequences, both alignments make the same statement about homology. For example, in a pairwise comparison, if both alignments place a gap at a given position, then both state that the two sequences are not homologous at that position. Equivalent alignments can therefore differ in their gap-open events. Among equivalent alignments, FSA outputs the one with the fewest gap openings.
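A small sketch can make the notion of equivalent alignments concrete. It is only an illustration under stated assumptions: equivalence is interpreted here as "the same set of residue pairs shares a column", and the helper names (`homology_pairs`, `gap_openings`, `equivalent`) and the example sequences are hypothetical rather than taken from FSA.

```python
from itertools import combinations

GAP = "-"


def homology_pairs(rows):
    """Set of residue pairs that the alignment places in the same column.

    Each residue is identified by (sequence index, position in the
    ungapped sequence), so the result does not depend on column order.
    """
    counters = [0] * len(rows)
    pairs = set()
    for column in zip(*rows):
        present = []
        for seq_idx, ch in enumerate(column):
            if ch != GAP:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        pairs.update(combinations(present, 2))
    return pairs


def gap_openings(rows):
    """Count maximal runs of gap characters over all rows."""
    total = 0
    for row in rows:
        previous = None
        for ch in row:
            if ch == GAP and previous != GAP:
                total += 1
            previous = ch
    return total


def equivalent(a, b):
    """Same underlying sequences and same homology statements (assumed definition)."""
    same_sequences = [r.replace(GAP, "") for r in a] == [r.replace(GAP, "") for r in b]
    return same_sequences and homology_pairs(a) == homology_pairs(b)


# Two alignments of "ACG" and "TTG" that both align only the final G.
compact = ["AC--G", "--TTG"]
scattered = ["A-C-G", "-T-TG"]

# Among equivalent alignments, pick the one with the fewest gap openings.
preferred = min([compact, scattered], key=gap_openings)
```

Both alignments make the same homology statement (only the two G residues are homologous), but `compact` opens one gap per sequence while `scattered` opens two per sequence, so `compact` is the one selected.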
 
== Parallelization ==