Lindsey40186 (talk | contribs) m v2.05 - Fix errors for CW project (PMID with incorrect syntax - Spelling and typography) |
Lindsey40186 (talk | contribs) tagged: no inline citations (and only 4 sources?); ce Tags: nowiki added Visual edit |
Line 1:
{{No inline|date=June 2024}}{{Infobox Software|name=FSA|developer=Robert Bradley ([[UC Berkeley]]), Colin Dewey ([[UW Madison]]), [[Lior Pachter]] ([[UC Berkeley]])|latest_release_version=1.5.2|operating_system=[[UNIX]], [[Linux]], [[Apple Macintosh|Mac]]|genre=Bioinformatics tool|licence=Open source}}
'''Fast statistical alignment''' or '''FSA''' is a [[multiple sequence alignment]] program for aligning many proteins,
FSA is currently being used for multiple projects, including sequencing new worm genomes and analyzing ''in vivo'' transcription factor binding in flies.
== Input/Output ==
Line 12:
=== Pair Hidden Markov Model for generating posterior probabilities ===
The algorithm first determines the [[posterior probabilities]] of alignment <math>\mathbb{P}(A|X, Y)</math> between any two random sequences from the pool of sequences being aligned. The posterior probabilities for each column reinforce the predicted alignment probability between a sequence pair and filter out columns that cannot be reliably aligned. These probabilities also allow the homology between any sequence pair to be estimated.
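As an illustration of how per-position posterior probabilities can be obtained, the following minimal sketch runs a forward-backward recursion over all pairwise alignments of two sequences. It is a deliberately simplified single-state model, not FSA's actual Pair HMM: the function name <code>match_posteriors</code> and the weights <code>match</code>, <code>mismatch</code> and <code>gap</code> are assumptions chosen only for this example.

```python
def match_posteriors(x, y, match=1.0, mismatch=0.1, gap=0.2):
    """Posterior probability that x[i] is aligned to y[j], for all i, j.

    Simplified single-state model (an assumption for illustration):
    every aligned pair contributes `match` or `mismatch`, and every
    gapped character contributes `gap`.
    """
    n, m = len(x), len(y)

    def w(a, b):
        return match if a == b else mismatch

    # Forward table: F[i][j] sums the weights of all alignments of x[:i], y[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += F[i - 1][j - 1] * w(x[i - 1], y[j - 1])
            if i > 0:
                total += F[i - 1][j] * gap
            if j > 0:
                total += F[i][j - 1] * gap
            F[i][j] = total

    # Backward table: B[i][j] sums the weights of all alignments of x[i:], y[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += w(x[i], y[j]) * B[i + 1][j + 1]
            if i < n:
                total += gap * B[i + 1][j]
            if j < m:
                total += gap * B[i][j + 1]
            B[i][j] = total

    z = F[n][m]  # total weight of all alignments
    return [[F[i][j] * w(x[i], y[j]) * B[i + 1][j + 1] / z
             for j in range(m)] for i in range(n)]


posteriors = match_posteriors("ACGT", "ACGT")
```

Each row of the returned table sums to at most 1; the remainder is the probability that the corresponding character is left unaligned (gapped) under this toy model.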
Since the number of
=== Merging Probabilities ===
Line 20:
=== Sequence Annealing ===
Most existing programs that run multiple sequence alignment algorithms are based on progressive alignment where the process starts with a
FSA uses the sequence annealing technique to overcome this issue. The sorted posterior probabilities are used with the sequence annealing technique to generate a multiple alignment. The technique finds the alignment between two sequences that minimizes the expected distance to the truth.
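The greedy idea behind this step can be sketched as follows: start with every character in its own column, then merge columns in order of decreasing posterior probability, rejecting any merge that would make the alignment inconsistent. This is a simplified sketch rather than FSA's implementation; the names <code>anneal</code> and <code>is_consistent</code>, the input layout, and the fixed <code>threshold</code> stopping rule are assumptions made for the example.

```python
def is_consistent(col_of, lengths):
    """True if the columns can be ordered so every sequence reads left to right."""
    succ = {}
    for s, n in lengths.items():
        for p in range(n - 1):
            a, b = col_of[(s, p)], col_of[(s, p + 1)]
            if a == b:  # two characters of one sequence in the same column
                return False
            succ.setdefault(a, set()).add(b)

    state = {}  # 1 = on the current DFS path, 2 = finished

    def visit(c):
        state[c] = 1
        for d in succ.get(c, ()):
            if state.get(d) == 1:  # back edge: the column order has a cycle
                return False
            if d not in state and not visit(d):
                return False
        state[c] = 2
        return True

    return all(visit(c) for c in set(col_of.values()) if c not in state)


def anneal(lengths, scored_pairs, threshold=0.5):
    """Greedily merge columns, highest-probability pair first.

    lengths:      {sequence name: length}
    scored_pairs: [((seq_a, i), (seq_b, j), probability), ...]
    threshold:    hypothetical stopping rule for this sketch
    """
    col_of = {}
    for s, n in lengths.items():
        for p in range(n):
            col_of[(s, p)] = len(col_of)  # each character starts in its own column

    for a, b, prob in sorted(scored_pairs, key=lambda t: -t[2]):
        if prob < threshold:
            break
        ca, cb = col_of[a], col_of[b]
        if ca == cb:
            continue
        trial = {pos: (ca if c == cb else c) for pos, c in col_of.items()}
        if is_consistent(trial, lengths):
            col_of = trial  # keep the merge only if the alignment stays valid

    columns = {}
    for pos, c in col_of.items():
        columns.setdefault(c, []).append(pos)
    return sorted(sorted(v) for v in columns.values() if len(v) > 1)


aligned = anneal(
    {"x": 2, "y": 2},
    [(("x", 0), ("y", 0), 0.9), (("x", 1), ("y", 1), 0.8), (("x", 0), ("y", 1), 0.7)],
)
```

In this usage example the third pair is rejected, because accepting it would place two characters of sequence <code>y</code> in the same column.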
By determining the alignment with the minimum expected distance to the truth, the sequence annealing technique equivalently finds the alignment with the maximum expected accuracy. The accuracy of an alignment depends on a
=== Ordering of the alignment ===
FSA aligns multiple sequences based on homology within columns rather than strictly on indels and substitutions. FSA therefore considers two alignments equivalent if, for every position along the sequences, both alignments make the same statement about homology. For example, in a pairwise comparison, if both alignments place a gap at a specific position, then both state that the two sequences being compared are not homologous at that position. As a result, alignments with different gap-open events can still be considered equivalent. As such, FSA chooses to output the alignment in which there is a minimum amount of
== Parallelization ==
To handle very large datasets, FSA can distribute the work of running all necessary pairwise comparisons and alignments across multiple processors. This is handled by using a
== Visualization ==
The results of a multiple sequence alignment produced by FSA can be displayed in FSA's own GUI. The GUI can display and color-code different measures of alignment quality on the columns of characters within the alignment itself. The five measures, each approximated under the FSA model, are accuracy, sensitivity, certainty, specificity, and consistency.
== Comparisons to other
FSA has been benchmarked against multiple alignment databases for protein (SABmark 1.65 and BAliBASE 3), RNA (BRAliBase 2.1 and Consanmix80), and DNA sequences. These benchmarks were conducted alongside other popular alignment programs such as ClustalW, MAFFT, MUSCLE, and T-Coffee. Overall, at the time the FSA research paper was received for review, FSA outperformed most alignment programs in accuracy and positive predictive value, with sensitivities on par with the better-performing programs such as MAFFT and ProbConsRNA. Runtimes were also compared by timing the alignment of 16S ribosomal sequences. MAFFT performed the alignment faster than the other programs, while MUSCLE and FSA (using a 3-state HMM and with iterative refinement disabled) were the next fastest.
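To make the two benchmark measures concrete, the sketch below computes sensitivity and positive predictive value for a pairwise alignment under one common convention: both are counted over pairs of aligned (homologous) positions, compared with a reference alignment. The helper names <code>aligned_pairs</code> and <code>sensitivity_and_ppv</code> and the toy sequences are assumptions for this example and are not taken from the benchmarks themselves.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Set of (i, j) index pairs that a gapped pairwise alignment puts in one column."""
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def sensitivity_and_ppv(predicted, reference):
    """Sensitivity = TP / |reference pairs|; PPV = TP / |predicted pairs|."""
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy reference alignment: all four characters are homologous.
reference = aligned_pairs("ACGT", "ACGT")
# Toy prediction: the two G characters are left unaligned.
predicted = aligned_pairs("ACG-T", "AC-GT")

sensitivity, ppv = sensitivity_and_ppv(predicted, reference)
```

Here the prediction aligns three of the four reference pairs and makes no incorrect pairing, so it scores a sensitivity of 0.75 and a positive predictive value of 1.0. A conservative aligner that leaves uncertain positions unaligned trades sensitivity for positive predictive value in this way.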