{{Infobox Software|name=FSA|developer=Robert Bradley ([[UC Berkeley]]), Colin Dewey ([[UW Madison]]), [[Lior Pachter]] ([[UC Berkeley]])|latest_release_version=1.5.2|operating_system=[[UNIX]], [[Linux]], [[Apple Macintosh|Mac]]|genre=Bioinformatics tool|licence=Open source}}
 
'''Fast statistical alignment''' or '''FSA''' is a [[multiple sequence alignment]] program for aligning many protein or RNA sequences, or long genomic DNA sequences. Along with [[MUSCLE (alignment software)|MUSCLE]] and [[MAFFT]], FSA is one of the few sequence alignment programs that can align datasets of hundreds or thousands of sequences. FSA uses a different optimization criterion that lets it identify non-homologous sequences more reliably than these other programs, although this increased accuracy comes at the cost of decreased speed.
 
FSA has been used in projects including the sequencing of new worm genomes and the analysis of ''in vivo'' transcription factor binding in flies.
 
== Input/Output ==
 
This program accepts sequences in [[FASTA format]] and outputs alignments in [[FASTA format]] or [[Stockholm format]].
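The two formats can be illustrated with a minimal sketch: reading FASTA records and writing an aligned set of sequences as a bare-bones Stockholm file (a <code># STOCKHOLM 1.0</code> header, one name/sequence line per sequence, and a <code>//</code> terminator). The function names <code>parse_fasta</code> and <code>to_stockholm</code> are hypothetical and are not part of FSA; the sketch assumes '-' as the gap character and omits the optional Stockholm annotation lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples.

    Hypothetical helper: the name is the first whitespace-delimited token
    after '>', and sequence lines are concatenated without newlines.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_stockholm(aligned):
    """Render (name, gapped_sequence) pairs as a minimal Stockholm alignment."""
    width = max(len(name) for name, _ in aligned)
    lines = ["# STOCKHOLM 1.0"]
    for name, seq in aligned:
        # Pad names so the aligned sequences start in the same column.
        lines.append(f"{name.ljust(width)}  {seq}")
    lines.append("//")
    return "\n".join(lines) + "\n"


# Usage: an aligned FASTA input whose first record is wrapped over two lines.
records = parse_fasta(">seq1 first sequence\nXA-B\n-Y\n>seq2\nX-C-DY\n")
print(to_stockholm(records))
```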
 
 
=== Ordering of the alignment ===
FSA aligns multiple sequences according to the homology statements made by each column, rather than strictly according to the indels and substitutions that produce those columns. FSA therefore considers two alignments equivalent if, for every position along the sequences, both alignments make the same statement about homology. For example, in a pairwise comparison, if a residue is aligned to a gap in both alignments, then both alignments state that this residue has no homolog in the other sequence. As a result, alignments can differ in where their gap-open events fall and still be equivalent. Among such equivalent alignments, FSA outputs the one with the fewest gap openings.
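The equivalence criterion can be made concrete with a small sketch. It treats an alignment as the set of residue pairs that share a column, so two alignments are equivalent when those sets are equal, and then picks the equivalent candidate with the fewest gap openings. This is an illustration of the criterion only, not FSA's own code; the function names are hypothetical and '-' is assumed as the gap character. In the example, both alignments state that the two X residues and the two Y residues are homologous and that A, B, C and D are unaligned, but one interleaves the gaps and the other groups them.

```python
from itertools import combinations


def homology_statements(alignment):
    """Return the set of homology statements made by a gapped alignment.

    Each statement (row_a, i, row_b, j) says residue i of row a is
    homologous to residue j of row b. Residues in no statement are
    implicitly asserted to be unaligned.
    """
    next_index = [0] * len(alignment)  # next residue index in each row
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for r, row in enumerate(alignment):
            if row[col] != "-":
                present.append((r, next_index[r]))
                next_index[r] += 1
        for a, b in combinations(present, 2):
            pairs.add(a + b)
    return frozenset(pairs)


def gap_openings(alignment):
    """Count gap openings: one per maximal run of '-' in each row."""
    total = 0
    for row in alignment:
        in_gap = False
        for ch in row:
            if ch == "-" and not in_gap:
                total += 1
            in_gap = ch == "-"
    return total


def choose_output(candidates):
    """Among homology-equivalent alignments, return one with fewest gap openings."""
    reference = homology_statements(candidates[0])
    if any(homology_statements(a) != reference for a in candidates):
        raise ValueError("candidates make different homology statements")
    return min(candidates, key=gap_openings)


interleaved = ["XA-B-Y", "X-C-DY"]  # 4 gap openings
grouped = ["XAB--Y", "X--CDY"]      # 2 gap openings
print(choose_output([interleaved, grouped]))  # the grouped alignment
```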
 
== Parallelization ==
 
== Visualization ==
The results of a multiple sequence alignment produced by FSA can be displayed in FSA's own GUI. The GUI can color-code the columns and characters of the alignment according to different measures of alignment quality. The five measures, all approximated under the FSA model, are accuracy, sensitivity, certainty, specificity, and consistency.
 
== Comparisons to other alignment programs ==
FSA has been benchmarked against reference alignment databases for protein (SABmark 1.65 and BAliBASE 3), RNA (BRAliBase 2.1 and Consanmix80), and DNA sequences. These benchmarks compared FSA with other popular alignment programs such as ClustalW, MAFFT, MUSCLE, and T-Coffee. Overall, at the time the FSA paper was submitted for review, FSA outperformed most alignment programs in accuracy and positive predictive value, with sensitivity on par with the better-performing programs such as MAFFT and ProbConsRNA. Runtimes were also compared by timing the alignment of 16S ribosomal RNA sequences: MAFFT was the fastest, followed by MUSCLE and FSA (using a 3-state HMM with iterative refinement disabled).
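Sensitivity and positive predictive value (PPV) are usually computed over pairs of residues placed in the same column. The sketch below uses the common pair-based definitions: sensitivity is the fraction of reference pairs recovered by the test alignment, and PPV is the fraction of test pairs that also appear in the reference. It is an assumption that the benchmarks above scored alignments exactly this way; individual databases may, for instance, restrict scoring to annotated core regions. The function names are hypothetical. In the example, the test alignment leaves two residues unaligned instead of guessing, so it loses sensitivity but keeps a PPV of 1.0.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of (row_a, i, row_b, j) residue pairs that share a column ('-' = gap)."""
    next_index = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for r, row in enumerate(alignment):
            if row[col] != "-":
                present.append((r, next_index[r]))
                next_index[r] += 1
        for a, b in combinations(present, 2):
            pairs.add(a + b)
    return pairs


def sensitivity_and_ppv(test, reference):
    """Pair-based sensitivity and PPV of a test alignment against a reference."""
    test_pairs = aligned_pairs(test)
    ref_pairs = aligned_pairs(reference)
    correct = len(test_pairs & ref_pairs)
    sensitivity = correct / len(ref_pairs) if ref_pairs else 0.0
    ppv = correct / len(test_pairs) if test_pairs else 0.0
    return sensitivity, ppv


reference = ["ACGT", "AC-T"]     # 3 homologous pairs
test_aln = ["ACG-T", "A--CT"]    # aligns only 2 pairs, both correct
print(sensitivity_and_ppv(test_aln, reference))
```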
 
== References ==
 
{{cite journal|vauthors=Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L|author8-link=Lior Pachter|date=2009|title=Fast Statistical Alignment|journal=PLOS Computational Biology |volume=5|issue=5|pages=e1000392|doi=10.1371/journal.pcbi.1000392|pmid=19478997|pmc=2684580|bibcode=2009PLSCB...5E0392B |doi-access=free }}
 
Bioinformatics 23: e24-9.
 
Eddy SR. Multiple alignment using hidden Markov models. Proc Int Conf Intell Syst Mol Biol. 1995;3:114-20. PMID: 7584426.
 
== External links ==