{{No inline|date=June 2024}}{{Infobox Software|name=FSA|developer=Robert Bradley ([[UC Berkeley]]), Colin Dewey ([[UW Madison]]), [[Lior Pachter]] ([[UC Berkeley]])|latest_release_version=1.5.2|operating_system=[[UNIX]], [[Linux]], [[Apple Macintosh|Mac]]|genre=Bioinformatics tool|licence=Open source}}
'''Fast statistical alignment''' ('''FSA''') is a [[multiple sequence alignment]] program for aligning many proteins, RNAs, or long genomic DNA sequences.
At the time of its publication, FSA was being used in several projects, including the sequencing of new worm genomes and the analysis of ''in vivo'' transcription factor binding in flies.
== Input/Output ==
This program accepts sequences in [[FASTA format]] and outputs alignments in [[FASTA format]] or [[Stockholm format]].
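The two formats can be illustrated with a minimal Python sketch. It is not FSA's own I/O code: it assumes aligned rows use `-` for gaps, takes the first whitespace-delimited token of a FASTA header as the sequence name, and ignores optional Stockholm annotation lines such as `#=GF` and `#=GS`. The function names `read_fasta`, `write_fasta` and `write_stockholm` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence or aligned row)

def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (name, sequence) records in input order.
    Assumption: the name is the first whitespace-delimited token after '>'."""
    records: List[Record] = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def write_fasta(records: List[Record], width: int = 60) -> str:
    """Write records as FASTA, wrapping each sequence at `width` characters."""
    out = []
    for name, seq in records:
        out.append(">" + name)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"

def write_stockholm(records: List[Record]) -> str:
    """Write an alignment as minimal Stockholm: header, one padded row per
    sequence, and the '//' terminator."""
    pad = max((len(name) for name, _ in records), default=0) + 1
    lines = ["# STOCKHOLM 1.0", ""]
    for name, seq in records:
        lines.append(name.ljust(pad) + seq)
    lines.append("//")
    return "\n".join(lines) + "\n"
```

For example, reading `">seq1 first\nACG-T\n>seq2\nAC-GT\n"` and passing the records to `write_stockholm` yields the rows `seq1 ACG-T` and `seq2 AC-GT` between the `# STOCKHOLM 1.0` header and the closing `//`.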
=== Pair hidden Markov model for generating posterior probabilities ===
The algorithm begins by determining the [[posterior probabilities]] of alignment <math>\mathbb{P}(A \mid X, Y)</math> between pairs of sequences <math>X</math> and <math>Y</math> drawn from the set of sequences being aligned. Posterior probabilities computed for each column support the prediction of how a pair of sequences aligns, and they also make it possible to filter out columns that cannot be aligned reliably. They further provide an estimate of homology between any pair of sequences.
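How a pair hidden Markov model yields such posteriors can be sketched with the forward-backward algorithm on a textbook three-state pair HMM (a match state M emitting aligned pairs, and states X and Y emitting a character of one sequence against a gap). FSA's own model and its learned parameters are not reproduced here: the DNA alphabet, the emission probabilities and the values of `DELTA` and `EPS` are hypothetical placeholders, and `pair_posteriors` is an illustrative name. The posterior that <math>x_i</math> is aligned to <math>y_j</math> is the forward value times the backward value in state M at cell <math>(i, j)</math>, divided by <math>\mathbb{P}(X, Y)</math>.

```python
from typing import List, Tuple

# Hypothetical placeholder parameters; FSA learns its own from the data.
DELTA = 0.05  # probability of opening a gap (M -> X or M -> Y)
EPS = 0.4     # probability of extending a gap (X -> X or Y -> Y)

def p_match(a: str, b: str) -> float:
    """Joint emission for an aligned pair over ACGT: 4 identical pairs at 0.2
    each, the 12 mismatched pairs sharing the remaining 0.2."""
    return 0.2 if a == b else 0.2 / 12

def q_gap(a: str) -> float:
    """Emission for a character aligned to a gap (uniform over ACGT)."""
    return 0.25

def pair_posteriors(x: str, y: str) -> Tuple[List[List[float]], List[float], float]:
    """Forward-backward in plain probability space. Returns match[i][j] =
    P(x_{i+1} ~ y_{j+1} | X, Y), the posterior that each character of x is
    aligned to a gap, and the total likelihood P(X, Y)."""
    n, m = len(x), len(y)
    a_mm, a_open, a_ext, a_close = 1 - 2 * DELTA, DELTA, EPS, 1 - EPS

    def grid() -> List[List[float]]:
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward: f*[i][j] = P(x_1..x_i, y_1..y_j, path ends in state *).
    fM, fX, fY = grid(), grid(), grid()
    fM[0][0] = 1.0  # silent begin state, given M's outgoing transitions
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p_match(x[i - 1], y[j - 1]) * (
                    a_mm * fM[i - 1][j - 1]
                    + a_close * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q_gap(x[i - 1]) * (a_open * fM[i - 1][j] + a_ext * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q_gap(y[j - 1]) * (a_open * fM[i][j - 1] + a_ext * fY[i][j - 1])
    total = fM[n][m] + fX[n][m] + fY[n][m]

    # Backward: b*[i][j] = P(x_{i+1}..x_n, y_{j+1}..y_m | state * at (i, j)).
    bM, bX, bY = grid(), grid(), grid()
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                bM[i][j] = bX[i][j] = bY[i][j] = 1.0
                continue
            diag = p_match(x[i], y[j]) * bM[i + 1][j + 1] if i < n and j < m else 0.0
            down = q_gap(x[i]) * bX[i + 1][j] if i < n else 0.0
            right = q_gap(y[j]) * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = a_mm * diag + a_open * (down + right)
            bX[i][j] = a_close * diag + a_ext * down
            bY[i][j] = a_close * diag + a_ext * right

    match = [[fM[i][j] * bM[i][j] / total for j in range(1, m + 1)]
             for i in range(1, n + 1)]
    gap_x = [sum(fX[i][j] * bX[i][j] for j in range(m + 1)) / total
             for i in range(1, n + 1)]
    return match, gap_x, total
```

For example, `pair_posteriors("ACGT", "AGT")` returns a 4-by-3 matrix of match posteriors. Because every character of `x` is emitted exactly once, either in M or in X, its row of match posteriors plus its gap posterior sums to 1, which is a useful sanity check. Plain probabilities underflow on long sequences, so practical implementations work in log space or rescale the dynamic-programming rows.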
Since the number of
=== Merging probabilities ===
=== Sequence annealing ===
Most existing multiple sequence alignment programs are based on progressive alignment, in which the process starts with a
FSA uses sequence annealing to overcome this issue: starting from the posterior probabilities sorted in decreasing order, the technique builds up a multiple alignment incrementally. It seeks the alignment that minimizes the expected distance to the true alignment.
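The greedy idea behind sequence annealing can be sketched in Python under simplifying assumptions. Every character starts in its own column; candidate matches `(weight, char_a, char_b)` are supplied directly (for example, pairwise match posteriors) and processed from highest to lowest weight; two columns are merged only if they share no sequence and neither can already reach the other in the column ordering, since such a merge would create a cycle. The fixed `threshold` that stops merging, the tie-breaking in the final column order, and the name `anneal` are hypothetical; FSA's actual weighting function and stopping rule may differ.

```python
import heapq
from typing import Dict, List, Set, Tuple

Char = Tuple[int, int]  # (sequence index, 0-based position in that sequence)

def anneal(seqs: List[str],
           candidates: List[Tuple[float, Char, Char]],
           threshold: float = 0.5) -> List[str]:
    """Greedy sequence-annealing sketch: merge columns in order of decreasing
    weight, keeping the column graph acyclic, then emit the aligned rows."""
    # Start with every character in its own column.
    col: Dict[Char, int] = {}
    members: Dict[int, Set[Char]] = {}
    for s, seq in enumerate(seqs):
        for p in range(len(seq)):
            cid = len(members)
            col[(s, p)] = cid
            members[cid] = {(s, p)}

    def successors(c: int) -> Set[int]:
        # Column c must precede the column holding the next character of
        # every sequence that has a character in c.
        return {col[(s, p + 1)] for s, p in members[c] if p + 1 < len(seqs[s])}

    def reachable(src: int, dst: int) -> bool:
        stack, seen = [src], {src}
        while stack:
            for nxt in successors(stack.pop()):
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    for weight, a, b in sorted(candidates, reverse=True):
        if weight < threshold:
            break  # all remaining candidates are weaker
        ca, cb = col[a], col[b]
        if ca == cb:
            continue
        if {s for s, _ in members[ca]} & {s for s, _ in members[cb]}:
            continue  # a column holds at most one character per sequence
        if reachable(ca, cb) or reachable(cb, ca):
            continue  # merging would create a cycle (crossing homologies)
        for ch in members[cb]:
            col[ch] = ca
        members[ca] |= members.pop(cb)

    # Topologically sort the columns; ties go to the column whose smallest
    # (sequence, position) is smallest, so the output is deterministic.
    indeg = {c: 0 for c in members}
    for c in members:
        for nxt in successors(c):
            indeg[nxt] += 1
    heap = [(min(members[c]), c) for c in members if indeg[c] == 0]
    heapq.heapify(heap)
    order: List[int] = []
    while heap:
        _, c = heapq.heappop(heap)
        order.append(c)
        for nxt in successors(c):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(heap, (min(members[nxt]), nxt))

    return ["".join(next((seq[p] for t, p in members[c] if t == s), "-") for c in order)
            for s, seq in enumerate(seqs)]
```

With `seqs = ["ACG", "AG"]` and candidates `(0.9, (0, 0), (1, 0))` and `(0.8, (0, 2), (1, 1))`, the two A's and the two G's are merged and the result is `["ACG", "A-G"]`. The cycle check is what rejects crossing merges: for `["AB", "BA"]`, once the A's share a column, also aligning the B's would force one column to come both before and after another, so that merge is skipped.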
Because it determines the alignment with the minimum expected distance to the truth, sequence annealing equivalently finds the alignment with the maximum expected accuracy. The accuracy of an alignment depends on a
=== Ordering of the alignment ===
FSA aligns multiple sequences based on homology within columns, rather than strictly on a consideration of indels and substitutions. Accordingly, FSA considers two alignments to be equivalent if, for every position along the sequences, both alignments make the same statement about homology. For example, in a pairwise comparison, if a position is aligned to a gap in both alignments, then both alignments state that the two sequences being compared are not homologous at that position. Alignments in which gap-open events differ can therefore still be considered equivalent. For this reason, FSA chooses to output the alignment in which there is a minimum amount of
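This notion of equivalence can be made concrete with a short Python sketch. Two alignments are treated as equivalent when they yield the same set of residue-to-residue homology statements; a residue that appears in no statement with another sequence is, by implication, aligned to a gap relative to it. Assuming, for illustration, that the quantity being minimized among equivalent alignments is the number of gap-open events, a representative is then chosen by counting gap openings. The function names are hypothetical and this is not FSA's code.

```python
from typing import FrozenSet, List, Tuple

def homology_statements(aln: List[str]) -> FrozenSet[Tuple[int, int, int, int]]:
    """All (seq_a, pos_a, seq_b, pos_b) pairs of residues sharing a column.
    Positions count residues only, so gaps never appear in a statement."""
    next_pos = [0] * len(aln)  # next residue index (ignoring gaps) per row
    stmts = set()
    for c in range(len(aln[0])):
        present = []
        for s, row in enumerate(aln):
            if row[c] != "-":
                present.append((s, next_pos[s]))
                next_pos[s] += 1
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                stmts.add(present[i] + present[j])
    return frozenset(stmts)

def gap_opens(aln: List[str]) -> int:
    """Number of maximal runs of '-' summed over all rows."""
    total = 0
    for row in aln:
        prev_gap = False
        for ch in row:
            if ch == "-" and not prev_gap:
                total += 1
            prev_gap = ch == "-"
    return total

def choose_representative(alignments: List[List[str]]) -> List[str]:
    """Among homology-equivalent alignments, keep one with the fewest gap opens
    (hypothetical tie-breaking criterion, for illustration)."""
    if len({homology_statements(a) for a in alignments}) != 1:
        raise ValueError("alignments are not homology-equivalent")
    return min(alignments, key=gap_opens)
```

The alignments `AC--`/`--GT` and `A-C-`/`-G-T` both state that no residue of the first sequence is homologous to any residue of the second, so they are equivalent; the first has two gap openings and the second has four, so the first is kept.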
== Parallelization ==
To handle very large datasets, FSA can distribute the required pairwise comparisons and alignments across multiple processors. This is handled by using a
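The idea of dividing the all-pairs work can be sketched in Python. With <math>n</math> sequences there are <math>n(n-1)/2</math> pairs; the sketch splits them round-robin into one batch per worker and runs the batches concurrently. It assumes a generic `compare` function standing in for the pairwise pair-HMM computation (the toy `identity` score is purely illustrative), and it uses threads as stand-in workers; `partition_pairs`, `run_pairwise` and `identity` are hypothetical names, and none of this reproduces FSA's own distribution mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]

def partition_pairs(n_seqs: int, n_workers: int) -> List[List[Pair]]:
    """Round-robin split of all n(n-1)/2 sequence pairs into n_workers batches."""
    batches: List[List[Pair]] = [[] for _ in range(n_workers)]
    for k, pair in enumerate(combinations(range(n_seqs), 2)):
        batches[k % n_workers].append(pair)
    return batches

def run_pairwise(seqs: List[str],
                 compare: Callable[[str, str], float],
                 n_workers: int = 4) -> Dict[Pair, float]:
    """Run `compare` on every pair of sequences, one batch per worker."""
    def work(batch: List[Pair]) -> List[Tuple[Pair, float]]:
        return [((i, j), compare(seqs[i], seqs[j])) for i, j in batch]

    results: Dict[Pair, float] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for batch_result in pool.map(work, partition_pairs(len(seqs), n_workers)):
            results.update(batch_result)
    return results

def identity(a: str, b: str) -> float:
    """Hypothetical stand-in comparison: fraction of identical positions
    over the length of the shorter sequence (no alignment performed)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0
```

For example, `run_pairwise(["ACGT", "ACGA", "TTTT"], identity, n_workers=2)` returns scores for the pairs `(0, 1)`, `(0, 2)` and `(1, 2)`. Python threads do not run CPU-bound code in parallel because of the global interpreter lock, so for real speedups the same batches would be handed to `concurrent.futures.ProcessPoolExecutor` or submitted as separate jobs on a cluster.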
== Visualization ==
Multiple sequence alignments produced by FSA can be displayed under the
== Comparisons to other alignment programs ==
FSA has been benchmarked on reference alignment databases for protein (SABmark 1.65 and BAliBASE 3), RNA (BRAliBase 2.1 and Consanmix80), and DNA sequences. These benchmarks compared FSA with other popular alignment programs, including ClustalW, MAFFT, MUSCLE, and T-Coffee. Overall, at the time that
== References ==
{{cite journal|vauthors=Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L|author8-link=Lior Pachter|date=2009|title=Fast Statistical Alignment|journal=PLOS Computational Biology |volume=5|issue=5|pages=e1000392|doi=10.1371/journal.pcbi.1000392|pmid=19478997|pmc=2684580|bibcode=2009PLSCB...5E0392B |doi-access=free }}
Schwartz AS, Pachter L. Multiple alignment by sequence annealing. Bioinformatics. 2007;23:e24-9.
Eddy SR. Multiple alignment using hidden Markov models. Proc Int Conf Intell Syst Mol Biol. 1995;3:114-20. PMID
== External links ==
* [https://web.archive.org/web/20090430184649/http://orangutan.math.berkeley.edu/fsa/ FSA web server]
* [
[[Category:Bioinformatics]]