Content deleted Content added
Citation bot (talk | contribs) Alter: chapter. | Use this bot. Report bugs. | #UCB_CommandLine |
m Disambiguating links to Blast (link changed to BLAST (biotechnology); link changed to BLAST (biotechnology); link changed to BLAST (biotechnology)) using DisamAssist. |
||
Line 125:
Thus, the number of gaps in an alignment is usually reduced and residues and gaps are kept together, which typically makes more biological sense. The Gotoh algorithm implements affine gap costs by using three matrices.
Dynamic programming can be useful in aligning nucleotide to protein sequences, a task complicated by the need to take into account [[frameshift]] mutations (usually insertions or deletions). The framesearch method produces a series of global or local pairwise alignments between a query nucleotide sequence and a search set of protein sequences, or vice versa. Its ability to evaluate frameshifts offset by an arbitrary number of nucleotides makes the method useful for sequences containing large numbers of indels, which can be very difficult to align with more efficient heuristic methods. In practice, the method requires large amounts of computing power or a system whose architecture is specialized for dynamic programming. The [[BLAST (biotechnology)|BLAST]] and [[EMBOSS]] suites provide basic tools for creating translated alignments (though some of these approaches take advantage of side-effects of sequence searching capabilities of the tools). More general methods are available from [[open-source software]] such as [http://www.ebi.ac.uk/Tools/psa/genewise/ GeneWise].
The dynamic programming method is guaranteed to find an optimal alignment given a particular scoring function; however, identifying a good scoring function is often an empirical rather than a theoretical matter. Although dynamic programming is extensible to more than two sequences, it is prohibitively slow for large numbers of sequences or extremely long sequences.
===Word methods===
Word methods, also known as ''k''-tuple methods, are [[heuristic]] methods that are not guaranteed to find an optimal alignment solution, but are significantly more efficient than dynamic programming. These methods are especially useful in large-scale database searches where it is understood that a large proportion of the candidate sequences will have essentially no significant match with the query sequence. Word methods are best known for their implementation in the database search tools [[FASTA]] and the [[BLAST (biotechnology)|BLAST]] family.<ref name=mount/> Word methods identify a series of short, nonoverlapping subsequences ("words") in the query sequence that are then matched to candidate database sequences. The relative positions of the word in the two sequences being compared are subtracted to obtain an offset; this will indicate a region of alignment if multiple distinct words produce the same offset. Only if this region is detected do these methods apply more sensitive alignment criteria; thus, many unnecessary comparisons with sequences of no appreciable similarity are eliminated.
In the FASTA method, the user defines a value ''k'' to use as the word length with which to search the database. The method is slower but more sensitive at lower values of ''k'', which are also preferred for searches involving a very short query sequence. The BLAST family of search methods provides a number of algorithms optimized for particular types of queries, such as searching for distantly related sequence matches. BLAST was developed to provide a faster alternative to FASTA without sacrificing much accuracy; like FASTA, BLAST uses a word search of length ''k'', but evaluates only the most significant word matches, rather than every word match as does FASTA. Most BLAST implementations use a fixed default word length that is optimized for the query and database type, and that is changed only under special circumstances, such as when searching with repetitive or very short query sequences. Implementations can be found via a number of web portals, such as [http://www.ebi.ac.uk/fasta33/ EMBL FASTA] and [https://www.ncbi.nlm.nih.gov/BLAST/ NCBI BLAST].
Line 212:
* [[Sequence homology]]
* [[Sequence mining]]
* [[BLAST (biotechnology)|BLAST]]
* [[String searching algorithm]]
* [[Alignment-free sequence analysis]]
|