[[Image:Zinc-finger-dot-plot.png|thumb|right|upright=1.4|A [[DNA]] dot plot of a [[human]] [[zinc finger]] [[transcription factor]] (GenBank ID NM_002383), showing regional [[self-similarity]]. The main diagonal represents the [[sequence alignment|sequence's alignment]] with itself; lines off the main diagonal represent similar or repetitive patterns within the sequence.]]
{{About|the biological sequences comparison plot|the statistical plot|Dot plot (statistics)}}
In [[bioinformatics]], a '''dot plot''' is a graphical method for comparing two [[Sequence (biology)|biological sequences]] and identifying regions of close similarity between them. It is a
== Introduction ==
One way to visualize the similarity between two protein or nucleic acid sequences is to use a similarity matrix, known as a dot plot. These were introduced by Gibbs and McIntyre in 1970<ref name="gibbs-mcintyre"/> and are two-dimensional matrices that have the sequences of the proteins being compared along the vertical and horizontal axes. For a simple visual representation of the similarity between two sequences, individual cells in the matrix can be shaded black if residues are identical, so that matching sequence segments appear as runs of diagonal lines across the matrix.
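The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of Gibbs and McIntyre: the function names <code>dot_matrix</code> and <code>render</code> are hypothetical, the first sequence is assumed to run down the vertical axis and the second along the horizontal axis, and a "shaded" cell is drawn as <code>#</code>.

```python
def dot_matrix(seq_a, seq_b):
    """Build a similarity matrix for two sequences.

    Rows follow seq_a (vertical axis), columns follow seq_b (horizontal axis).
    A cell is True when the two residues at that position pair are identical.
    """
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text: '#' for a shaded cell, '.' for an empty one."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Comparing a short sequence with itself shows the main diagonal,
    # plus off-diagonal dots wherever a residue recurs.
    print(render(dot_matrix("GATTACA", "GATTACA")))
```

Comparing a sequence with itself in this way always shades the main diagonal; any additional dots mark positions where the same residue occurs more than once.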
== Interpretation ==
Some idea of the similarity of the two sequences can be gleaned from the number and length of matching segments shown in the matrix. Identical proteins will obviously have a diagonal line in the center of the matrix. Insertions and deletions between sequences give rise to disruptions in this diagonal. Regions of local similarity or repetitive sequences give rise to further diagonal matches in addition to the central diagonal. Because of the limited protein alphabet, many matching sequence segments may simply have arisen by chance. One way of reducing this noise is to only shade runs or '[[tuple]]s' of residues, e.g. a tuple of 3 corresponds to three residues in a row. This is effective because the probability of matching three residues in a row by chance is much lower than single-residue matches. It can be seen from Figures 3.3h,c that the number of diagonal runs in the matrix has been considerably reduced by looking for 2-tuples or 3-tuples.
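The tuple-based noise reduction can be illustrated with a short Python sketch. The function names <code>tuple_dot_matrix</code> and <code>count_dots</code> and the parameter <code>k</code> are hypothetical, and a cell is assumed to be shaded only when the <code>k</code> residues starting at that position are identical in both sequences. As a rough guide, under the simplifying assumption of 20 equally likely and independent amino acids, a single chance match has probability 1/20, whereas a chance 3-tuple match has probability (1/20)<sup>3</sup> = 1/8000.

```python
def tuple_dot_matrix(seq_a, seq_b, k=3):
    """Shade cell (i, j) only if the k-residue runs starting at
    seq_a[i] and seq_b[j] are identical. With k=1 this reduces to
    the plain single-residue dot plot."""
    rows = max(len(seq_a) - k + 1, 0)
    cols = max(len(seq_b) - k + 1, 0)
    return [
        [seq_a[i:i + k] == seq_b[j:j + k] for j in range(cols)]
        for i in range(rows)
    ]


def count_dots(matrix):
    """Count the shaded cells in a matrix."""
    return sum(cell for row in matrix for cell in row)


if __name__ == "__main__":
    seq = "GATTACA"
    # Single-residue matches include many off-diagonal dots.
    print(count_dots(tuple_dot_matrix(seq, seq, k=1)))
    # Requiring 3-tuples leaves only the main diagonal for this sequence.
    print(count_dots(tuple_dot_matrix(seq, seq, k=3)))
```

For this toy sequence the off-diagonal dots produced by repeated single residues disappear once three consecutive matches are required, which is the effect described above.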
Dot plots
==See also==