Felsenstein's tree-pruning algorithm: Difference between revisions

Content deleted Content added
m v2.05b - WPcleaner - Fix errors for CW project (Heading start with three "=" and later with level two)
Fallog (talk | contribs)
m Adding pictures of phylogenetic trees to illustrate my points.
Line 5:
 
== Details ==
[[File:Tree_exemple.png|thumb|A simple phylogenetic tree exemple made from arbitrary data D]]
The '''likelihood''' of a tree <math>T</math> is, by definition, the probability of observing certain data <math>D</math> (<math>D</math> being a nucleotide sequence alignement for example ''i.e.'' a succesion of <math> n </math> DNA site <math> s </math>) given the tree. It is often written : <math>P(D|T)</math>.
 
Line 11 ⟶ 12:
 
This is a key value and is often quite complicated to compute. To ease the computations, Felsenstein and his colleagues used several assumptions that are still widely used today. The '''main assumption''' is that '''mutations between DNA sites are independant''' of each other. This permits to compute the likelihood as a simple product of probabilities. Now you can divide the data <math>D</math> between several <math>D_s</math> for each nucleotide site <math>s</math> inside of <math>D</math>. The global likelihood of the tree will be the product of the likelihoods of each site:
[[File:Tree_partial_exemple.png|thumb|Same tree but made from D1, which consists in the first DNA sites from D]]
 
<math>
P(D|T) = \prod_{s=1}^{n} {P(D_s|T)}
</math>
 
If I reuse the exemple above, <math>D_1</math> tree would be:
 
 
The '''second assumption''' concerns the [[Substitution model|models of DNA DNA sequence evolution]]. The computation of the likelihood needs the nucleotide frequencies as well as the transition probabilities (when a mutation occurs, probability of going from a nucleotide to another). The simplest model is the [[Jukes-Cantor model]], assuming equal nucleotide frequencies <math>\left(\pi_A = \pi_G = \pi_C = \pi_T = {1\over4}\right)</math> and equal transition probabilities from <math>X</math> to <math>Y</math> (<math>p_{A \rightarrow T} = p_{A \rightarrow C} = p_{A \rightarrow G} = \frac{1}{4} (1 - e^{-\mu l}) </math> and <math>p_{A \rightarrow A} = e^{-\mu l} + \frac{1}{4} (1 - e^{-\mu l}) </math>) and idem for the other bases). Here <math> \mu </math> is the global [[mutation rate]] of the model.
 
Felsenstein proposed to decomposed computations even more by using "partial likelihoods" in the computation of <math>
P( D_s | T)
</math>. Here is how it works.

Assume we are on a node <math>
k
</math> on the tree <math>
Line 49 ⟶ 52:
<math> P(D_s|T) = \sum_X p_X \centerdot w_r (X) </math>
 
After doing so for every site <math>s</math>, one can finally obtain the likelihood of the global evolutionary tree by multiplying each "sublikelihood".
 
== Algorithm ==
 
== Simple Exemple ==
 
==References==