Unrooted binary tree: Difference between revisions

Content deleted Content added
m References: fix doi
OAbot (talk | contribs)
m Open access bot: hdl updated in citation with #oabot.
 
(37 intermediate revisions by 21 users not shown)
Line 1:
[[File:TreeActinobacteria.svg|thumb|360px|A [[cladogram]] in the form of an unrooted binary tree, representing the similarities and evolutionary history among species of [[Actinomycetota|actinobacteria]].]]
In mathematics and computer science, an '''unrooted binary tree''' is an [[free tree|unrooted tree]] in which each [[vertex (graph theory)|vertex]] has either one or three neighbors.
 
Line 5:
A [[free tree]] or unrooted tree is a [[connected graph|connected]] [[undirected graph]] with no [[cycle (graph theory)|cycles]]. The vertices with one neighbor are the ''leaves'' of the tree, and the remaining vertices are the ''internal nodes'' of the tree. The [[degree (graph theory)|degree]] of a vertex is its number of neighbors; in a tree with more than one node, the leaves are the vertices of degree one. An unrooted binary tree is a free tree in which all internal nodes have degree exactly three.
 
In some applications it may make sense to distinguish subtypes of unrooted binary trees: a [[planar graph|planar embedding]] of the tree may be fixed by specifying a cyclic ordering for the edges at each vertex, making it into a [[Ordered tree|plane tree]]. In computer science, binary trees are often rooted and ordered when they are used as [[data structure]]s, but in the applications of unrooted binary trees in [[hierarchical clustering]] and [[evolutionary tree]] reconstruction, unordered trees are more common.<ref name="f84"/>
 
Additionally, one may distinguish between trees in which all vertices have distinct labels, trees in which the leaves only are labeled, and trees in which the nodes are not labeled. In an unrooted binary tree with ''n'' leaves, there will be ''n''&nbsp;&minus;&nbsp;12 internal nodes, so the labels may be taken from the set of integers from 1 to 2''n''&nbsp;&minus;&nbsp;1 when all nodes are to be labeled, or from the set of integers from 1 to ''n'' when only the leaves are to be labeled.<ref name="f84">{{harvtxt|Furnas|1984}}.</ref>
 
==Related structures==
===Rooted binary trees===
{{Main|Rooted binary tree}}
An unrooted binary tree ''T'' may be transformed into a full rooted [[binary tree]] (that is, a rooted tree in which each non-leaf node has exactly two children) by choosing a ''root edge'' ''e'' of ''T'', placing a new root node in the middle of ''e'', and directing every edge of the resulting subdivided tree away from the root node. Conversely, any full rooted binary tree may be transformed into an unrooted binary tree by removing the root node, replacing the path between its two children by a single undirected edge, and suppressing the orientation of the remaining edges in the graph. For this reason, there are exactly 2''n''&nbsp;&minus;3 times as many full rooted binary trees with ''n'' leaves as there are unrooted binary trees with ''n'' leaves.<ref name="f84"/>
 
Line 24 ⟶ 25:
 
===Branch-decomposition===
Unrooted binary trees are also used to define [[branch-decomposition]]s of graphs, by forming an unrooted binary tree whose leaves represent the edges of the given graph. That is, a branch-decomposition may be viewed as a hierarchical clustering of the edges of the graph. Branch-decompositions and an associated numerical quantity, branch-width, are closely related to [[treewidth]] and form the basis for efficient [[dynamic programming]] algorithms on graphs.<ref name="rs91">{{harvtxt|Robertson|Seymour|1991}}.</ref>
 
==Enumeration==
Because of their applications in hierarchical clustering, the most natural [[graph enumeration]] problem on unrooted binary trees is to count the number of trees with ''n'' labeled leaves and unlabeled internal nodes. An unrooted binary tree on ''n'' labeled leaves can be formed by connecting the ''n''th leaf to a new node in the middle of any of the edges of an unrooted binary tree on ''n''&nbsp;&minus;&nbsp;1 labeled leaves. There are 2''n''&nbsp;&minus;&nbsp;5 edges at which the ''n''th node can be attached; therefore, the number of trees on ''n'' leaves is larger than the number of trees on ''n''&nbsp;&minus;&nbsp;1 leaves by a factor of 2''n''&nbsp;&minus;&nbsp;5. Thus, the number of trees on ''n'' labeled leaves is the [[double factorial]]
:<math>(2n-5)!!=\frac{(2n-4)!}{(n-2)!2^{n-2}}.</math><ref>{{harvtxt|Balding|Bishop|Cannings|2007}}.</ref>
The numbers of trees on 2, 3, 4, 5, ... labeled leaves are
:1, 1, 3, 15, 105, 945, 10395, 135135, 2027025, 34459425, ... {{OEIS|A001147}}.
 
==Fundamental Equalities==
The leaf-to-leaf path-length on a fixed Unrooted Binary Tree (UBT) T encodes the number of edges belonging to the unique path in T connecting a given leaf to another leaf. For example, by referring to the UBT shown in the image on the right, the path-length <math> p_{1,2} </math> between the leaves 1 and 2 is equal to 2 whereas the path-length <math> p_{1,3} </math> between the leaves 1 and 3 is equal to 3. The path-length sequence from a given leaf on a fixed UBT T encodes the lengths of the paths from the given leaf to all the remaining ones. For example, by referring to the UBT shown in the image on the right, the path-length sequence from the leaf 1 is <math> p_1=(p_{1,2}, p_{1,3}, p_{1,4})=(2,3,3) </math>. The set of path-length sequences associated to the leaves of T is usually referred to as the ''path-length sequence collection'' of T <ref name="On the Balanced Minimum Evolution P">{{cite journal | vauthors = Catanzaro D, Pesenti R, Wolsey L | title = On the Balanced Minimum Evolution Polytope | journal = Discrete Optimization | volume = 36 | date = 2020 | page = 100570 | doi = 10.1016/j.disopt.2020.100570 | s2cid = 213389485 | doi-access = free | hdl = 2078.1/230413 | hdl-access = free }}</ref>
.
[[File:An example of an unrooted binary tree with four leaves.pdf|thumb|An example of an unrooted binary tree with four leaves]]
 
Daniele Catanzaro, [[Raffaele Pesenti]] and [[Laurence Wolsey]] showed<ref name="On the Balanced Minimum Evolution P"/> that the path-length sequence collection encoding a given UBT with n leaves must satisfy specific equalities, namely
 
* <math> p_{i,i}=0 </math> for all <math> i\in [1,n]</math>
* <math> p_{i,j}=p_{j,i} </math> for all <math> i,j\in [1,n]: i\neq j</math>
* <math> p_{i,j}\leq p_{i,k} + p_{k,j} </math> for all <math> i,j,k\in [1,n]: i\neq j\neq k</math>
* <math> \sum_{j=1}^{n} 1/2^{p_{i,j}}=1/2 </math> for all <math> i\in [1,n]</math> (which is an adaptation of the [[Kraft–McMillan inequality]])
* <math> \sum_{i=1}^n\sum_{j=1}^{n} p_{i,j}/2^{p_{i,j}}=2n-3 </math>, also referred to as the ''phylogenetic manifold''.<ref name="On the Balanced Minimum Evolution P"/>
 
These equalities are proved to be necessary and independent for a path-length collection to encode an UBT with n leaves.<ref name="On the Balanced Minimum Evolution P"/> It is currently unknown whether they are also sufficient.
 
==Alternative names==
Unrooted binary trees have also been called '''free binary trees''',<ref>{{harvtxt|Czumaj|Gibbons|1996}}.</ref> '''cubic trees''',<ref>{{harvtxt|Exoo|1996}}.</ref> '''ternary trees'''<ref name="rs91"/> and '''unrooted ternary trees''',.<ref>{{harvtxt|Cilibrasi|Vitanyi|2006}}.</ref>. However, the "free binary tree" name has also been applied to unrooted trees that may have degree-two nodes<ref>{{harvtxt|Harary|Palmer|Robinson|1992}}.</ref> and to rooted binary trees with unordered children.,<ref>{{harvtxt|Przytycka|Larmore|1994}}.</ref> and the "ternary tree" name is more frequently used to mean a [[ternary tree|rooted tree with three children per node]].
 
==Notes==
Line 34 ⟶ 56:
==References==
*{{citation
| last1 = Balding | first1 = D. J.
| last2 = Bishop | first2 = Martin J.
| last3 = Cannings | first3 = Christopher
| edition = 3rd
| isbn = 978-0-470-05830-5
| page = 502
| publisher = Wiley-Interscience
| title = Handbook of Statistical Genetics
| volume = 1
| year = 2007}}.
*{{cite arXiv
| last1 = Cilibrasi | first1 = Rudi
| last2 = Vitanyi | first2 = Paul M.B.
| ideprint = {{arxiv|cs/0606048}}
| title = A new quartet tree heuristic for hierarchical clustering
| year = 2006}}.
Line 48 ⟶ 81:
| title = Guthrie's problem: new equivalences and rapid reductions
| volume = 154
| year = 1996}}.| doi-access = free
}}.
*{{citation
| last = Eppstein | first = David
| doi = 10.1145/1541885.1541890
| idarxiv = {{arxiv|cs.CG/0604034}}
| issue = 3
| journal = ACM Transactions on Algorithms
Line 58 ⟶ 92:
| title = Squarepants in a tree: Sum of subtree clustering and hyperbolic pants decomposition
| volume = 5
| year = 2009}}.| s2cid = 2434
}}.
*{{citation
| last = Exoo | first = Geoffrey
| issue = 1
| journal = Electronic Journal of Combinatorics
| page = R30
| title = A simple method for constructing small cubic graphs of girths 14, 15, and 16
| url = http://www.combinatorics.org/Volume_3/PDFFiles/v3i1r30.pdf
| volume = 3
| year = 1996| doi = 10.37236/1254
| doi-access = free
}}.
*{{citation
| last = Furnas | first = George W.
Line 67 ⟶ 113:
| title = The generation of random, binary unordered trees
| volume = 1
| year = 1984}}.| s2cid = 121121529
}}.
*{{citation
| last1 = Harary | first1 = Frank | author1-link = Frank Harary
Line 84 ⟶ 131:
| issue = 4
| journal = Systematic Biology
| pagepages = 297–309
| title = A framework for the quantitative study of evolutionary trees
| urljstor = http://www.jstor.org/stable/2992396
| volume = 38
| year = 1989}}
*{{citation
| last1 = Przytycka | first1 = Teresa M. | author1-link = Teresa Przytycka
| last2 = Larmore | first2 = Lawrence L. | author2-link = Lawrence L. Larmore
| contribution = The optimal alphabetic tree problem revisited
Line 99 ⟶ 146:
| title = Proc. 21st International Colloquium on Automata, Languages and Programming (ICALP '94)
| volume = 820
| year = 1994| isbn = 978-3-540-58201-4 }}.
*{{citation
| last1 = Robertson | first1 = Neil | author1-link = Neil Robertson (mathematician)
Line 109 ⟶ 156:
| title = Graph minors. X. Obstructions to tree-decomposition
| volume = 52
| year = 1991}}.| doi-access = free
}}.
*{{citation
| last1 = St. John | first1 = Katherine
| last2 = Warnow | first2 = Tandy | author2-link = Tandy Warnow
| last3 = Moret | first3 = Bernard M. E. | author3-link = Bernard Moret
| last4 = Vawterd | first4 = Lisa
| doi = 10.1016/S0196-6774(03)00049-X
| issue = 1
| journal = Journal of Algorithms
| pagepages = 173–193
| title = Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining
| volume = 48
| year = 2003}}.| s2cid = 5550338
| url = http://infoscience.epfl.ch/record/97874/files/jalgs.pdf
}}.
 
[[Category:Trees (graph theory)]]