Nested set model: Difference between revisions

Content deleted Content added
Drawbacks: clean up, typo(s) fixed: the the → the using AWB
mNo edit summary
Line 1:
The '''nested set model''' is a particular technique for representing [[nested set]]s (also known as [[tree (data structure)|tree]]s or [[hierarchy|hierarchies]]) in [[relational database]]s.
The term was apparently introduced by [[Joe Celko]]; others describe the same technique using different terms.<ref>[http://www.infokarta.com/content/relational-taboo-recursive-dimensions-and-topological-ordering-nested-sets Recursive Dimension Hierarchies: The Relational Taboo by Michael Kamfonas originally published at the Relational Journal in 1992]</ref><ref>[http://articles.sitepoint.com/article/hierarchical-data-database/2 Storing Hierarchical Data in a Database: ''Modified Pre-order Tree Traversal''], by Gijs van Tulder, at articles.sitepoint.com</ref>
<ref>[http://articles.sitepoint.com/article/hierarchical-data-database/2 Storing Hierarchical Data in a Database: ''Modified Pre-order Tree Traversal''], by Gijs van Tulder, at articles.sitepoint.com</ref>
 
== Motivation ==
The technique is an answer to the problem that the standard [[relational algebra]] and [[relational calculus]], and the [[SQL]] operations based on them, are unable to express all desirable operations on hierarchies directly. A hierarchy can be expressed in terms of a parent-child relation - Celko calls this the [[adjacency list model]] - but if it can have arbitrary depth, this does not allow the expression of operations such as comparing the contents of hierarchies of two elements, or determining whether an element is somewhere in the subhierarchy of another element. When the hierarchy is of fixed or bounded depth, the operations are possible, but expensive, due to the necessity of performing one [[Join (relational algebra)#Joins and join-like operators|relational join]] per level. This is often known as the [[bill of materials]] problem.{{citation needed|date=November 2011}}
 
Line 15 ⟶ 14:
When these solutions are not available or not feasible, another approach must be taken.
 
== The technique ==
 
The '''nested set model''' is to number the nodes according to a [[tree traversal]], which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating requires renumbering and is therefore expensive. Refinements that use [[rational number]]s instead of integers can avoid renumbering, and so are faster to update, although much more complicated.<ref>{{cite arXiv |eprint=0806.3115 | first= Daniel|last = Hazel | title = Using rational numbers to key nested sets}}</ref>
 
== Example ==
 
In a clothing store catalog, clothing may be categorized according to the hierarchy given on the left:
 
Line 56 ⟶ 53:
 
==Performance==
 
Queries using nested sets can be expected to be faster than queries using a stored procedure to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as [[MySQL]].<ref>{{Citation
| title= Adjacency list vs. nested sets: MySQL
Line 88 ⟶ 84:
 
==Drawbacks==
 
The use case for a dynamic endless database tree hierarchy is rare. The Nested Set model is appropriate where the tree element and one or two attributes are the only data, but is a poor choice when more complex relational data exists for the elements in the tree. Given an arbitrary starting depth for a category of 'Vehicles' and a child of 'Cars' with a child of 'Mercedes', a foreign key table relationship must be established unless the tree table is naively non-normalized. Attributes of a newly created tree item may not share all attributes with a parent, child or even a sibling. If a foreign key table is established for a table of 'Plants' attributes, no integrity is given to the child attribute data of 'Trees' and its child 'Oak'. Therefore, in each case of an item inserted into the tree, a foreign key table of the item's attributes must be created for all but the most trivial of use cases.