Nested set model: Difference between revisions

Revision as of 09:10, 27 September 2016 by 92.88.237.125 (edit summary: The linked reference was no longer available)
Latest revision as of 12:00, 27 July 2024 by FoeNyx (minor edit; edit summary: →Motivation: + wikilink)
(37 intermediate revisions by 31 users not shown)
Line 1:
{{Short description|Technique used in relational databases}}
The '''nested set model''' is a technique for representing [[nested set collection]]s (also known as [[tree (data structure)|tree]]s or [[hierarchy|hierarchies]]) in [[relational database]]s. The term was apparently introduced by [[Joe Celko]]; others describe the same technique using different terms.<ref>[http://articles.sitepoint.com/article/hierarchical-data-database/2 Storing Hierarchical Data in a Database: ''Modified Pre-order Tree Traversal''], by Gijs van Tulder, at articles.sitepoint.com</ref>

It is based on nested intervals, which "are immune to hierarchy reorganization problem, and allow answering ancestor path hierarchical queries algorithmically — without accessing the stored hierarchy relation".<ref>"Nested Intervals Tree Encoding in SQL", Vadim Tropashko; Oracle Corp. Original at https://web.archive.org/web/20111119165033/http://sigmod.org/publications/sigmod-record/0506/p47-article-tropashko.pdf</ref>

== Motivation ==
The standard [[relational algebra]] and [[relational calculus]], and the [[SQL]] operations based on them, cannot directly express all desirable operations on hierarchies. The nested set model is a solution to that problem.

An alternative solution is the expression of the hierarchy as a parent-child relation. [[Joe Celko]] called this the [[adjacency list model]]. If the hierarchy can have arbitrary depth, the adjacency list model does not allow the expression of operations such as comparing the contents of hierarchies of two elements, or determining whether an element is somewhere in the subhierarchy of another element. When the hierarchy is of fixed or bounded depth, the operations are possible, but expensive, due to the necessity of performing one [[Join (relational algebra)#Joins and join-like operators|relational join]] per level. This is often known as the [[bill of materials]] problem.

Hierarchies may be expressed easily by switching to a [[graph database]]. Alternatively, several resolutions exist for the relational model and are available as a workaround in some [[relational database management system]]s:
* support for a dedicated [[hierarchy data type]], such as in SQL's [[hierarchical query]] facility;
Line 14 ⟶ 18:
When these solutions are not available or not feasible, another approach must be taken.

== Technique ==
The nested set model numbers the nodes according to a [[tree traversal]] that visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating requires renumbering and is therefore expensive. Refinements that use [[rational number]]s instead of integers can avoid renumbering, and so are faster to update, although much more complicated.<ref>{{cite arXiv |eprint=0806.3115 | first= Daniel|last = Hazel | title = Using rational numbers to key nested sets| year= 2008| class= cs.DB}}</ref>

== Example ==
In a clothing store catalog, clothing may be categorized according to the hierarchy given on the left:
Line 55 ⟶ 57:
==Performance==
Queries using nested sets can be expected to be faster than queries using a [[stored procedure]] to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as [[MySQL]] 5.x.<ref>{{Citation
| title= Adjacency list vs. nested sets: MySQL
| author= Quassnoi
| date = 29 September 2009
| periodical = Explain Extended
| url = https://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/
| accessdate = 11 December 2010
}}</ref> However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as [[PostgreSQL]],<ref>{{Citation
Line 84 ⟶ 85:
| url = http://explainextended.com/2009/09/25/adjacency-list-vs-nested-sets-sql-server/
| accessdate = 11 December 2010
}}</ref> [[MySQL]] used to lack recursive query constructs but added such features in version 8.<ref>{{Cite web|title=MySQL :: MySQL 8.0 Reference Manual :: 13.2.15 WITH (Common Table Expressions)|url=https://dev.mysql.com/doc/refman/8.0/en/with.html|access-date=2021-09-01|website=dev.mysql.com}}</ref>

==Drawbacks==
The use case for a dynamic endless database tree hierarchy is rare. The nested set model is appropriate where the tree element and one or two attributes are the only data, but is a poor choice when more complex relational data exists for the elements in the tree. Given an arbitrary starting depth for a category of 'Vehicles' and a child of 'Cars' with a child of 'Mercedes', a foreign key table relationship must be established unless the tree table is natively non-normalized. Attributes of a newly created tree item may not share all attributes with a parent, child or even a sibling. If a foreign key table is established for a table of 'Plants' attributes, no integrity is given to the child attribute data of 'Trees' and its child 'Oak'. Therefore, in each case of an item inserted into the tree, a foreign key table of the item's attributes must be created for all but the most trivial of use cases.

If the tree isn't expected to change often, a properly normalized hierarchy of attribute tables can be created in the initial design of a system, leading to simpler, more portable SQL statements; specifically ones that don't require an arbitrary number of runtime, programmatically created or deleted tables for changes to the tree. For more complex systems, hierarchy can be developed through relational models rather than an implicit numeric tree structure. Depth of an item is simply another attribute rather than the basis for an entire DB architecture. As stated in ''SQL Antipatterns'':<ref>{{cite book|last1=Karwin|first1=Bill|title=SQL Antipatterns|date=2010-06-17|pages=328|url=https://pragprog.com/book/bksqla/sql-antipatterns}}</ref>

<blockquote>Nested Sets is a clever solution – maybe too clever. It also fails to support referential integrity. It’s best used when you need to query a tree more frequently than you need to modify the tree.<ref>{{cite book|last1=Karwin|first1=Bill|title=SQL Antipatterns|page=44}}</ref></blockquote>

The model doesn't allow for multiple parent categories. For example, an 'Oak' could be a child of 'Tree-Type', but also 'Wood-Type'. An additional tagging or taxonomy has to be established to accommodate this, again leading to a design more complex than a straightforward fixed model.

Nested sets are very slow for inserts because each one requires updating left and right domain values for all records in the table after the insert. This can cause a lot of database stress as many rows are rewritten and indexes rebuilt. However, if it is possible to store a forest of small trees in a table instead of a single big tree, the overhead may be significantly reduced, since only one small tree must be updated.

The [[Nested intervals|nested interval model]] does not suffer from this problem, but is more complex to implement, and is not as well known. It still suffers from the relational foreign-key table problem. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d).
[http://www.sigmod.org/publications/sigmod-record/0506/p47-article-tropashko.pdf]

==Variations==
Using the nested set model as described above has some performance limitations during certain tree traversal operations. For example, trying to find the immediate child nodes given a parent node requires pruning the subtree to a specific level as in the following [[SQL]] code example:

<syntaxhighlight lang="sql">
SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Parent, Tree as Child
Line 105 ⟶ 113:
        WHERE Mid.Left BETWEEN Parent.Left AND Parent.Right
        AND Child.Left BETWEEN Mid.Left AND Mid.Right
        AND Mid.Node NOT IN (Parent.Node, Child.Node)
    )
    AND Parent.Left = 1  -- Given Parent Node Left Index
</syntaxhighlight>

Or, equivalently:

<syntaxhighlight lang="sql">
SELECT DISTINCT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
Line 116 ⟶ 124:
GROUP BY Child.Node, Child.Left, Child.Right
HAVING max(Parent.Left) = 1  -- Subset for those with the given Parent Node as the nearest ancestor
</syntaxhighlight>

The query will be more complicated when searching for children more than one level deep. To overcome this limitation and simplify [[tree traversal]], an additional column is added to the model to maintain the depth of a node within a tree.
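The numbering and the extra depth column can be illustrated with a minimal Python sketch. It is not taken from the article's sources: the small catalog in <code>TREE</code>, the table layout, the choice of depth 0 for the root, and the helper names <code>number_nodes</code> and <code>immediate_children</code> are all assumptions made for illustration. A single depth-first traversal assigns a number on the way down and another on the way up, and the stored depth then restricts a containment query to immediate children. The <code>Left</code> and <code>Right</code> column names are quoted because they are keywords in SQLite.

```python
import sqlite3

# Hypothetical catalog used only for illustration: node -> list of children.
TREE = {
    "Clothing": ["Men's", "Women's"],
    "Men's": ["Suits"],
    "Suits": ["Slacks", "Jackets"],
    "Women's": ["Dresses", "Skirts"],
}


def number_nodes(tree, root):
    """Return (node, left, right, depth) rows from one depth-first traversal."""
    rows = []
    counter = 0

    def visit(node, depth):
        nonlocal counter
        counter += 1
        left = counter  # first visit: number assigned on the way down
        for child in tree.get(node, []):
            visit(child, depth + 1)
        counter += 1  # second visit: number assigned on the way back up
        rows.append((node, left, counter, depth))

    visit(root, 0)  # assumption: the root has depth 0
    return rows


rows = number_nodes(TREE, "Clothing")

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Tree (Node TEXT PRIMARY KEY, "Left" INTEGER, '
    '"Right" INTEGER, Depth INTEGER)'
)
conn.executemany("INSERT INTO Tree VALUES (?, ?, ?, ?)", rows)


def immediate_children(conn, parent):
    """Children are the contained nodes exactly one level below the parent."""
    sql = """
        SELECT Child.Node
        FROM Tree AS Child, Tree AS Parent
        WHERE Child.Depth = Parent.Depth + 1
          AND Child."Left" > Parent."Left"
          AND Child."Right" < Parent."Right"
          AND Parent.Node = ?
        ORDER BY Child."Left"
    """
    return [row[0] for row in conn.execute(sql, (parent,))]


print(immediate_children(conn, "Clothing"))  # ["Men's", "Women's"]
```

In this sketch the parent is identified by its node name rather than by its left index or depth; any column that singles out one row would serve the same purpose.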
{| class="wikitable sortable"
Line 149 ⟶ 157:
In this model, finding the immediate children given a parent node can be accomplished with the following [[SQL]] code:

<syntaxhighlight lang="sql">
SELECT Child.Node, Child.Left, Child.Right
FROM Tree as Child, Tree as Parent
Line 156 ⟶ 164:
    AND Child.Left > Parent.Left
    AND Child.Right < Parent.Right
    AND Parent.Depth = 1  -- Given Parent Node Depth
</syntaxhighlight>

==See also==
* [[Adjacency list]]
* [[Calkin–Wilf tree]]
* [[Tree (data structure)]]
* [[Tree traversal]]

==References==
Line 169 ⟶ 179:
* [http://troels.arvin.dk/db/rdbms/links/#hierarchical Troels' links to Hierarchical data in RDBMSs]
* [http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ Managing hierarchical data in relational databases]
* [http://pear.php.net/package/DB_NestedSet PHP PEAR Implementation for Nested Sets] – by Daniel Khan
* [http://devmd.com/r/adjacency-list-to-nested-sets-mysql Transform any Adjacency List to Nested Sets using MySQL stored procedures]
* [https://github.com/previousnext/nested-set PHP Doctrine DBAL implementation for Nested Sets] – by PreviousNext
* [https://github.com/Vince0931/NestedSet R Nested Set] – Nested Set example in R

{{DEFAULTSORT:Nested Set Model}}
[[Category:Database theory]]