Vantage-point tree

یک درخت نقطه برتری (یا درخت VP) یک درخت متریک است که با انتخاب یک موقعیت در فضا ("نقطه برتری") و تقسیم بندی داده ها، آنها را به دو بخش در فضای متریک تفکیک می کند: نقاطی که از حد آستانه مشخص شده به نقطه برتری نزدیک تر هستند و نقاطی که فاصله بیشتری از حد آستانه مشخص به نقطه برتری دارند. با اعمال بازگشتی این فرآیند برای تقسیم بندی فضا به مجموعه های کوچک و کوچکتر یک ساختار داده تشکیل می شود که داده های مجاور در درخت تشکیل شده احتمالا در فضای متریک هم همسایه هستند. ^[1]^[2]

یک تعمیم (بهبود) از این ساختار درخت با چند نقطه برتری (یا درخت mvp) است؛ که یک ساختار داده برای اندیس گذاری اشیا موجود در فضای متریک بزرگ برای کوئری جستجوی شباهت است. این ساختار از بیش از یک نقطه برتری در هر مرحله استفاده میکند.

History

Peter Yianilos claimed that the vantage-point tree was discovered independently by him (Peter Yianilos) and by Jeffrey Uhlmann.^[2] Yet, Uhlmann published this method before Yianilos in 1991.^[1] Uhlmann called the data structure a metric tree, the name VP-tree was proposed by Yianilos. Vantage-point trees have been generalized to non-metric spaces using Bregman divergences by Nielsen et al.^[3]

This iterative partitioning process is similar to that of a $k$ -d tree, but uses circular (or spherical, hyperspherical, etc.) rather than rectilinear partitions. In two-dimensional Euclidean space, this can be visualized as a series of circles segregating the data.

The vantage-point tree is particularly useful in dividing data in a non-standard metric space into a metric tree.

Understanding a vantage-point tree

The way a vantage-point tree stores data can be represented by a circle.^[4] First, understand that each node of this tree contains an input point and a radius. All the left children of a given node are the points inside the circle and all the right children of a given node are outside of the circle. The tree itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of the metric space.^[4]

Searching through a vantage-point tree

A vantage-point tree can be used to find the nearest neighbor of a point $x$ . The search algorithm is recursive. At any given step we are working with a node of the tree that has a vantage point $v$ and a threshold distance $t$ . The point of interest $x$ will be some distance from the vantage point $v$ . If that distance $d$ is less than $t$ then use the algorithm recursively to search the subtree of the node that contains the points closer to the vantage point than the threshold $t$ ; otherwise recurse to the subtree of the node that contains the points that are farther than the vantage point than the threshold $t$ . If the recursive use of the algorithm finds a neighboring point $n$ with distance to $x$ that is less than $| t - d |$ then it cannot help to search the other subtree of this node; the discovered node $n$ is returned. Otherwise, the other subtree also needs to be searched recursively.

A similar approach works for finding the $k$ nearest neighbors of a point $x$ . In the recursion, the other subtree is searched for $k - k'$ nearest neighbors of the point $x$ whenever only $k' (< k)$ of the nearest neighbors found so far have distance that is less than $| t - d |$ .

Advantages of a vantage-point tree

Instead of inferring multidimensional points for ___domain before the index being built, we build the index directly based on the distance.^[4] Doing this, avoids pre-processing steps.
Updating a vantage-point tree is relatively easy compared to the fast-map approach. For fast maps, after inserting or deleting data, there will come a time when fast-map will have to rescan itself. That takes up too much time and it is unclear to know when the rescanning will start.
Distance based methods are flexible. It is “able to index objects that are represented as feature vectors of a fixed number of dimensions."^[4]

Complexity

The time cost to build a Vantage-Point tree is approximately $O (n log n)$ . For each element, the tree is descended by $log n$ levels to find its placement. However there is a constant factor $k$ where $k$ is the number of vantage points per tree node.^[5]

The time cost to search a Vantage-Point tree to find a single nearest neighbor is $O (log n)$ . There are $log n$ levels, each involving $k$ distance calculations, where $k$ is the number of vantage points (elements) at that position in the tree.

The time cost to search a Vantage-Point tree for a range, which may be the most important attribute, can vary greatly depending on the specifics of the algorithm used and parameters. Brin's paper ^[5] gives the result of experiments with several vantage point algorithms with various parameters to investigate the cost, measured in number of distance calculations.

The space cost for a Vantage-Point tree is approximately $n$ . Each element is stored, and each tree element in each non-leaf node requires a pointer to its descendant node. (See Brin for details on one implementation choice. The parameter for number of elements at each node plays a factor.)

Note that some metric space tools require a matrix of pair-wise distance values, costing $O (n 2)$ , but that is not required with Vantage-Point trees.

References

^ ^a ^b Uhlmann, Jeffrey (1991). "Satisfying General Proximity/Similarity Queries with Metric Trees". Information Processing Letters. 40 (4): 175–179. doi:10.1016/0020-0190(91)90074-r.
^ ^a ^b Yianilos (1993). Data structures and algorithms for nearest neighbor search in general metric spaces (PDF). Fourth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 311–321. pny93. Retrieved 2008-08-22. {{cite conference}}: Cite has empty unknown parameter: |booktitle= (help)
^ Nielsen, Frank (2009). "Bregman vantage point trees for efficient nearest Neighbor Queries". Proceedings of Multimedia and Exp (ICME). IEEE. pp. 878–881. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
^ ^a ^b ^c ^d Fu, Ada Wai-chee; Polly Mei-shuen Chan; Yin-Ling Cheung; Yiu Sang Moon (2000). "Dynamic vp-tree indexing for $n$ -nearest neighbor search given pair-wise distances". The VLDB Journal — The International Journal on Very Large Data Bases. Springer-Verlag New York, Inc. Secaucus, NJ, USA. pp. 154–173. vp. Retrieved 2012-10-02. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)
^ ^a ^b Brin, Sergey (Sep 1995). "Near Neighbor Search in Large Metric Spaces". VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases. Zurich, Switzerland: Morgan Kaufmann Publishers Inc.: 574–584.

External links

Understanding VP Trees

[:0-1] Uhlmann, Jeffrey (1991). "Satisfying General Proximity/Similarity Queries with Metric Trees". Information Processing Letters. 40 (4): 175–179. doi:10.1016/0020-0190(91)90074-r.

[pny93-2] Yianilos (1993). Data structures and algorithms for nearest neighbor search in general metric spaces (PDF). Fourth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 311–321. pny93. Retrieved 2008-08-22. {{cite conference}}: Cite has empty unknown parameter: |booktitle= (help)

[nielsen2009-3] Nielsen, Frank (2009). "Bregman vantage point trees for efficient nearest Neighbor Queries". Proceedings of Multimedia and Exp (ICME). IEEE. pp. 878–881. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)

[vp-4] Fu, Ada Wai-chee; Polly Mei-shuen Chan; Yin-Ling Cheung; Yiu Sang Moon (2000). "Dynamic vp-tree indexing for $n$ -nearest neighbor search given pair-wise distances". The VLDB Journal — The International Journal on Very Large Data Bases. Springer-Verlag New York, Inc. Secaucus, NJ, USA. pp. 154–173. vp. Retrieved 2012-10-02. {{cite conference}}: Unknown parameter |booktitle= ignored (|book-title= suggested) (help)

[brin-5] Brin, Sergey (Sep 1995). "Near Neighbor Search in Large Metric Spaces". VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases. Zurich, Switzerland: Morgan Kaufmann Publishers Inc.: 574–584.

[1]

[2]

[3]

[4]

[5]