[[Feature extraction]] and dimension reduction can be combined in one step using [[Principal Component Analysis|principal component analysis]] (PCA), [[linear discriminant analysis]] (LDA), or [[Canonical correlation|canonical correlation analysis]] (CCA) techniques as a pre-processing step, followed by ''k''-NN classification on [[Feature (machine learning)|feature vectors]] in reduced-dimension space. This process is also called low-dimensional [[embedding]].<ref>{{citation |last1=Shaw |first1=Blake |last2=Jebara |first2=Tony |title=Structure preserving embedding |work=Proceedings of the 26th Annual International Conference on Machine Learning |year=2009 |pages=1–8 |publication-date=June 2009 |url=http://www.cs.columbia.edu/~jebara/papers/spe-icml09.pdf |doi=10.1145/1553374.1553494 |isbn=9781605585161 |s2cid=8522279}}</ref>
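For illustration, the following sketch combines PCA with a ''k''-NN classifier using the [[scikit-learn]] library; the dataset, the number of retained components, and ''k'' are arbitrary example choices, not recommendations:

<syntaxhighlight lang="python">
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Illustrative dataset: 1797 samples with 64 features each.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Project the 64 original features onto a 16-dimensional PCA embedding
# (illustrative choice), then classify in the reduced space with k-NN (k = 5).
model = make_pipeline(PCA(n_components=16), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
</syntaxhighlight>

Because the pipeline fits PCA on the training split only, test points are projected with an embedding learned without them, mirroring how the method would be applied to genuinely new queries.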
For very-high-dimensional datasets (e.g. when performing a similarity search on live video streams, DNA data or high-dimensional [[time series]]), running a fast '''approximate''' ''k''-NN search using [[Locality-sensitive hashing|locality-sensitive hashing]], "random projections",<ref>{{citation |last1=Bingham |first1=Ella |last2=Mannila |first2=Heikki |title=Random projection in dimensionality reduction: applications to image and text data |work=Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining |publisher=ACM |year=2001 |url=https://citeseerx.ist.psu.edu/doc_view/pid/aed77346f737b0ed5890b61ad02e5eb4ab2f3dc6}}</ref> "sketches",<ref>{{citation |editor-last=Ryan |editor-first=Donna |title=High Performance Discovery in Time Series |location=Berlin |publisher=Springer |year=2004 |isbn=0-387-00857-8}}</ref> or other high-dimensional similarity search techniques from the [[VLDB conference|VLDB]] toolbox might be the only feasible option.
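As an illustrative sketch of the random-projection idea (not a production index), the following [[NumPy]]-only example hashes each point by the signs of a small number of random projections, so that points separated by a small angle tend to fall into the same bucket, and then performs an exact search only inside the query's bucket; the data and all parameter values are placeholders:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, d, bits = 10_000, 128, 8          # placeholder corpus size, dimension, hash length
data = rng.standard_normal((n, d))   # placeholder data

# Sign-of-random-projection hashing: each point maps to the sign pattern of
# `bits` random projections, so points with a small angle tend to collide.
planes = rng.standard_normal((d, bits))

def hash_points(x):
    # Pack the projection signs of each row of x into a single integer.
    return (x @ planes > 0) @ (1 << np.arange(bits))

buckets = {}
for i, h in enumerate(hash_points(data)):
    buckets.setdefault(int(h), []).append(i)

def approx_knn(query, k=5):
    # Exact distance computation restricted to the query's hash bucket.
    cand = np.array(buckets.get(int(hash_points(query[None, :])[0]), []))
    if cand.size == 0:
        return cand
    dists = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]

print(approx_knn(data[0]))  # indices of approximate nearest neighbours of point 0
</syntaxhighlight>

A practical locality-sensitive hashing index would maintain several independent hash tables and probe neighbouring buckets, trading extra candidate comparisons for higher recall.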
== Decision boundary ==