{{Short description|Set of related ordination techniques used in information visualization}}
[[File:RecentVotes.svg|thumb|400px|An example of classical multidimensional scaling applied to voting patterns in the [[United States House of Representatives]]. Each red dot represents one Republican member of the House, and each blue dot represents one Democrat.]]
'''Multidimensional scaling''' ('''MDS''') is a means of visualizing the level of [[Similarity measure|similarity]] of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of <math display="inline"> n </math> objects or individuals" into a configuration of <math display="inline"> n </math> points mapped into an abstract [[Cartesian coordinate system|Cartesian space]].<ref name="MS_history">{{cite journal |last= Mead|first=A |date= 1992|title= Review of the Development of Multidimensional Scaling Methods |journal= Journal of the Royal Statistical Society. Series D (The Statistician)|volume= 41|issue=1 |pages=27–39 |quote= Abstract. Multidimensional scaling methods are now a common statistical tool in psychophysics and sensory analysis. The development of these methods is charted, from the original research of Torgerson (metric scaling), Shepard and Kruskal (non-metric scaling) through individual differences scaling and the maximum likelihood methods proposed by Ramsay. |jstor=2348634 }}</ref>
{{Data Visualization}}
More technically, MDS refers to a set of related [[Ordination (statistics)|ordination]] techniques used in [[information visualization]], in particular to display the information contained in a [[distance matrix]]. It is a form of [[non-linear dimensionality reduction]].
== Types ==
=== Classical multidimensional scaling ===
It is also known as '''Principal Coordinates Analysis''' (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a [[loss function]] called ''strain'',<ref name="borg"/> which is given by
<math display=block>\text{Strain}_D(x_1,x_2,...,x_N)=\left(\frac{\sum_{i,j}\bigl(b_{ij}-x_i^Tx_j\bigr)^2}{\sum_{i,j}b_{ij}^2}\right)^{1/2},</math>
where <math>x_{i}</math> denote vectors in ''N''-dimensional space, <math>x_i^T x_j </math> denotes the scalar product between <math>x_{i}</math> and <math>x_{j}</math>, and <math>b_{ij}</math> are the elements of the matrix <math>B</math> defined on step 2 of the following algorithm, which are computed from the distances.
Classical MDS uses the fact that the coordinate matrix <math display="inline">X</math> can be derived from <math display="inline">B=XX^{\mathsf T}</math> by [[Eigendecomposition of a matrix|eigenvalue decomposition]], and the matrix <math display="inline">B</math> can be computed from the proximity matrix <math display="inline">D</math> by using double centering. The steps of a classical MDS algorithm are:
:# Set up the squared proximity matrix <math display="inline">D^{(2)}=[d_{ij}^2]</math>.
:# Apply double centering: <math display="inline">B=-\tfrac{1}{2}JD^{(2)}J</math> using the [[centering matrix]] <math display="inline">J=I-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}</math>, where <math display="inline">n</math> is the number of objects, <math display="inline">I</math> is the <math display="inline">n \times n</math> identity matrix, and <math display="inline">\mathbf{1}</math> is an <math display="inline">n \times 1</math> column vector of ones.
:# Determine the <math display="inline">m</math> largest [[Eigenvalues and eigenvectors|eigenvalues]] <math display="inline">\lambda_1,\lambda_2,...,\lambda_m</math> and corresponding [[Eigenvalues and eigenvectors|eigenvectors]] <math display="inline">e_1,e_2,...,e_m</math> of <math display="inline">B</math> (where <math display="inline">m</math> is the number of dimensions desired for the output).
:# Now, <math display="inline">X=E_m\Lambda_m^{1/2}</math> , where <math display="inline">E_m</math> is the matrix of <math display="inline">m</math> eigenvectors and <math display="inline">\Lambda_m</math> is the [[diagonal matrix]] of <math display="inline">m</math> eigenvalues of <math display="inline">B</math>.
:Classical MDS assumes Euclidean distances, so it is not applicable for direct dissimilarity ratings.
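A minimal sketch of these steps, written in Python with NumPy (the function name <code>classical_mds</code> is illustrative rather than taken from any particular library), is:
<syntaxhighlight lang="python">
import numpy as np

def classical_mds(D, m=2):
    """Classical (Torgerson) MDS from a symmetric matrix D of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix J = I - (1/n) 11^T
    B = -0.5 * J @ (D ** 2) @ J               # double centering of the squared distances
    w, E = np.linalg.eigh(B)                  # eigen-decomposition of B
    idx = np.argsort(w)[::-1][:m]             # indices of the m largest eigenvalues
    L = np.sqrt(np.clip(w[idx], 0.0, None))   # clip small negative eigenvalues to zero
    return E[:, idx] * L                      # X = E_m Lambda_m^(1/2)
</syntaxhighlight>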
=== Metric multidimensional scaling (mMDS) ===
It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called ''stress'', which is often minimized using a procedure called [[stress majorization]]. Metric MDS minimizes the cost function called “stress” which is a residual sum of squares:<blockquote><math>\text{Stress}_D(x_1,x_2,...,x_N)=\sqrt{\sum_{i\ne j=1,...,N}\bigl(d_{ij}-\|x_i-x_j\|\bigr)^2}.</math></blockquote>
Metric scaling uses a power transformation with a user-controlled exponent <math display="inline">p</math>: <math display="inline">d_{ij}^p</math> and <math display="inline">-d_{ij}^{2p}</math> for distance. In classical scaling <math display="inline">p=1.</math> Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities.
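The following sketch (Python with NumPy and SciPy; the names <code>raw_stress</code> and <code>metric_mds</code> are illustrative) minimizes the residual sum of squares directly with a general-purpose optimizer; in practice, [[stress majorization]] (SMACOF) is the more common approach:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def raw_stress(flat_x, D, n, dim):
    """Sum of squared differences between given and embedded pairwise distances.

    Minimizing this quantity is equivalent to minimizing its square root (the stress).
    """
    X = flat_x.reshape(n, dim)
    d_hat = squareform(pdist(X))
    i, j = np.triu_indices(n, k=1)
    return np.sum((D[i, j] - d_hat[i, j]) ** 2)

def metric_mds(D, dim=2, seed=0):
    n = D.shape[0]
    x0 = np.random.default_rng(seed).normal(size=n * dim)   # random starting configuration
    res = minimize(raw_stress, x0, args=(D, n, dim), method="L-BFGS-B")
    return res.x.reshape(n, dim)
</syntaxhighlight>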
===Non-metric multidimensional scaling (NMDS)===
In contrast to metric MDS, non-metric MDS finds both a [[non-parametric]] [[monotonic]] relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the ___location of each item in the low-dimensional space.
Let <math>d_{ij}</math> be the dissimilarity between points <math>i, j</math>. Let <math>\hat d_{ij} = \| x_i - x_j\|</math> be the Euclidean distance between embedded points <math>x_i, x_j</math>.
Now, for each choice of the embedded points <math>x_i</math> and a monotonically increasing function <math>f</math>, define the "stress" function:
:<blockquote><math>S(x_1, ..., x_n ; f) = \sqrt{\frac{\sum_{i<j}\bigl(f(d_{ij})-\hat d_{ij}\bigr)^2}{\sum_{i<j} \hat d_{ij}^2}}.</math></blockquote>
The factor of <math>\sum_{i<j} \hat d_{ij}^2</math> in the denominator is necessary to prevent a "collapse": if we instead defined <math>S=\sqrt{\sum_{i<j}\bigl(f(d_{ij})-\hat d_{ij}\bigr)^2}</math>, the stress could be trivially minimized by setting <math>f = 0</math> and collapsing every point to the same ___location.
A few variants of this cost function exist. MDS programs automatically minimize stress in order to obtain the MDS solution.
The core of a non-metric MDS algorithm is a twofold optimization process. First the optimal monotonic transformation of the proximities has to be found. Secondly, the points of a configuration have to be optimally arranged, so that their distances match the scaled proximities as closely as possible.
NMDS needs to optimize two objectives simultaneously. This is usually done iteratively:
:# Find a random configuration of points, e.g. by sampling from a normal distribution.
:# Repeat until a stopping criterion is met (for example, <math>S < \epsilon</math>):
:## Solve for <math>f = \arg\min_f S(x_1, ..., x_n ; f)</math> by [[isotonic regression]].
:## Solve for <math>x_1, ..., x_n = \arg\min_{x_1, ..., x_n} S(x_1, ..., x_n ; f)</math> by gradient descent or other methods.
:# Return <math>x_i</math> and <math>f</math>.
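A sketch of this loop in Python (using scikit-learn's isotonic regression and a general-purpose optimizer for the configuration step; a fixed number of outer iterations stands in for a convergence test, and the function name <code>nmds_sketch</code> is illustrative) is:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform
from sklearn.isotonic import IsotonicRegression

def nmds_sketch(D, dim=2, n_outer=20, seed=0):
    """Alternate isotonic regression and configuration updates to reduce stress."""
    n = D.shape[0]
    i, j = np.triu_indices(n, k=1)
    diss = D[i, j]                                           # observed dissimilarities d_ij
    X = np.random.default_rng(seed).normal(size=(n, dim))    # 1. random configuration

    def stress(flat_x, f):
        d_hat = squareform(pdist(flat_x.reshape(n, dim)))[i, j]
        return np.sqrt(np.sum((f - d_hat) ** 2) / np.sum(d_hat ** 2))

    for _ in range(n_outer):                                 # 2. fixed number of iterations
        d_hat = squareform(pdist(X))[i, j]
        # 2a. optimal monotone transform of the dissimilarities (isotonic regression)
        f = IsotonicRegression().fit(diss, d_hat).predict(diss)
        # 2b. re-arrange the configuration for the current transform
        X = minimize(stress, X.ravel(), args=(f,), method="L-BFGS-B").x.reshape(n, dim)
    return X, f                                              # 3. return the points and f
</syntaxhighlight>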
[[Louis Guttman]]'s smallest space analysis (SSA) is an example of a non-metric MDS procedure.
=== Generalized multidimensional scaling (GMDS) ===
{{main|Generalized multidimensional scaling}}
An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.<ref name="bron">{{cite journal |vauthors=Bronstein AM, Bronstein MM, Kimmel R |title=Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=103 |issue=5 |pages=1168–72 |date=January 2006 |pmid=16432211 |pmc=1360551 |doi=10.1073/pnas.0508601103 |bibcode=2006PNAS..103.1168B |doi-access=free }}</ref>
=== Super multidimensional scaling (SMDS) ===
An extension of MDS, known as Super MDS, incorporates both distance and angle information for improved source localization. Unlike traditional MDS, which uses only distance measurements, Super MDS processes both distance and angle-of-arrival (AOA) data algebraically (without iteration) to achieve better accuracy.<ref>{{cite conference |last1=de Abreu |first1=G. T. F. |last2=Destino |first2=G. |title=Super MDS: Source Location from Distance and Angle Information |conference=2007 IEEE Wireless Communications and Networking Conference |___location=Hong Kong, China |pages=4430–4434 |year=2007 |doi=10.1109/WCNC.2007.807}}</ref>
The method proceeds in the following steps:
# '''Construct the Reduced Edge Gram Kernel:''' For a network of <math>N</math> sources in an <math>\eta</math>-dimensional space, define the edge vectors as <math>v_{i} = x_{m} - x_{n}</math>. The dissimilarity is given by <math>k_{i,j} = \langle v_i, v_j \rangle</math>. Assemble these into the full kernel <math>K = VV^T</math>, and then form the reduced kernel using the <math>N-1</math> independent vectors: <math>\bar{K} = [V]_{(N-1)\times\eta}\ [V]_{(N-1)\times\eta}^T</math>,
# '''Eigen-Decomposition:''' Compute the eigen-decomposition of <math>\bar{K}</math>,
# '''Estimate Edge Vectors:''' Recover the edge vectors as <math> \hat{V} = \Bigl( U_{M \times \eta}\, \Lambda^{\odot \frac{1}{2}}_{\eta \times \eta} \Bigr)^T </math>,
# '''Procrustes Alignment:''' Retrieve <math>\hat{V}</math> from <math>V</math> via Procrustes Transformation,
# '''Compute Coordinates:''' Solve the following linear equations to compute the coordinate estimates <math>\begin{pmatrix}
1 \vline \mathbf{0}_{1 \times N-1} \\
\hline
\mathbf{[C]}_{N-1 \times N}
\end{pmatrix} \cdot \begin{pmatrix}\mathbf{x}_{1} \\
\hline[\mathbf{X}]_{N-1 \times \eta}
\end{pmatrix}=\begin{pmatrix}
\mathbf{x}_{1} \\
\hline[\mathbf{V}]_{N-1 \times \eta}
\end{pmatrix}.
</math>
This concise approach reduces the need for multiple anchors and enhances localization precision by leveraging angle constraints.
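The following Python sketch illustrates steps 2–5 only, under two assumptions that are not part of the formulation above: the reduced kernel <math>\bar{K}</math> and a set of reference edge vectors (e.g. derived from anchors) are taken as already available, and the <math>N-1</math> independent edges are assumed to be chosen as <math>v_i = x_{i+1} - x_1</math>, so that the final linear system reduces to adding the known coordinate <math>x_1</math> back. The function name <code>smds_coordinates</code> is illustrative.
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import orthogonal_procrustes

def smds_coordinates(K_bar, x1, V_ref, eta=2):
    """Recover coordinates from a reduced edge Gram kernel (illustrative sketch)."""
    # Step 2: eigen-decomposition of the reduced kernel.
    w, U = np.linalg.eigh(K_bar)
    idx = np.argsort(w)[::-1][:eta]                # keep the eta largest eigenvalues
    L = np.sqrt(np.clip(w[idx], 0.0, None))
    # Step 3: edge-vector estimates, known only up to rotation/reflection.
    V_hat = U[:, idx] * L                          # rows are the estimated edge vectors
    # Step 4: Procrustes alignment against the reference edge vectors.
    R, _ = orthogonal_procrustes(V_hat, V_ref)
    V_hat = V_hat @ R
    # Step 5: with star-shaped edges v_i = x_{i+1} - x_1, the linear system
    # amounts to adding the known anchor coordinate x_1 back.
    return np.vstack([x1, x1 + V_hat])
</syntaxhighlight>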
==Details==
The goal of MDS is, given a dissimilarity matrix <math>D</math> with entries <math>d_{i,j}</math> for <math>M</math> objects, to find <math>M</math> vectors <math>x_1,\ldots,x_M \in \mathbb{R}^{N}</math> such that
:<math>\|x_i - x_j\| \approx d_{i,j}</math> for all <math>i,j\in {1,\dots,M}</math>,
where <math>\|\cdot\|</math> is a [[Norm (mathematics)|vector norm]]. In classical MDS, this norm is the [[Euclidean distance]], but, in a broader sense, it may be a [[metric (mathematics)|metric]] or arbitrary distance function.<ref name="Kruskal">[[Joseph Kruskal|Kruskal, J. B.]], and Wish, M. (1978), ''Multidimensional Scaling'', Sage University Paper series on Quantitative Application in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications.</ref> For example, when dealing with mixed-type data that contain numerical as well as categorical descriptors, [[Gower's distance]] is a common alternative.{{cn|date=June 2024}}
In other words, MDS attempts to find a mapping from the <math>M</math> objects into <math>\mathbb{R}^N</math> such that distances are preserved. If the dimension <math>N</math> is chosen to be 2 or 3, we may plot the vectors <math>x_i</math> to obtain a visualization of the similarities between the <math>M</math> objects. Note that the vectors <math>x_i</math> are not unique: With the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances <math>\|x_i - x_j\|</math>.
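For example, a small precomputed dissimilarity matrix can be embedded in the plane with the scikit-learn implementation listed in the software section below (the matrix here is only a toy example):
<syntaxhighlight lang="python">
import numpy as np
from sklearn.manifold import MDS

# Toy dissimilarity matrix for three objects.
D = np.array([[0.0, 2.0, 3.0],
              [2.0, 0.0, 4.0],
              [3.0, 4.0, 0.0]])

# Two-dimensional coordinates that approximately preserve the pairwise distances;
# the result is unique only up to translation, rotation and reflection.
X = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
</syntaxhighlight>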
== Software ==
* [[ELKI]] includes two MDS implementations.
* [[MATLAB]] includes two MDS implementations (for classical (''cmdscale'') and non-classical (''mdscale'') MDS respectively).
* The [[R (programming language)|R programming language]] offers several MDS implementations, e.g. the base ''cmdscale'' function and the packages [https://CRAN.R-project.org/package=smacof smacof]<ref>{{Cite journal |last1=de Leeuw |first1=Jan |last2=Mair |first2=Patrick |year=2009 |title=Multidimensional Scaling Using Majorization: SMACOF in R |journal=Journal of Statistical Software |volume=31 |issue=3 |doi=10.18637/jss.v031.i03 |doi-access=free}}</ref> (mMDS and nMDS) and [https://CRAN.R-project.org/package=vegan vegan] (weighted MDS).
* [[scikit-learn]] contains function [http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html sklearn.manifold.MDS].
== See also ==
{{Commons category|Multidimensional scaling}}
* [[Data clustering]]
* [[t-distributed stochastic neighbor embedding]]
* [[Factor analysis]]
* [[Discriminant analysis]]
== References ==
{{Reflist}}
== Bibliography ==
{{refbegin}}
* {{cite book |last1=Cox |first1=T.F. |last2=Cox |first2=M.A.A. |editor1-last=Unwin |editor1-first=A |editor2-last=Chen |editor2-first=C |editor3-last=Hardle |editor3-first=W. K. |chapter=Multidimensional Scaling |title=Handbook of Data Visualization |publisher=Springer |___location=Berlin, Heidelberg |year=2008 }}
* {{cite book |author=Coxon, Anthony P.M. |title=The User's Guide to Multidimensional Scaling. With special reference to the MDS(X) library of Computer Programs |publisher=Heinemann Educational Books |___location=London |year=1982 }}
* {{cite journal |author=Green, P. |title=Marketing applications of MDS: Assessment and outlook |journal=Journal of Marketing |volume=39 |pages=24–31 |date=January 1975 |doi=10.2307/1250799 |issue=1 |jstor=1250799 }}