'''Multiple factor analysis (MFA)''' is a [[Factorial experiment|factorial]] method<ref name="GreenacreBlasius2006">{{cite book|last1=Greenacre|first1=Michael|last2=Blasius|first2=Jorg|author-link2=Jörg Blasius|title=Multiple Correspondence Analysis and Related Methods|url=https://books.google.com/books?id=ZvYV1lfU5zIC&pg=PA352|accessdate=11 June 2014|date=2006-06-23|publisher=CRC Press|isbn=9781420011319|pages=352–}}</ref> devoted to the study of tables in which a group of individuals is described by a set of variables (quantitative and / or qualitative) structured in groups. It is a [[Multivariate statistics|multivariate method]] from the field of [[Ordination (statistics)|ordination]] used to simplify [[Dimensionality reduction|multidimensional data]] structures. MFA treats all involved tables in the same way (symmetrical analysis). It may be seen as an extension of:
* [[Principal component analysis]] (PCA) when variables are quantitative,
* [[Multiple correspondence analysis]] (MCA) when variables are qualitative,
* [[Factor analysis of mixed data]] (FAMD) when the active variables belong to the two types.
 
== Introductory example ==

Why introduce several groups of active variables in the same factorial analysis?

''Data''

Consider the case of quantitative variables, that is to say, within the framework of PCA. An example of data from ecological research provides a useful illustration. There are, for 72 stations, two types of measurements:
# The abundance-dominance coefficient of 50 plant species (coefficient ranging from 0 = the plant is absent, to 9 = the species covers more than three-quarters of the surface). The whole set of the 50 coefficients defines the floristic profile of a station.
# Eleven pedological measurements ([[pedology]] = soil science): particle size, physical and chemical properties, etc. The set of these eleven measures defines the pedological profile of a station.
 
''Three analyses are possible:''
# ''PCA of flora (pedology as supplementary)'': this analysis focuses on the variability of the floristic profiles. Two stations are close to one another if they have similar floristic profiles. In a second step, the main dimensions of this variability (i.e. the principal components) are related to the pedological variables, introduced as supplementary.
# ''PCA of pedology (flora as supplementary)'': this analysis focuses on the variability of the soil profiles. Two stations are close if they have the same soil profile. The main dimensions of this variability (i.e. the principal components) are then related to the abundance of plants.
# ''PCA of the two groups of variables as active'': one may want to study the variability of stations from the point of view of both the flora and the soil. In this approach, two stations should be close if they have both similar flora '''and''' similar soils.

== Balance between groups of variables ==
 
=== Methodology ===
 
The third analysis of the introductory example implicitly assumes a balance between flora and soil. However, in this example, the mere fact that the flora is represented by 50 variables and the soil by 11 implies that a PCA with all 61 active variables will be influenced mainly by the flora (at least on the first axis). This is not desirable: there is no reason for one group to play a more important role in the analysis than another.
 
The core of MFA is based on a factorial analysis (PCA in the case of quantitative variables, MCA in the case of qualitative variables) in which the variables are weighted. These weights are identical for the variables of the same group (and vary from one group to another). They are such that the maximum axial inertia of a group is equal to 1: in other words, by applying the PCA (or, where applicable, the MCA) to one group with this weighting, we obtain a first eigenvalue equal to 1. To get this property, MFA assigns to each variable of group <math>j</math> a weight equal to the inverse of the first eigenvalue of the analysis (PCA or MCA according to the type of variable) of the group <math>j</math>.
 
Formally, noting <math>\lambda_1^j</math> the first eigenvalue of the factorial analysis of group <math>j</math>, the MFA assigns the weight <math>1/\lambda_1^j</math> to each variable of group <math>j</math>.
 
Balancing the maximum axial inertia, rather than the total inertia (which equals the number of variables in standard PCA), gives the MFA several properties important to the user. Its interest appears most directly in the following example.
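The weighting described above can be sketched in a few lines of NumPy. This is an illustrative sketch for standardized quantitative variables only; the function names are hypothetical and this is not the implementation used by FactoMineR:

```python
import numpy as np

def standardize(X):
    """Centre and scale each column (population standard deviation, as in PCA)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def mfa_weights(groups):
    """For each standardized group table X_j (n rows, k_j columns), return the
    MFA weight 1 / lambda_1^j, where lambda_1^j is the first eigenvalue of the
    separate PCA of group j."""
    return [1.0 / np.linalg.eigvalsh(X.T @ X / X.shape[0]).max() for X in groups]

def mfa(groups):
    """Weighted global PCA: multiplying the variables of group j by
    sqrt(1 / lambda_1^j) divides every axial inertia of that group by
    lambda_1^j, so each group's maximum axial inertia becomes 1."""
    ws = mfa_weights(groups)
    Z = np.hstack([X * np.sqrt(w) for X, w in zip(groups, ws)])
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / Z.shape[0])
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]
```

By construction, the separate analysis of a weighted group has a first eigenvalue of 1, and the first eigenvalue of the global analysis cannot exceed the number of groups.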
 
=== Example ===
 
Consider two groups of variables defined on the same set of individuals.
# Group 1 is composed of two uncorrelated variables A and B.
# Group 2 is composed of two variables {C1, C2} identical to the same variable C, itself uncorrelated with the first two.
 
This example is not completely unrealistic. It is often necessary to analyse multidimensional and (nearly) one-dimensional groups simultaneously.
 
''Numerical example''
 
{| width=100% border="0"
|-
| width=50% |
 
{| class="wikitable centre" width="60%"
|+ Table 1. MFA. Test data. A and B (group 1) are uncorrelated. C1 and C2 (group 2) are identical.
|-
! !! <math>A</math> !! <math>B</math> !! <math>C_1</math>!! <math>C_2</math>
|}
| width=50% |
{| class="wikitable centre" width="60%"
|+ Table 2. Test data. Decomposition of the inertia in the PCA and in the MFA applied to data in Table 1.
|-
! !! <math>F_1</math> !! <math>F_2</math>
| 2.14 (100%) || 1
|- align="center"
!scope="row"| group 1
| 0.24 (11%) || 1
|- align="center"
|}
|}
Table 2 summarizes the inertia of the first two axes of the PCA and of the MFA applied to Table 1.
 
The group 2 variables contribute 88.95% of the inertia of axis 1 of the PCA. The first axis (<math>F_1</math>) is almost coincident with C: the correlation between C and <math>F_1</math> is .976.
 
The first axis of the MFA (on Table 1 data) shows the balance between the two groups of variables: the contribution of each group to the inertia of this axis is strictly equal to 50%.
 
The second axis, meanwhile, depends only on group 1. This is natural, since this group is two-dimensional while the second group, being one-dimensional, can be highly related to only one axis (here the first axis).
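The contrast between the two analyses can be reproduced with a small NumPy simulation. This is only a sketch: the data below merely mimic the structure of Table 1 (random draws, illustrative names), so the exact figures of the article are not reproduced. What is deterministic is that the duplicated variable dominates the first axis of the plain PCA, while the MFA weighting brings each group's maximum axial inertia back to 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Simulated analogue of Table 1: A and B independent, C independent of both;
# group 2 consists of C1 = C2 = C.
standardize = lambda v: (v - v.mean()) / v.std()
A, B, C = (standardize(rng.normal(size=n)) for _ in range(3))
X = np.column_stack([A, B, C, C])
group_cols = [[0, 1], [2, 3]]

def axis1_group_contributions(Z):
    """Share of axis 1 carried by each group: squared entries of the first
    eigenvector of Z'Z/n, summed over each group's columns."""
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / Z.shape[0])
    v1 = eigvecs[:, np.argmax(eigvals)]
    return [float(np.sum(v1[cols] ** 2)) for cols in group_cols]

# Plain PCA on the four columns: the duplicated variable C dominates axis 1.
pca_ctr = axis1_group_contributions(X)

# MFA weighting: divide each group by the first eigenvalue of its separate PCA
# (lambda_1 = 1 + |r(A, B)| for group 1, exactly 2 for group 2), i.e. multiply
# the columns by 1/sqrt(lambda_1).
lam1_g1 = np.linalg.eigvalsh(X[:, :2].T @ X[:, :2] / n).max()
Z = np.column_stack([X[:, :2] / np.sqrt(lam1_g1), X[:, 2:] / np.sqrt(2.0)])
mfa_ctr = axis1_group_contributions(Z)
```

On such toy data the MFA eigenvalues are nearly tied (three directions with inertia close to 1), so the contributions to axis 1 fluctuate with the random draw; the balancing property itself is exact.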
 
=== Conclusion about the balance between groups ===
This balance must take into account that a multidimensional group naturally influences more axes than a one-dimensional group does (the latter can be closely related to only one axis).
 
The MFA weighting, which makes the maximum axial inertia of each group equal to 1, plays this role.
 
== Application examples ==
 
''Survey''
Questionnaires are always structured according to different themes. Each theme is a group of variables: for example, questions about opinions and questions about behaviour. Thus, in this example, we may want to perform a factorial analysis in which two individuals are close if they have expressed both the same opinions and the same behaviour.
 
''Sensory analysis''
The same set of products has been evaluated by a panel of experts and a panel of consumers. For its evaluation, each jury uses a list of descriptors (sour, bitter, etc.). Each judge scores each descriptor for each product on a scale of intensity ranging, for example, from 0 = null or very low to 10 = very strong. In the table associated with a jury, at the intersection of row <math>i</math> and column <math>k</math>, is the average score assigned to product <math>i</math> for descriptor <math>k</math>.
 
The individuals are the products. Each jury is a group of variables. We want to achieve a factorial analysis in which two products are similar if they were evaluated in the same way by both juries.
 
''Multidimensional time series''
<math>K</math> variables are measured on <math>I</math> individuals. These measurements are made at <math>J</math> dates. There are many ways to analyse such a data set. One way, suggested by the MFA, is to consider each date as a group of variables in the analysis of the tables (each table corresponds to one date) juxtaposed row-wise (the table analysed thus has <math>I</math> rows and <math>J \times K</math> columns).
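For illustration, the row-wise juxtaposition can be sketched with NumPy (the sizes and data below are hypothetical; each date's table becomes one group of columns):

```python
import numpy as np

# Hypothetical sizes: K = 4 variables measured on I = 10 individuals
# at J = 3 dates, stored as one I x K table per date.
I, J, K = 10, 3, 4
rng = np.random.default_rng(0)
tables = [rng.normal(size=(I, K)) for _ in range(J)]

# Row-wise juxtaposition: the analysed table has I rows and J*K columns.
merged = np.hstack(tables)

# For the MFA, each date contributes one group of K consecutive columns.
groups = [list(range(j * K, (j + 1) * K)) for j in range(J)]
```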
 
''Conclusion about these examples'': these examples show that, in practice, variables are very often organized into groups.
== Graphics from MFA ==
 
Beyond the weighting of variables, the interest in MFA lies in a series of graphics and indicators valuable for analysing a table whose columns are organized into groups.
 
=== Graphics common to all the simple factorial analyses (PCA, MCA) ===
 
The core of the MFA is a weighted factorial analysis: MFA first provides the classical results of the factorial analyses.

1. ''Representations of individuals'' in which two individuals are close to each other if they exhibit similar values for many variables in the different variable groups; in practice the user particularly studies the first factorial plane.

2. ''Representations of quantitative variables'' as in PCA (correlation circle).
In the example:
* The first axis mainly opposes individuals 1 and 5 (Figure 1).
* The four variables have a positive coordinate (Figure 2): the first axis is a size effect. Thus individual 1 has low values for all the variables and individual 5 has high values for all the variables.
3. ''Indicators aiding interpretation'': projected inertia, contributions and quality of representation. In the example, the contribution of individuals 1 and 5 to the inertia of the first axis is 45.7% + 31.5% = 77.2%, which justifies the interpretation focused on these two points.
 
4. ''Representations of categories'' of qualitative variables as in MCA (a category lies at the centroid of the individuals who possess it). No qualitative variables in the example.
 
=== Graphics specific to this kind of multiple table ===
 
5. ''Superimposed representations of individuals'' "seen" by each group. An individual considered from the point of view of a single group is called a ''partial individual'' (in parallel, an individual considered from the point of view of all variables is called a ''mean individual'' because it lies at the center of gravity of its partial points). The partial cloud <math>N_i^j</math> gathers the <math>I</math> individuals from the perspective of the single group <math>j</math> (i.e. the points <math>i^j, i = 1, \dots, I</math>): this is the cloud analysed in the separate factorial analysis (PCA or MCA) of group <math>j</math>. The superimposed representation of the <math>N_i^j</math> provided by the MFA is similar in its purpose to that provided by [[Procrustes analysis]].
[[File:AFM fig3.jpg|center|thumb|Figure 3. MFA. Test data. Superimposed representation of mean and partial clouds.]]
 
In the example (figure 3), individual 1 is characterized by a small size (i.e. small values) both in terms of group 1 and group 2 (the partial points of individual 1 have negative coordinates and are close to one another). On the contrary, individual 5 is characterized more by high values for the variables of group 2 than for those of group 1 (for individual 5, the group 2 partial point lies further from the origin than the group 1 partial point). This reading of the graph can be checked directly in the data.
 
6. ''Representations of groups of variables'' as such. In these graphs, each group of variables is represented by a single point. Two groups of variables are close to one another when they define the same structure on the individuals. An extreme case: two groups of variables that define homothetic clouds of individuals <math>N_i^j</math> coincide. The coordinate of group <math>j</math> along axis <math>s</math> is equal to the contribution of group <math>j</math> to the inertia of the MFA dimension of rank <math>s</math>. This contribution can be interpreted as an indicator of the relationship between group <math>j</math> and axis <math>s</math>, hence the name ''relationship square'' given to this type of representation. This representation also exists in other factorial methods (MCA and FAMD in particular), in which the groups of variables are each reduced to a single variable.
[[File:AFM fig4.jpg|center|thumb|Figure 4. MFA. Test data. Representation of groups of variables.]]
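As a sketch of how such group coordinates can be computed, assume the MFA-weighted table Z and the column indices of each group are already available (the names below are hypothetical, not the FactoMineR code). The coordinate of a group on axis s is the group's contribution, in absolute terms, to the inertia of that axis; since each weighted group has a maximum axial inertia of 1, these coordinates lie between 0 and 1.

```python
import numpy as np

def group_coordinates(Z, groups, n_axes=2):
    """Coordinates of groups in the relationship square: the coordinate of
    group g on axis s is its absolute contribution to the inertia of axis s,
    i.e. lambda_s times the squared entries of eigenvector s, summed over the
    group's columns of the weighted table Z."""
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / Z.shape[0])
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = np.zeros((len(groups), n_axes))
    for g, cols in enumerate(groups):
        for s in range(n_axes):
            coords[g, s] = eigvals[s] * np.sum(eigvecs[cols, s] ** 2)
    return coords
```

For instance, a group made of two copies of one variable, weighted by 1/2 as in the test data, has coordinate 1 on its own first axis.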
 
 
7. ''Representations of factors of separate analyses'' of the different groups. These factors are represented as supplementary quantitative variables (correlation circle).
[[File:AFM fig5.jpg|center|thumb|Figure 5. MFA. Test data. Representation of the principal components of separate PCA of each group.]]
 
In the example (figure 5), the first axis of the MFA is relatively strongly correlated (r = .80) with the first component of group 2. This group, consisting of two identical variables, possesses only one principal component (coinciding with the variable). Group 1 consists of two orthogonal variables: every direction of the subspace generated by these two variables has the same inertia (equal to 1). So there is uncertainty in the choice of the principal components, and there is no reason to be interested in one of them in particular. However, the two components provided by the program are well represented: the plane of the MFA is close to the plane spanned by the two variables of group 1.
 
== Conclusion ==
 
The numerical example illustrates the output of the MFA. Besides balancing the groups of variables, and besides the usual graphics of PCA (or of MCA in the case of qualitative variables), the MFA provides results specific to the group structure of the set of variables, in particular:
* A superimposed representation of partial individuals for a detailed analysis of the data;
* A representation of factors from separate analyses.
 
The small size and simplicity of the example allow easy validation of the rules of interpretation. But the method is most valuable when the data set is large and complex.
Other methods suitable for this type of data are available. [[Procrustes analysis]] is compared to the MFA in Pagès (2014).<ref>Pagès Jérôme (2014). ''Multiple Factor Analysis by Example Using R''. Chapman & Hall/CRC The R Series, London. 272 p.</ref>
 
== History ==

MFA was developed by [[Brigitte Escofier-Cordier|Brigitte Escofier]] and Jérôme Pagès in the 1980s. It is at the heart of two books written by these authors.<ref>Escofier Brigitte & Pagès Jérôme (2008). ''Analyses factorielles simples et multiples; objectifs, méthodes et interprétation''. Dunod, Paris. 318 p. {{ISBN|978-2-10-051932-3}}</ref><ref>Pagès Jérôme (2014). ''Multiple Factor Analysis by Example Using R''. Chapman & Hall/CRC The R Series, London. 272 p.</ref> The MFA and its extensions (hierarchical MFA, MFA on contingency tables, etc.) are a research topic of the applied mathematics laboratory of Agrocampus ([http://math.agrocampus-ouest.fr LMA]), which published a book presenting the basic methods of exploratory multivariate analysis.<ref>Husson F., Lê S. & Pagès J. (2009). ''Exploratory Multivariate Analysis by Example Using R''. Chapman & Hall/CRC The R Series, London. {{ISBN|978-2-7535-0938-2}}</ref>
 
== Software ==
 
MFA is available in two R packages ([http://factominer.free.fr FactoMineR] and [http://pbil.univ-lyon1.fr/ADE-4 ADE4]) and in many software packages, including SPAD, Uniwin, [[XLSTAT]], etc. There is also a [http://www.ensai.fr/userfiles/AFMULT%20and%20PLOTAFM%20aout%202010.pdf SAS function].{{dead link|date=February 2018 |bot=InternetArchiveBot |fix-attempted=yes }} The graphs in this article come from the R package FactoMineR.
 
== References ==
 
{{Reflist}}
* {{cite book | author=Jérôme Pagès|title= Multiple Factor Analysis by Example Using R | publisher = Chapman & Hall/CRC The R Series |___location= London | year =2014|pages =272 | id=Pagès2014}}
 
== External links ==
* [http://factominer.free.fr/ FactoMineR], an R package devoted to exploratory data analysis.
 
[[Category:Factor analysis]]