Title: | Multivariate Analysis of Mixed Data |
---|---|
Description: | Implements principal component analysis, orthogonal rotation and multiple factor analysis for a mixture of quantitative and qualitative variables. |
Authors: | Marie Chavent [aut, cre], Vanessa Kuentz [aut], Amaury Labenne [aut], Benoit Liquet [aut], Jerome Saracco [aut] |
Maintainer: | Marie Chavent <[email protected]> |
License: | GPL (>=2.0) |
Version: | 3.1 |
Built: | 2025-02-01 04:58:04 UTC |
Source: | https://github.com/chavent/pcamixdata |
The data used here refer to athletes' performance during two sporting events.
data(decathlon)
A data frame with 41 rows and 13 columns: the first ten columns correspond to the performance of the athletes in the 10 events of the decathlon. Columns 11 and 12 correspond respectively to the rank and the points obtained. The last column is a categorical variable corresponding to the sporting event (2004 Olympic Games or 2004 Decastar).
The references below.
Department of Applied Mathematics, Agrocampus Rennes.
Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software. 25(1). pp. 1-18.
Data referring to 27 breeds of dogs.
A data frame with 27 rows (the breeds of dogs) and 7 columns: their size, weight and speed with 3 categories (small, medium, large), their intelligence (low, medium, high), their affectivity and aggressiveness with 2 categories (low, high), their function (utility, compagny, hunting).
Originated by A. Brefort (1982) and cited in Saporta G. (2011).
8 characteristics for 18 popular flowers.
data(flower)
A data frame with 18 observations on 8 variables:
[ , "V1"] | factor | winters |
[ , "V2"] | factor | shadow |
[ , "V3"] | factor | tubers |
[ , "V4"] | factor | color |
[ , "V5"] | ordered | soil |
[ , "V6"] | ordered | preference |
[ , "V7"] | numeric | height |
[ , "V8"] | numeric | distance |
winters is binary and indicates whether the plant may be left in the garden when it freezes.
shadow is binary and shows whether the plant needs to stand in the shadow.
tubers is asymmetric binary and distinguishes between plants with tubers and plants that grow in any other way.
color is nominal and specifies the flower's color (1 = white, 2 = yellow, 3 = pink, 4 = red, 5 = blue).
soil is ordinal and indicates whether the plant grows in dry (1), normal (2), or wet (3) soil.
preference is ordinal and gives someone's preference ranking going from 1 to 18.
height is interval scaled and gives the plant's height in centimeters.
distance is interval scaled and gives the distance in centimeters that should be left between the plants.
The reference below.
Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996): Clustering in an Object-Oriented Environment. Journal of Statistical Software, 1. http://www.stat.ucla.edu/journals/jss/
A list of 4 datasets characterizing the living conditions of 542 cities in Gironde. The four datasets correspond to four themes related to living conditions. Each dataset contains a different number of variables (quantitative and/or qualitative). The first three datasets come from the 2009 population census conducted in Gironde by INSEE (Institut National de la Statistique et des Etudes Economiques). The fourth comes from an IGN (Institut National de l'Information Geographique et forestiere) database.
data(gironde)
A list of 4 data frames.
gironde$employment |
This data frame contains the description of 542 cities by 9 quantitative variables. These variables are related to employment conditions like, for instance, the average income (income), the percentage of farmers (farmer). |
gironde$housing |
This data frame contains the description of 542 cities by 5 variables (2 qualitative variables and 3 quantitative variables). These variables are related to housing conditions like, for instance, the population density (density), the percentage of council housing within the cities (council). |
gironde$services |
This data frame contains the description of 542 cities by 9 qualitative variables. These variables are related to the number of services within the cities, like, for instance, the number of bakeries (baker) or the number of post offices (postoffice). |
gironde$environment |
This data frame contains the description of 542 cities by 4 quantitative variables. These variables are related to the natural environment of the cities, like, for instance, the percentage of agricultural land (agricul) or the percentage of buildings (building). |
www.INSEE.fr
www.ign.fr
http://siddt.grenoble.cemagref.fr/
Multivariate analysis of mixed data: The PCAmixdata R package, M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco, arXiv:1411.4911 [stat.CO]
Performs multiple factor analysis to analyze a set of individuals (observations) described by several groups of variables. Variables within a group can be a mixture of quantitative and qualitative variables.
MFAmix(data, groups, name.groups, ndim=5, rename.level=FALSE, graph = TRUE, axes = c(1, 2))
data |
a data frame with |
groups |
a vector which gives the groups of the columns in |
name.groups |
a vector of size |
ndim |
number of dimensions kept in the results (by default 5). |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents different variables from having levels with identical names. |
graph |
boolean, if TRUE the following graphics are displayed for the first two dimensions of PCAmix: plot of the individuals coordinates, plot of the squared loadings of variables, plot of the partial axes, plot of the correlation circle (if quantitative variables are available), plot of the levels component map (if qualitative variables are available). |
axes |
a length 2 vector specifying the axes to plot. |
Multiple Factor Analysis (MFA), developed by Escofier and Pages in 1983, is a factorial analysis method for multiple groups of variables collected on the same observations. The main idea of MFA is to normalize each group by weighting all the variables of the group by the inverse of the first eigenvalue of the Principal Component Analysis (PCA) of this group (equivalently, dividing them by the square root of that eigenvalue). A usual PCA is then applied to all the weighted variables taken together. The method was initially developed for groups containing only quantitative variables, and was later extended to handle groups of qualitative variables and groups of quantitative variables simultaneously. The MFAmix method performs MFA for groups containing a mixture of quantitative and qualitative variables.
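The weighting step described above can be sketched in base R for two groups of quantitative variables. This is only an illustration, not the MFAmix implementation: the simulated data and the helper names `std` and `first_eig` are hypothetical, and the sketch assumes the usual MFA convention in which the standardized columns of a group are divided by the square root of the first eigenvalue of that group's separate PCA.

```r
# Sketch of the MFA weighting step for two groups of quantitative variables.
# Base R only; this is NOT the MFAmix implementation.
set.seed(1)
n <- 50
g1 <- matrix(rnorm(n * 3), n, 3)          # group 1: 3 variables
g2 <- matrix(rnorm(n * 4), n, 4)          # group 2: 4 variables
g2[, 2] <- g2[, 1] + 0.1 * rnorm(n)       # give group 2 a strong first direction

# Center and scale the columns using the population standard deviation (divisor n).
std <- function(x) {
  x <- sweep(x, 2, colMeans(x))
  sweep(x, 2, sqrt(colMeans(x^2)), "/")
}

# First eigenvalue of the separate PCA of a standardized group.
first_eig <- function(z) {
  eigen(crossprod(z) / nrow(z), symmetric = TRUE)$values[1]
}

z1 <- std(g1)
z2 <- std(g2)
l1 <- first_eig(z1)
l2 <- first_eig(z2)

# Weight each group: divide its columns by the square root of its first eigenvalue.
w1 <- z1 / sqrt(l1)
w2 <- z2 / sqrt(l2)

# After weighting, the first eigenvalue of each group equals 1,
# so no single group can dominate the first global component.
c(first_eig(w1), first_eig(w2))

# Global PCA on all the weighted variables taken together.
global <- eigen(crossprod(cbind(w1, w2)) / n, symmetric = TRUE)
global$values
```

With this normalization the first global eigenvalue lies between 1 and the number of groups, which is one way to read how many groups share the first dimension.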
One of the outputs of the MFAmix method is the matrix of squared loadings (sqload). Squared loadings for a qualitative variable are correlation ratios between the variable and the principal components. For a quantitative variable, squared loadings are the squared correlations between the variable and the principal components.
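The two definitions of a squared loading can be illustrated with base R. In this sketch the vector `score` is a simulated stand-in for one principal component, and `cor_ratio` is a hypothetical helper written for the illustration; neither comes from the package.

```r
# Squared loadings of one quantitative and one qualitative variable
# with respect to a single component (here a simulated score vector).
set.seed(2)
n <- 60
score <- rnorm(n)                                   # stand-in for a principal component
x_num <- 0.7 * score + rnorm(n, sd = 0.5)           # quantitative variable
x_fac <- cut(score + rnorm(n, sd = 0.8), breaks = 3,
             labels = c("low", "medium", "high"))   # qualitative variable

# Quantitative variable: squared Pearson correlation with the component.
sq_cor <- cor(x_num, score)^2

# Qualitative variable: correlation ratio, i.e. the between-level sum of
# squares of the component divided by its total sum of squares.
cor_ratio <- function(f, y) {
  f <- droplevels(as.factor(f))
  level_means <- tapply(y, f, mean)
  level_sizes <- tapply(y, f, length)
  between <- sum(level_sizes * (level_means - mean(y))^2)
  total <- sum((y - mean(y))^2)
  between / total
}
eta2 <- cor_ratio(x_fac, score)

c(squared_correlation = sq_cor, correlation_ratio = eta2)
```

Both quantities lie between 0 and 1, which is why quantitative and qualitative variables can be displayed on the same squared-loadings plot.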
Some other outputs are specific to MFA:
Coordinates of groups are the sum of the absolute contributions of variables belonging to the groups,
Partial individuals coordinates are factor coordinates of individuals according to a specific group. The partial coordinates can be obtained by projecting the data set of each group onto the principal component space of MFAmix,
Partial axes of a group are the correlations between each principal component of the separate analysis of the group and the principal components of MFAmix.
eig |
a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance. |
ind |
a list containing the results for the individuals (observations):
|
quanti |
a list containing the results for the quantitative variables:
|
levels |
a list containing the results for the levels of the qualitative variables:
|
quali |
a list containing the results for the qualitative variables:
|
sqload |
a matrix of dimension ( |
coef |
the coefficients of the linear combinations used to construct the principal components of MFAmix, and to predict coordinates (scores) of new observations in the function |
eig.separate |
a matrix containing the |
separate.analyses |
the results for the separated analyses of each group. |
groups |
a list containing the results for the groups:
|
partial.axes |
a matrix containing the coordinates of the partial axes. |
ind.partial |
a list of |
listvar.group |
lists the variables in each group. It is useful to check the adequacy between the vector |
global.pca |
an object of class |
Amaury Labenne [email protected], Marie Chavent, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
Escofier, B. and Pages, J. (1994). Multiple factor analysis (afmult package). Computational statistics & data analysis, 18(1):121-140.
Le, S., Josse, J. and Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software, 25(1):1-18.
print.MFAmix, summary.MFAmix, predict.MFAmix, plot.MFAmix
data(gironde)
class.var <- c(rep(1,9), rep(2,5), rep(3,9), rep(4,4))
names <- c("employment","housing","services","environment")
dat <- cbind(gironde$employment[1:20,], gironde$housing[1:20,],
             gironde$services[1:20,], gironde$environment[1:20,])
res <- MFAmix(data=dat, groups=class.var, name.groups=names,
              rename.level=TRUE, ndim=3, graph=FALSE)
summary(res)
Performs principal component analysis of a set of individuals (observations) described by a mixture of qualitative and quantitative variables. PCAmix includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases.
PCAmix( X.quanti = NULL, X.quali = NULL, ndim = 5, rename.level = FALSE, weight.col.quanti = NULL, weight.col.quali = NULL, graph = TRUE )
X.quanti |
a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |
X.quali |
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). |
ndim |
number of dimensions kept in the results (by default 5). |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents different variables from having levels with identical names. |
weight.col.quanti |
vector of weights for the quantitative variables. |
weight.col.quali |
vector of the weights for the qualitative variables. |
graph |
boolean, if TRUE the following graphics are displayed for the first two dimensions of PCAmix: component map of the individuals, plot of the squared loadings of all the variables (quantitative and qualitative), plot of the correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available). |
If X.quali is not specified (i.e. NULL), only quantitative variables are available and standard PCA is performed. If X.quanti is NULL, only qualitative variables are available and standard MCA is performed.
Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
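The missing-value convention stated above can be sketched in base R. The toy data frames and the helpers `impute_mean` and `indicator` are hypothetical and only mimic the stated rule; they are not the package's internal code.

```r
# Base R sketch of the missing-value convention; not the package's internal code.
x_quanti <- data.frame(height = c(10, NA, 30, 20),
                       width  = c(1, 2, NA, 3))
x_quali  <- data.frame(color = factor(c("red", NA, "blue", "red")))

# Quantitative variables: replace NA by the mean of the observed values.
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}
x_quanti_filled <- as.data.frame(lapply(x_quanti, impute_mean))

# Qualitative variables: build the indicator matrix of the levels,
# leaving a row of zeros where the value is missing.
indicator <- function(f) {
  lev <- levels(f)
  g <- vapply(lev, function(l) as.integer(!is.na(f) & f == l),
              FUN.VALUE = integer(length(f)))
  colnames(g) <- lev
  g
}
g_color <- indicator(x_quali$color)

x_quanti_filled
g_color
```

Under this convention an observation with a missing level carries no indicator for that variable, whereas `na.omit()` removes the whole row before the analysis.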
PCAmix returns the squared loadings in sqload. Squared loadings for a qualitative variable are correlation ratios between the variable and the principal components. For a quantitative variable, squared loadings are the squared correlations between the variable and the principal components.
Note that when all the p variables are qualitative, the factor coordinates (scores) of the n observations are equal to the factor coordinates (scores) of standard MCA times square root of p and the eigenvalues are then equal to the usual eigenvalues of MCA times p. When all the variables are quantitative, PCAmix gives exactly the same results as standard PCA.
eig |
a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance. |
ind |
a list containing the results for the individuals (observations):
|
quanti |
a list containing the results for the quantitative variables:
|
levels |
a list containing the results for the levels of the qualitative variables:
|
quali |
a list containing the results for the qualitative variables:
|
sqload |
a matrix of dimension ( |
coef |
the coefficients of the linear combinations used to construct the principal components of PCAmix, and to predict coordinates (scores) of new observations in the function |
M |
the vector of the weights of the columns used in the Generalized Singular Value Decomposition. |
Marie Chavent [email protected], Amaury Labenne.
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
print.PCAmix, summary.PCAmix, predict.PCAmix, plot.PCAmix
#PCAMIX:
data(wine)
str(wine)
X.quanti <- splitmix(wine)$X.quanti
X.quali <- splitmix(wine)$X.quali
pca <- PCAmix(X.quanti[,1:27], X.quali, ndim=4)
pca <- PCAmix(X.quanti[,1:27], X.quali, ndim=4, graph=FALSE)
pca$eig
pca$ind$coord

#PCA:
data(decathlon)
quali <- decathlon[,13]
pca <- PCAmix(decathlon[,1:10])
pca <- PCAmix(decathlon[,1:10], graph=FALSE)
plot(pca, choice="ind", coloring.ind=quali, cex=0.8,
     posleg="topright", main="Scores")
plot(pca, choice="sqload", main="Squared correlations")
plot(pca, choice="cor", main="Correlation circle")
pca$quanti$coord

#MCA
data(flower)
mca <- PCAmix(X.quali=flower[,1:4], rename.level=TRUE, graph=FALSE)
plot(mca, choice="ind", main="Scores")
plot(mca, choice="sqload", main="Correlation ratios")
plot(mca, choice="levels", main="Levels")
mca$levels$coord

#Missing values
data(vnf)
PCAmix(X.quali=vnf, rename.level=TRUE)
vnf2 <- na.omit(vnf)
PCAmix(X.quali=vnf2, rename.level=TRUE)
Orthogonal rotation in PCAmix by maximization of the varimax function expressed in terms of PCAmix squared loadings (correlation ratios for qualitative variables and squared correlations for quantitative variables). PCArot includes the ordinary varimax rotation in Principal Component Analysis (PCA) and a varimax-type rotation in Multiple Correspondence Analysis (MCA) as special cases.
PCArot(obj, dim, itermax = 100, graph = TRUE)
obj |
an object of class PCAmix. |
dim |
number of rotated Principal Components. |
itermax |
maximum number of iterations in Kaiser's practical optimization algorithm, based on successive pairwise planar rotations. |
graph |
boolean, if TRUE the following graphs are displayed for the first two dimensions after rotation: plot of the individuals (factor coordinates), plot of the variables (squared loadings), plot of the correlation circle (if quantitative variables are available), plot of the levels component map (if qualitative variables are available). |
If X.quali is not specified (i.e. NULL) in the previous PCAmix step, only quantitative variables are available and standard varimax rotation in PCA is performed. If X.quanti is NULL, only qualitative variables are available and varimax-type rotation in MCA is performed. Note that p1 is the number of quantitative variables, p2 is the number of qualitative variables and m is the total number of levels of the p2 qualitative variables.
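For the purely quantitative special case, an ordinary varimax rotation can be sketched with base R. The sketch uses `prcomp` and `stats::varimax` on simulated data as stand-ins; `stats::varimax` follows its own iterative scheme rather than the pairwise planar rotations used by PCArot, so the code only illustrates what an orthogonal rotation of the loadings does and is not expected to reproduce PCArot output exactly.

```r
# Ordinary varimax rotation of PCA loadings with base R (stats::varimax).
# Illustrates the purely quantitative special case; it is not PCArot itself.
set.seed(3)
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(a = f1 + 0.3 * rnorm(n), b = f1 + 0.3 * rnorm(n),
           c = f2 + 0.3 * rnorm(n), d = f2 + 0.3 * rnorm(n),
           e = f1 + f2 + 0.3 * rnorm(n))

pca <- prcomp(x, center = TRUE, scale. = TRUE)
k <- 2
# Loadings: correlations between the variables and the first k components.
loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])

rot <- varimax(loadings, normalize = FALSE)
rot_mat <- rot$rotmat                      # orthogonal rotation matrix
rotated <- loadings %*% rot_mat            # loadings after rotation

# Variance carried by each dimension before and after rotation:
# the total is preserved, only its split across dimensions changes.
rbind(before = colSums(loadings^2), after = colSums(rotated^2))
```

Because the rotation matrix is orthogonal, the total variance of the retained dimensions is unchanged; rotation only redistributes it so that each variable tends to load mainly on one dimension.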
eig |
variances of the ndim dimensions after rotation. |
ind$coord |
an n by dim quantitative matrix which contains the coordinates (scores) of the n individuals on the dim rotated principal components. |
quanti$coord |
a p1 by dim quantitative matrix which contains the coordinates (loadings) of the p1 quantitative variables after rotation. The coordinates of the quantitative variables after rotation are correlations with the rotated principal components. |
levels$coord |
an m by dim quantitative matrix which contains the coordinates of the m levels on the dim rotated principal components. |
quali$coord |
a p2 by dim quantitative matrix which contains the coordinates of the p2 qualitative variables on the dim rotated principal components. Coordinates of the qualitative variables after rotation are correlation ratios with the rotated principal components. |
coef |
coefficients of the linear combinations used to construct the rotated principal components of PCAmix. |
theta |
angle of rotation if dim is equal to 2. |
T |
matrix of rotation. |
Marie Chavent [email protected], Vanessa Kuentz, Benoit Liquet, Jerome Saracco
Chavent, M., Kuentz, V., Saracco, J. (2011), Orthogonal Rotation in PCAMIX. Advances in Classification and Data Analysis, Vol. 6, pp. 131-146.
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
Kiers, H.A.L., (1991), Simple structure in Component Analysis Techniques for mixtures of qualitative and quantitative variables, Psychometrika, 56, 197-212.
plot.PCAmix, summary.PCAmix, PCAmix, predict.PCAmix
#PCAMIX:
data(wine)
pca <- PCAmix(X.quanti=wine[,c(3:29)], X.quali=wine[,1:2], ndim=4, graph=FALSE)
pca
rot <- PCArot(pca,3)
rot
rot$eig #percentages of variances after rotation
plot(rot, choice="ind", coloring.ind=wine[,1],
     posleg="bottomleft", main="Rotated scores")
plot(rot, choice="sqload", main="Squared loadings after rotation")
plot(rot, choice="levels", main="Levels after rotation")
plot(rot, choice="cor", main="Correlation circle after rotation")

#PCA:
data(decathlon)
quali <- decathlon[,13]
pca <- PCAmix(decathlon[,1:10], graph=FALSE)
rot <- PCArot(pca,3)
plot(rot, choice="ind", coloring.ind=quali, cex=0.8,
     posleg="topright", main="Scores after rotation")
plot(rot, choice="sqload", main="Squared correlations after rotation")
plot(rot, choice="cor", main="Correlation circle after rotation")

#MCA
data(flower)
mca <- PCAmix(X.quali=flower[,1:4], rename.level=TRUE, graph=FALSE)
rot <- PCArot(mca,2)
plot(rot, choice="ind", main="Scores after rotation")
plot(rot, choice="sqload", main="Correlation ratios after rotation")
plot(rot, choice="levels", main="Levels after rotation")
Displays the graphical outputs of MFAmix. Individuals (observations), quantitative variables and levels of the qualitative variables are plotted as points using their factor coordinates (scores) in MFAmix. All the variables (quantitative and qualitative) are plotted on the same graph as points using their squared loadings. The groups of variables are plotted using their contributions to the component coordinates. Partial axes and partial individuals of separated analyses can also be plotted.
## S3 method for class 'MFAmix' plot( x, axes = c(1, 2), choice = "ind", label = TRUE, coloring.var = "not", coloring.ind = NULL, nb.partial.axes = 3, col.ind = NULL, col.groups = NULL, partial = NULL, lim.cos2.plot = 0, lim.contrib.plot = 0, xlim = NULL, ylim = NULL, cex = 1, main = NULL, leg = TRUE, posleg = "topleft", cex.leg = 0.8, col.groups.sup = NULL, posleg.sup = "topright", nb.paxes.sup = 3, ... )
x |
an object of class MFAmix obtained with the function |
axes |
a length 2 vector specifying the components to plot. |
choice |
the graph to plot:
|
label |
boolean, if FALSE the labels of the points are not plotted. |
coloring.var |
a value to choose among:
|
coloring.ind |
a qualitative variable such as a character vector or a factor of size n (the number of individuals). The individuals are colored according to the levels of this variable. If NULL, the individuals are not colored. |
nb.partial.axes |
if choice="axes", the maximum number of partial axes related to each group to plot on the correlation circle. By default equal to 3. |
col.ind |
a vector of colors, of size the number of levels of
|
col.groups |
a vector of colors, of size the number of groups. If NULL, colors are chosen automatically. |
partial |
a vector of class character with the row names of the individuals for which the partial individuals should be drawn. By default partial = NULL and no partial points are drawn. Partial points are colored according to |
lim.cos2.plot |
a value between 0 and 1. Points with squared cosine below this value are not plotted. |
lim.contrib.plot |
a value between 0 and 100. Points with relative contributions (in percentage) below this value are not plotted. |
xlim |
a numeric vector of length 2, giving the x coordinates range. If NULL (by default) the range is defined automatically (recommended). |
ylim |
a numeric vector of length 2, giving the y coordinates range. If NULL (by default) the range is defined automatically (recommended). |
cex |
cf. function |
main |
a string corresponding to the title of the graph to draw. |
leg |
boolean, if TRUE, a legend is displayed. |
posleg |
position of the legend. |
cex.leg |
a numerical value giving the amount by which the legend should be magnified. Default is 0.8. |
col.groups.sup |
a vector of colors, of size the number of supplementary groups. If NULL, colors are chosen automatically. |
posleg.sup |
position of the legend for the supplementary groups. |
nb.paxes.sup |
if choice="axes", the maximum number of partial axes of supplementary groups plotted on the correlation circle. By default equal to 3. |
... |
arguments to be passed to methods, such as graphical parameters. |
The observations can be colored according to the levels of a qualitative variable. The observations, the quantitative variables and the levels can be selected according to their squared cosine (lim.cos2.plot) or their relative contribution (lim.contrib.plot) to the component map. Only points with squared cosine or relative contribution greater than a given threshold are plotted. Note that the relative contribution of a point to the component map (a plane) is the sum of the absolute contributions to each dimension, divided by the sum of the corresponding eigenvalues.
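The selection rule can be made concrete with an ordinary PCA computed in base R. This is a sketch under stated assumptions: individuals have uniform weights 1/n, the absolute contribution of an individual to a dimension is its weight times its squared coordinate, and a point is kept only if it passes both thresholds. The names `lim_cos2` and `lim_contrib` are illustrative stand-ins for lim.cos2.plot and lim.contrib.plot, and the data are simulated.

```r
# Squared cosines and relative contributions of individuals on a plane,
# computed from an ordinary PCA (base R sketch; uniform weights 1/n assumed).
set.seed(4)
n <- 30
x <- matrix(rnorm(n * 4), n, 4)
z <- scale(x) * sqrt(n / (n - 1))         # standardize with divisor n

sv <- svd(z / sqrt(n))
eig <- sv$d^2                             # eigenvalues of the PCA
scores <- z %*% sv$v                      # factor coordinates of the individuals

axes <- c(1, 2)

# Squared cosine on the plane: share of the squared distance to the origin
# that is carried by the two selected dimensions.
cos2_plane <- rowSums(scores[, axes]^2) / rowSums(scores^2)

# Absolute contribution to dimension k: (1/n) * score^2; these sum to eig[k].
abs_contrib <- scores^2 / n
# Relative contribution to the plane, in percent: sum of the absolute
# contributions divided by the sum of the corresponding eigenvalues.
contrib_plane <- 100 * rowSums(abs_contrib[, axes]) / sum(eig[axes])

# Keep only the points that pass both thresholds.
lim_cos2 <- 0.5
lim_contrib <- 2
keep <- cos2_plane >= lim_cos2 & contrib_plane >= lim_contrib
which(keep)
```

The relative contributions of all individuals to the plane sum to 100, so a threshold such as 100/n selects the points that contribute more than the average.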
Marie Chavent [email protected], Amaury Labenne
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
data(gironde)
class.var <- c(rep(1,9), rep(2,5), rep(3,9), rep(4,4))
names <- c("employment","housing","services","environment")
dat <- cbind(gironde$employment[1:20,], gironde$housing[1:20,],
             gironde$services[1:20,], gironde$environment[1:20,])
res <- MFAmix(data=dat, groups=class.var, name.groups=names,
              rename.level=TRUE, ndim=3, graph=FALSE)

#---- quantitative variables
plot(res, choice="cor", cex=0.6)
plot(res, choice="cor", cex=0.6, coloring.var="groups")
plot(res, choice="cor", cex=0.6, coloring.var="groups",
     col.groups=c("red","yellow","pink","brown"), leg=TRUE)

#----partial axes
plot(res, choice="axes", cex=0.6)
plot(res, choice="axes", cex=0.6, coloring.var="groups")
plot(res, choice="axes", cex=0.6, coloring.var="groups",
     col.groups=c("red","yellow","pink","brown"), leg=TRUE)

#----groups
plot(res, choice="groups", cex=0.6) #no colors for groups
plot(res, choice="groups", cex=0.6, coloring.var="groups")
plot(res, choice="groups", cex=0.6, coloring.var="groups",
     col.groups=c("red","yellow","pink","blue"))

#----squared loadings
plot(res, choice="sqload", cex=0.8) #no colors for groups
plot(res, choice="sqload", cex=0.8, coloring.var="groups",
     posleg="topright")
plot(res, choice="sqload", cex=0.6, coloring.var="groups",
     col.groups=c("red","yellow","pink","blue"), ylim=c(0,1))
plot(res, choice="sqload", cex=0.8, coloring.var="type",
     cex.leg=0.9, posleg="topright")

#----individuals
plot(res, choice="ind", cex=0.6)

#----individuals with squared cosine greater than 0.5
plot(res, choice="ind", cex=0.6, lim.cos2.plot=0.5)

#----individuals colored with a qualitative variable
nbchem <- gironde$services$chemist[1:20]
plot(res, choice="ind", cex=0.6, coloring.ind=nbchem,
     posleg="topright")
plot(res, choice="ind", coloring.ind=nbchem,
     col.ind=c("pink","brown","darkblue"), label=FALSE, posleg="topright")

#----partial individuals colored by groups
plot(res, choice="ind", partial=c("AUBIAC","ARCACHON"),
     cex=0.6, posleg="bottomright")

#----levels of qualitative variables
plot(res, choice="levels", cex=0.8)
plot(res, choice="levels", cex=0.8, coloring.var="groups")

#levels with squared cosine greater than 0.6
plot(res, choice="levels", cex=0.8, lim.cos2.plot=0.6)

#supplementary groups
data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <- splitmix(wine)$X.quanti[,28:29]
X.quali.sup <- splitmix(wine)$X.quali[,2,drop=FALSE]
data <- cbind(X.quanti, X.quali)
data.sup <- cbind(X.quanti.sup, X.quali.sup)

groups <- c(1,2,2,3,3,1)
name.groups <- c("G1","G2","G3")
groups.sup <- c(1,1,2)
name.groups.sup <- c("Gsup1","Gsup2")
mfa <- MFAmix(data, groups, name.groups, ndim=4, rename.level=TRUE, graph=FALSE)
mfa.sup <- supvar(mfa, data.sup, groups.sup, name.groups.sup, rename.level=TRUE)
plot(mfa.sup, choice="sqload", coloring.var="groups")
plot(mfa.sup, choice="axes", coloring.var="groups")
plot(mfa.sup, choice="groups", coloring.var="groups")
plot(mfa.sup, choice="levels", coloring.var="groups")
plot(mfa.sup, choice="levels")
plot(mfa.sup, choice="cor", coloring.var="groups")
Displays the graphical outputs of PCAmix and PCArot. The individuals (observations), the quantitative variables and the levels of the qualitative variables are plotted as points using their factor coordinates (scores). All the variables (quantitative and qualitative) are plotted as points on the same graph using their squared loadings.
## S3 method for class 'PCAmix' plot( x, axes = c(1, 2), choice = "ind", label = TRUE, coloring.ind = NULL, col.ind = NULL, coloring.var = FALSE, lim.cos2.plot = 0, lim.contrib.plot = 0, posleg = "topleft", xlim = NULL, ylim = NULL, cex = 1, leg = TRUE, main = NULL, cex.leg = 1, ... )
x |
an object of class PCAmix obtained with the function |
axes |
a length 2 vector specifying the components to plot. |
choice |
the graph to plot:
|
label |
boolean, if FALSE the labels of the points are not plotted. |
coloring.ind |
a qualitative variable such as a character vector or a factor of size n (the number of individuals). The individuals are colored according to the levels of this variable. If NULL, the individuals are not colored. |
col.ind |
a vector of colors, of size the number of levels of coloring.ind.
|
coloring.var |
boolean, if TRUE, the variables in the plot of the squared loadings are colored according to their type (quantitative or qualitative). |
lim.cos2.plot |
a value between 0 and 1. Points with squared cosine below this value are not plotted. |
lim.contrib.plot |
a value between 0 and 100. Points with relative contributions (in percentage) below this value are not plotted. |
posleg |
position of the legend. |
xlim |
a numeric vector of length 2, giving the x coordinates range. If NULL (by default) the range is defined automatically (recommended). |
ylim |
a numeric vector of length 2, giving the y coordinates range. If NULL (by default) the range is defined automatically (recommended). |
cex |
cf. function |
leg |
boolean, if TRUE, a legend is displayed. |
main |
a string corresponding to the title of the graph to draw. |
cex.leg |
a numerical value giving the amount by which the legend should be magnified. Default is 1. |
... |
arguments to be passed to methods, such as graphical parameters. |
The observations can be colored according to the levels of a qualitative variable. The observations, the quantitative variables and the levels can be selected according to their squared cosine (lim.cos2.plot) or their relative contribution (lim.contrib.plot) to the component map. Only points with squared cosine or relative contribution greater than a given threshold are plotted. Note that the relative contribution of a point to the component map (a plane) is the sum of its absolute contributions to each of the two dimensions, divided by the sum of the corresponding eigenvalues.
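As a rough illustration of these two selection criteria, the sketch below computes the squared cosine and the relative contribution of each observation to the plane of the first two components. It relies only on base R (prcomp on the built-in USArrests data), assumes uniform weights 1/n for the observations, and is not the package's internal code; the thresholds are arbitrary values chosen for the example.

```r
# Base R illustration only (prcomp), assuming uniform observation weights 1/n.
X <- scale(USArrests)                    # built-in data set, standardized columns
n <- nrow(X)
pca <- prcomp(X, center = FALSE, scale. = FALSE)
scores <- pca$x                          # factor coordinates of the observations
eig <- colSums(scores^2) / n             # eigenvalues under weights 1/n

axes <- c(1, 2)                          # the component map to inspect

# Squared cosine: share of an observation's squared norm captured by the plane.
cos2.plane <- rowSums(scores[, axes]^2) / rowSums(X^2)

# Absolute contribution of each observation to each axis (sums to the eigenvalue).
contrib.abs <- scores^2 / n

# Relative contribution (in %) to the plane: sum of the absolute contributions
# on the two axes divided by the sum of the two eigenvalues.
contrib.plane <- 100 * rowSums(contrib.abs[, axes]) / sum(eig[axes])

# Hypothetical thresholds playing the role of lim.cos2.plot and lim.contrib.plot.
keep.cos2 <- cos2.plane >= 0.8
keep <- contrib.plane >= 3
```

The relative contributions computed this way sum to 100 over all observations, which is why lim.contrib.plot is expressed on a 0 to 100 scale.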
Marie Chavent, Amaury Labenne
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
data(gironde)
base <- gironde$housing[1:20,]
X.quanti <-splitmix(base)$X.quanti
X.quali <- splitmix(base)$X.quali
res<-PCAmix(X.quanti, X.quali, rename.level=TRUE, ndim=3,graph=FALSE)
#----quantitative variables on the correlation circle
plot(res,choice="cor",cex=0.8)
#----individuals component map
plot(res,choice="ind",cex=0.8)
#----individuals colored with the qualitative variable "houses"
houses <- X.quali$houses
plot(res,choice="ind",cex=0.6,coloring.ind=houses)
#----individuals selected according to their cos2
plot(res,choice="ind",cex=0.6,lim.cos2.plot=0.8)
#----all the variables plotted with the squared loadings
plot(res,choice="sqload",cex=0.8)
#----variables colored according to their type (quanti or quali)
plot(res,choice="sqload",cex=0.8,coloring.var=TRUE)
#----levels component map
plot(res,choice="levels",cex=0.8)
#----example with supplementary variables
data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <-splitmix(wine)$X.quanti[,28:29]
X.quali.sup <-splitmix(wine)$X.quali[,2,drop=FALSE]
pca<-PCAmix(X.quanti,X.quali,ndim=4,graph=FALSE)
pca2 <- supvar(pca,X.quanti.sup,X.quali.sup)
plot(pca2,choice="levels")
plot(pca2,choice="cor")
plot(pca2,choice="sqload")
This function computes the scores of new observations on the principal components of MFAmix. In other words, it projects the new observations onto the principal components of MFAmix obtained previously on a separate dataset. Note that the new observations must be described with the same variables as those used in MFAmix. The groups of variables must also be identical.
## S3 method for class 'MFAmix'
predict(object, data, ...)
object |
an object of class MFAmix obtained with the function MFAmix.
|
data |
a data frame containing the description of the new observations
on all the variables. This data frame will be split into |
... |
further arguments passed to or from other methods. They are ignored in this function. |
Returns the matrix of the scores of the new observations on the principal components or on the rotated principal components of MFAmix.
Marie Chavent, Amaury Labenne.
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
data(gironde)
class.var<-c(rep(1,9),rep(2,5),rep(3,9),rep(4,4))
names<-c("employment","housing","services","environment")
dat<-cbind(gironde$employment,gironde$housing,
           gironde$services,gironde$environment)
n <- nrow(dat)
set.seed(10)
sub <- sample(1:n,520)
res<-MFAmix(data=dat[sub,],groups=class.var,
            name.groups=names, rename.level=TRUE, ndim=3,graph=FALSE)
#Predict scores of new data
pred<-predict(res,data=dat[-sub,])
plot(res,choice="ind",cex=0.6,lim.cos2.plot=0.7)
points(pred[1:5,c(1,2)],col=2,pch=16,cex=0.6)
text(pred[1:5,c(1,2)], labels = rownames(dat[-sub,])[1:5],
     col=2,pos=3,cex=0.6)
This function computes the scores of new observations on the principal components of PCAmix. If the components have been rotated, it computes the scores of the new observations on the rotated principal components. In other words, it projects the new observations onto the principal components of PCAmix (or PCArot) obtained previously on a separate dataset. Note that the new observations must be described with the same variables as those used in PCAmix (or PCArot).
## S3 method for class 'PCAmix'
predict(object, X.quanti = NULL, X.quali = NULL, ...)
object |
an object of class PCAmix obtained with the function PCAmix or PCArot.
|
X.quanti |
a numeric data matrix or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |
X.quali |
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). |
... |
further arguments passed to or from other methods. They are ignored in this function. |
Returns the matrix of the scores of the new observations on the principal components or on the rotated principal components of PCAmix.
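The idea of projecting new observations can be sketched for the purely quantitative case with base R alone. The example below is an illustration under stated assumptions, not the package's implementation: it uses the built-in USArrests data, an arbitrary train/new split, the population standard deviation (1/n) for scaling, and an unweighted singular value decomposition, so the scaling of the scores may differ from what predict returns for a PCAmix object.

```r
# Base R sketch: project new rows onto components fitted on training rows.
X <- as.matrix(USArrests)
train <- 1:35                                  # arbitrary split for the example
Xtrain <- X[train, ]
Xnew <- X[-train, , drop = FALSE]

# Centering and scaling parameters are estimated on the training rows only.
g <- colMeans(Xtrain)
s <- sqrt(colMeans(sweep(Xtrain, 2, g)^2))     # population standard deviation (1/n)

Ztrain <- sweep(sweep(Xtrain, 2, g), 2, s, "/")
sv <- svd(Ztrain)
V <- sv$v                                      # loadings (principal axes)

# New rows are standardized with the TRAINING means and standard deviations,
# then multiplied by the loadings to obtain their scores.
Znew <- sweep(sweep(Xnew, 2, g), 2, s, "/")
scores.new <- Znew %*% V
```

The key design point is that the new observations never influence the centering, the scaling or the axes: they are only expressed in the coordinate system learned from the training data.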
Marie Chavent, Amaury Labenne.
Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].
# quantitative data
data(decathlon)
n <- nrow(decathlon)
train <- sample(1:n,20)
pca <- PCAmix(decathlon[train,1:10], graph=FALSE)
predict(pca, decathlon[-train,1:10])
rot <- PCArot(pca,dim=4)
predict(rot,decathlon[-train,1:10])
# qualitative data
data(flower)
n <- nrow(flower)
train <- sample(1:n,10)
mca <- PCAmix(X.quali=flower[train,1:3], rename.level=TRUE, graph=FALSE)
predict(mca, X.quali=flower[-train,1:3])
# quantitative and qualitative data
data(wine)
X.quanti <- splitmix(wine)$X.quanti
X.quali <- splitmix(wine)$X.quali
n <- nrow(wine)
train <- sample(1:n, 10)
pca <-PCAmix(X.quanti[train,1:10], X.quali[train,], ndim=4)
pred <- predict(pca, X.quanti[-train,1:10], X.quali[-train,])
plot(pca,axes=c(1,2))
points(pred[,c(1,2)],col=2,pch=16)
text(pred[,c(1,2)], labels = rownames(X.quanti[-train,1:27]), col=2,pos=3)
This is a method for the function print for objects of the class MFAmix.
## S3 method for class 'MFAmix'
print(x, ...)
x |
an object of class MFAmix. |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
This is a method for the function print for objects of the class PCAmix.
## S3 method for class 'PCAmix'
print(x, ...)
x |
an object of class PCAmix. |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
The data measure the amount of protein consumed for nine food groups in 25 European countries. The nine food groups are red meat (RedMeat), white meat (WhiteMeat), eggs (Eggs), milk (Milk), fish (Fish), cereal (Cereal), starch (Starch), nuts (Nuts), and fruits and vegetables (FruitVeg).
A data frame with 25 rows (the European countries) and 9 columns (the food groups)
Originated by A. Weber and cited in Hand et al., A Handbook of Small Data Sets, (1994, p. 297).
Recoding of the quantitative and of the qualitative data matrix.
recod(X.quanti, X.quali,rename.level=FALSE)
X.quanti |
a numerical data matrix. |
X.quali |
a categorical data matrix. |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". |
X |
X.quanti and X.quali concatenated in a single matrix. |
Y |
X.quanti with missing values replaced with mean values concatenated with the indicator matrix of X.quali with missing values replaced by zeros. |
Z |
X.quanti standardized (centered and reduced by standard deviations) concatenated with the indicator matrix of X.quali centered and reduced with the square roots of the relative frequencies of the categories. |
W |
X.quanti standardized (centered and reduced by standard deviations) concatenated with the indicator matrix of X.quali centered. |
n |
the number of observations. |
p |
the total number of variables |
p1 |
the number of quantitative variables |
p2 |
the number of qualitative variables |
g |
the means of the columns of Y |
s |
the standard deviations of the columns of Y |
G |
The indicator matrix of X.quali with missing values replaced by 0 |
Gcod |
The indicator matrix G reduced with the square roots of the relative frequencies of the categories |
Recoding of the qualitative data matrix.
recodqual(X,rename.level=FALSE)
X |
the qualitative data matrix. |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". |
G |
The indicator matrix of X with missing values replaced by 0. |
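To make the two treatments of missing values concrete, here is a minimal base R sketch on a toy factor (a hypothetical example, not taken from the package data and not the package's code). It first builds an indicator matrix in which a missing observation yields a row of NA, then replaces those NA by 0 as described for G.

```r
# Toy qualitative variable with one missing observation (hypothetical data).
f <- factor(c("a", "b", NA, "a", "c"))
lev <- levels(f)

# Indicator (disjunctive) coding: one 0/1 column per level.
# Comparing NA with a level gives NA, so the missing row is NA in every column.
G.na <- sapply(lev, function(l) as.numeric(f == l))

# Same matrix with the missing entries replaced by 0.
G <- G.na
G[is.na(G)] <- 0
```

With the zero coding, a row with a missing value sums to 0 instead of 1, so it is easy to spot afterwards with rowSums(G).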
data(vnf)
X <- vnf[1:10,9:12]
tab.disjonctif.NA(X)
recodqual(X)
Recoding of the quantitative data matrix.
recodquant(X)
X |
the quantitative data matrix. |
Z |
the standardized quantitative data matrix (centered and reduced with the standard deviations). |
g |
the means of the columns of X |
s |
the standard deviations of the columns of X (population version with 1/n) |
Xcod |
The quantitative matrix X with missing values replaced with the column mean values. |
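The returned values can be mimicked in a few lines of base R. The sketch below is an assumption-laden illustration rather than the package's code: it uses the built-in airquality data (which contains missing values), imputes each missing value with its column mean, and standardizes with the population standard deviation (1/n) computed on the imputed matrix; whether the package computes the standard deviation before or after imputation is not stated in this page.

```r
# Small numeric matrix with missing values (built-in data set).
X <- as.matrix(airquality[1:10, c("Ozone", "Solar.R", "Wind")])

# Column means computed on the observed values only.
g <- colMeans(X, na.rm = TRUE)

# Xcod: missing values replaced by the column means.
Xcod <- X
for (j in seq_len(ncol(X))) {
  Xcod[is.na(X[, j]), j] <- g[j]
}

# Population standard deviations (divisor n, not n - 1), here on the imputed matrix.
s <- sqrt(colMeans(sweep(Xcod, 2, g)^2))

# Z: centered and reduced matrix.
Z <- sweep(sweep(Xcod, 2, g), 2, s, "/")
```

Mean imputation leaves the column means unchanged, so the imputed cells are exactly 0 in Z and do not pull the standardized columns in any direction.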
data(decathlon)
X <- decathlon[1:5,1:5]
X[1,2] <- NA
X[2,3] <-NA
rec <- recodquant(X)
If the p variables of a data matrix of dimension (n,p) are separated into G groups, this function splits the data matrix into G datasets according to group membership.
splitgroups(data, groups, name.groups)
data |
the data matrix to be split into G groups. |
groups |
a vector of size |
name.groups |
a vector of size |
data.groups |
a list of G data matrices: one matrix for each group. |
listvar.groups |
The list of the variables in each group. |
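The splitting logic can be sketched with base R as follows. This is only an illustration of the idea on the built-in iris data, with hypothetical group names; it does not reproduce the package's function, although the two result objects are named after the values documented above.

```r
# Built-in data set with 5 columns, assigned to 3 hypothetical groups.
data <- iris
groups <- c(1, 1, 2, 2, 3)
name.groups <- c("sepal", "petal", "species")

# Column indices of each group, labelled with the group names.
idx <- split(seq_len(ncol(data)), factor(groups, labels = name.groups))

# One data set per group (drop = FALSE keeps single-column groups as data frames).
data.groups <- lapply(idx, function(j) data[, j, drop = FALSE])

# Names of the variables in each group.
listvar.groups <- lapply(idx, function(j) names(data)[j])
```

Each element can then be accessed by name, for instance data.groups$petal, in the same way as split$data.groups$Epreuve in the package example.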
data(decathlon)
split <- splitgroups(decathlon,groups=c(rep(1,10),2,2,3),
                     name.groups=c("Epreuve","Classement","Competition"))
split$data.groups$Epreuve
Splits a mixed data matrix into two data sets: one with the quantitative variables and one with the qualitative variables. Here, columns of class "integer" are considered quantitative. If you want such a column to be treated as qualitative, it must be of class character or factor.
splitmix(data)
data |
a data matrix or a data.frame with a mixture of quantitative and qualitative variables. |
X.quanti |
a data matrix containing only the quantitative variables. |
X.quali |
A data.frame containing only the qualitative variables. |
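The rule about integer columns can be illustrated with base R. The sketch below is not the package's code: it separates columns with is.numeric (which is TRUE for both integer and double columns) on the built-in iris data, then shows on a small hypothetical data frame how converting an integer code to a factor changes the side on which it falls.

```r
# Separate numeric columns from the others (base R illustration).
data <- iris
is.quanti <- sapply(data, is.numeric)        # TRUE for integer and double columns
X.quanti <- data[, is.quanti, drop = FALSE]
X.quali <- data[, !is.quanti, drop = FALSE]

# Hypothetical data frame: 'code' is stored as integer but is really a category.
df <- data.frame(code = c(1L, 2L, 1L), x = c(0.5, 1.2, 3.3))
n.quanti.before <- sum(sapply(df, is.numeric))   # 'code' counted as quantitative

# Converting the column to a factor makes it qualitative.
df$code <- factor(df$code)
n.quanti.after <- sum(sapply(df, is.numeric))
```

Doing the conversion before splitting avoids numeric codes (postal codes, group identifiers and the like) being standardized as if they were measurements.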
data(decathlon)
data.split <- splitmix(decathlon)
data.split$X.quanti
data.split$X.quali
This is a method for the function summary for objects of the class MFAmix.
## S3 method for class 'MFAmix'
summary(object, ...)
object |
an object of class MFAmix obtained with the function MFAmix. |
... |
further arguments passed to or from other methods. |
Returns the total number of observations, the number of quantitative variables, and the number of qualitative variables with the total number of levels. These values are also given for each group.
This is a method for the function summary for objects of the class PCAmix.
## S3 method for class 'PCAmix'
summary(object, ...)
object |
an object of class PCAmix obtained with the function PCAmix or PCArot. |
... |
further arguments passed to or from other methods. |
Returns the matrix of squared loadings. For quantitative variables (resp. qualitative), squared loadings are the squared correlations (resp. the correlation ratios) with the scores or with the rotated (standardized) scores.
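The two kinds of squared loadings can be illustrated with base R. In the sketch below, the score vector comes from prcomp on the numeric columns of the built-in iris data (a stand-in for a component, not a PCAmix result), and corr.ratio is a hypothetical helper written for this example. The squared loading of a quantitative variable is its squared correlation with the score; that of a qualitative variable is the correlation ratio, i.e. the between-group share of the variance of the score.

```r
# A score vector to work with (first principal component of the numeric columns).
score <- prcomp(iris[, 1:4], scale. = TRUE)$x[, 1]

# Quantitative variable: squared correlation with the score.
sq.load.quanti <- cor(iris$Sepal.Length, score)^2

# Qualitative variable: correlation ratio (between-group sum of squares of the
# score divided by its total sum of squares). Hypothetical helper function.
corr.ratio <- function(y, f) {
  group.means <- tapply(y, f, mean)
  group.sizes <- as.numeric(table(f))
  sum(group.sizes * (group.means - mean(y))^2) / sum((y - mean(y))^2)
}
sq.load.quali <- corr.ratio(score, iris$Species)
```

Both quantities lie between 0 and 1, which is what allows quantitative and qualitative variables to be shown on the same squared-loadings plot.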
data(wine)
X.quanti <- wine[,c(3:29)]
X.quali <- wine[,c(1,2)]
pca<-PCAmix(X.quanti,X.quali,ndim=4, graph=FALSE)
summary(pca)
rot<-PCArot(pca,3,graph=FALSE)
summary(rot)
supvar is a generic function for adding supplementary variables in PCAmix or MFAmix. The function invokes one of two methods, depending on the class of the first argument.
supvar(obj, ...)
obj |
an object of class PCAmix or MFAmix. |
... |
further arguments passed to or from other methods. |
This generic function has two methods: supvar.PCAmix and supvar.MFAmix.
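For readers unfamiliar with S3 dispatch, the mechanism can be sketched with a toy generic. The generic name describe and the empty objects below are hypothetical and exist only for this illustration; they are not part of the package.

```r
# A toy S3 generic with one method per class (hypothetical names).
describe <- function(obj, ...) UseMethod("describe")
describe.PCAmix <- function(obj, ...) "method for class PCAmix"
describe.MFAmix <- function(obj, ...) "method for class MFAmix"

# Empty stand-in objects carrying only a class attribute.
a <- structure(list(), class = "PCAmix")
b <- structure(list(), class = "MFAmix")

# The same call reaches a different method depending on the class of 'obj'.
res.a <- describe(a)
res.b <- describe(b)
```

This is why a single call such as supvar(obj, ...) is enough in user code: the class of the fitted object decides which method runs.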
Computes the coordinates of supplementary variables and groups on the components of an object of class MFAmix.
## S3 method for class 'MFAmix'
supvar(obj, data.sup, groups.sup, name.groups.sup, rename.level = FALSE, ...)
obj |
an object of class MFAmix. |
data.sup |
a numeric matrix of data. |
groups.sup |
a vector which gives the groups of the columns in data.sup. |
name.groups.sup |
a vector which gives the names of the supplementary groups. |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents levels of different variables from having identical names. |
... |
further arguments passed to or from other methods. |
data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <- splitmix(wine)$X.quanti[,28:29]
X.quali.sup <- splitmix(wine)$X.quali[,2,drop=FALSE]
data <- cbind(X.quanti,X.quali)
data.sup <- cbind(X.quanti.sup,X.quali.sup)
groups <-c(1,2,2,3,3,1)
name.groups <- c("G1","G2","G3")
groups.sup <- c(1,1,2)
name.groups.sup <- c("Gsup1","Gsup2")
mfa <- MFAmix(data,groups,name.groups,ndim=4,rename.level=TRUE,graph=FALSE)
mfa.sup <- supvar(mfa,data.sup,groups.sup,name.groups.sup,rename.level=TRUE)
Computes the coordinates of supplementary variables on the components of an object of class PCAmix.
## S3 method for class 'PCAmix'
supvar(obj, X.quanti.sup = NULL, X.quali.sup = NULL, rename.level = FALSE, ...)
obj |
an object of class PCAmix. |
X.quanti.sup |
a numeric matrix of data. |
X.quali.sup |
a categorical matrix of data. |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents levels of different variables from having identical names. |
... |
further arguments passed to or from other methods. |
data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <-splitmix(wine)$X.quanti[,28:29]
X.quali.sup <-splitmix(wine)$X.quali[,2,drop=FALSE]
pca<-PCAmix(X.quanti,X.quali,ndim=4,graph=FALSE)
pcasup <- supvar(pca,X.quanti.sup,X.quali.sup)
This function builds the indicator matrix of a qualitative data matrix. Missing observations are indicated as NAs.
tab.disjonctif.NA(tab, rename.level = FALSE)
tab |
a categorical data matrix. |
rename.level |
boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". |
This function uses the code of the function tab.disjonctif implemented in the package FactoMineR, but behaves differently. Here, an NA value appears when a category has not been observed in a row. In the function tab.disjonctif of the package FactoMineR, a new column is created in that case.
Returns the indicator matrix with NA for missing observations.
data(vnf)
X <- vnf[1:10,9:12]
tab.disjonctif.NA(X)
A user satisfaction survey of pleasure craft operators on the “Canal des Deux Mers”, located in the south of France, was carried out by the public corporation “Voies Navigables de France” (VNF), which is responsible for managing and developing the largest network of navigable waterways in Europe.
data(vnf)
A data frame with 1232 observations and 14 qualitative variables.
Josse, J., Chavent, M., Liquet, B. and Husson, F. (2012). Handling missing values with Regularized Iterative Multiple Correspondence Analysis. Journal of Classification, Vol. 29, pp. 91-116.
The data used here refer to 21 wines of Val de Loire.
data(wine)
A data frame with 21 rows (the number of wines) and 31 columns: the first column corresponds to the label of origin, the second column corresponds to the soil, and the others correspond to sensory descriptors.
Centre de recherche INRA d'Angers
Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software. 25(1). pp. 1-18.