Package 'PCAmixdata'

Title: Multivariate Analysis of Mixed Data
Description: Implements principal component analysis, orthogonal rotation and multiple factor analysis for a mixture of quantitative and qualitative variables.
Authors: Marie Chavent [aut, cre], Vanessa Kuentz [aut], Amaury Labenne [aut], Benoit Liquet [aut], Jerome Saracco [aut]
Maintainer: Marie Chavent <[email protected]>
License: GPL (>=2.0)
Version: 3.1
Built: 2025-02-01 04:58:04 UTC
Source: https://github.com/chavent/pcamixdata


Performance in decathlon (data)

Description

The data used here refer to athletes' performance during two sporting events.

Usage

data(decathlon)

Format

A data frame with 41 rows and 13 columns: the first ten columns correspond to the performance of the athletes in the 10 events of the decathlon. Columns 11 and 12 correspond respectively to the rank and the points obtained. The last column is a categorical variable giving the sporting event (2004 Olympic Games or 2004 Decastar).

Source

The references below.

References

Department of Applied Mathematics, Agrocampus Rennes.

Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software. 25(1). pp. 1-18.


Breeds of Dogs data

Description

Data referring to 27 breeds of dogs.

Format

A data frame with 27 rows (the breeds of dogs) and 7 columns: their size, weight and speed, each with 3 categories (small, medium, large); their intelligence (low, medium, high); their affectivity and aggressiveness, each with 2 categories (low, high); and their function (utility, company, hunting).

Source

Originated by A. Brefort (1982) and cited in Saporta G. (2011).


Flower Characteristics

Description

8 characteristics for 18 popular flowers.

Usage

data(flower)

Format

A data frame with 18 observations on 8 variables:

[ , "V1"] factor winters
[ , "V2"] factor shadow
[ , "V3"] factor tubers
[ , "V4"] factor color
[ , "V5"] ordered soil
[ , "V6"] ordered preference
[ , "V7"] numeric height
[ , "V8"] numeric distance
V1

winters is binary and indicates whether the plant may be left in the garden when it freezes.

V2

shadow is binary and shows whether the plant needs to stand in the shadow.

V3

tubers is asymmetric binary and distinguishes between plants with tubers and plants that grow in any other way.

V4

color is nominal and specifies the flower's color (1 = white, 2 = yellow, 3 = pink, 4 = red, 5 = blue).

V5

soil is ordinal and indicates whether the plant grows in dry (1), normal (2), or wet (3) soil.

V6

preference is ordinal and gives someone's preference ranking, from 1 to 18.

V7

height is interval-scaled: the plant's height in centimeters.

V8

distance is interval-scaled: the distance in centimeters that should be left between the plants.

Source

The reference below.

References

Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996): Clustering in an Object-Oriented Environment. Journal of Statistical Software, 1. http://www.stat.ucla.edu/journals/jss/


gironde

Description

A list of 4 datasets characterizing the living conditions of 542 cities in Gironde. The four datasets correspond to four themes related to living conditions. Each dataset contains a different number of variables (quantitative and/or qualitative). The first three datasets come from the 2009 population census carried out in Gironde by INSEE (Institut National de la Statistique et des Etudes Economiques). The fourth comes from an IGN (Institut National de l'Information Geographique et forestiere) database.

Usage

data(gironde)

Format

A list of 4 data frames.

Value

gironde$employment

This data frame contains the description of 542 cities by 9 quantitative variables. These variables are related to employment conditions, for instance the average income (income) or the percentage of farmers (farmer).

gironde$housing

This data frame contains the description of 542 cities by 5 variables (2 qualitative variables and 3 quantitative variables). These variables are related to housing conditions, for instance the population density (density) or the percentage of council housing within the cities (council).

gironde$services

This data frame contains the description of 542 cities by 9 qualitative variables. These variables are related to the number of services within the cities, for instance the number of bakeries (baker) or the number of post offices (postoffice).

gironde$environment

This data frame contains the description of 542 cities by 4 quantitative variables. These variables are related to the natural environment of the cities, for instance the percentage of agricultural land (agricul) or the percentage of buildings (building).

Source

www.INSEE.fr

www.ign.fr

http://siddt.grenoble.cemagref.fr/

Multivariate analysis of mixed data: The PCAmixdata R package, M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco, arXiv:1411.4911 [stat.CO]


Multiple factor analysis of mixed data

Description

Performs multiple factor analysis to analyze a set of individuals (observations) described by several groups of variables. Variables within a group can be a mixture of quantitative and qualitative variables.

Usage

MFAmix(data, groups, name.groups, ndim=5, rename.level=FALSE, graph = TRUE,
    axes = c(1, 2))

Arguments

data

a data frame with n rows and p columns containing all the variables. This data frame will be split into G groups according to the vector groups.

groups

a vector giving, for each column of data, the group it belongs to.

name.groups

a vector of size G which gives the names of the groups.

ndim

number of dimensions kept in the results (by default 5).

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents different variables from having levels with identical names.

graph

boolean, if TRUE the following graphics are displayed for the first two dimensions of MFAmix: plot of the individuals' coordinates, plot of the squared loadings of the variables, plot of the partial axes, correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available).

axes

a length 2 vector specifying the axes to plot.
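The renaming performed by rename.level = TRUE can be illustrated in base R. This is a minimal sketch, not the package's own code; the data frame and its columns affection and aggression are hypothetical, chosen because two variables sharing the levels "low" and "high" would otherwise produce duplicated level names.

```r
# Hypothetical data: two factors that share the level names "low" and "high"
df <- data.frame(
  affection  = factor(c("low", "high", "high")),
  aggression = factor(c("high", "low", "high"))
)

# Rename every level as "variable_name=level_name" (illustration only)
renamed <- df
for (v in names(renamed)) {
  levels(renamed[[v]]) <- paste(v, levels(renamed[[v]]), sep = "=")
}

levels(renamed$affection)
# [1] "affection=high" "affection=low"
```

Only the level labels change; the coded values of each observation are untouched.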

Details

Multiple Factor Analysis (MFA), developed by Escofier and Pages in 1983, is a factorial method for analyzing several groups of variables collected on the same observations. The main idea of MFA is to normalize each group by weighting all the variables belonging to it by the inverse of the first eigenvalue of the Principal Component Analysis (PCA) of that group (equivalently, by dividing these variables by the square root of this eigenvalue), so that the first eigenvalue of each weighted group equals 1. A usual PCA is then applied to all the weighted variables taken together. The method was initially developed for groups containing only quantitative variables, and was later extended to handle groups of qualitative variables together with groups of quantitative variables. MFAmix performs MFA for groups that each contain a mixture of quantitative and qualitative variables.

One of the outputs of MFAmix is the matrix of squared loadings (sqload). For a qualitative variable, the squared loadings are the correlation ratios between the variable and the principal components. For a quantitative variable, they are the squared correlations between the variable and the principal components.

Some other outputs are specific to MFA:

  • The coordinates of a group are the sums of the absolute contributions of the variables belonging to that group,

  • The partial coordinates of the individuals are their factor coordinates according to a specific group. They are obtained by projecting the data of each group onto the principal component space of MFAmix,

  • The partial axes of a group are the correlations between the principal components of the separate analysis of that group and the principal components of MFAmix.
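The group weighting and the group coordinates described above can be sketched for the simplest case, where every group is quantitative. This is a base-R illustration under that assumption (MFAmix itself runs a PCAmix on each mixed group); the random data and the group names g1 and g2 are hypothetical. With standardized variables, the absolute contribution of a column to dimension k is taken here as lambda_k times the squared k-th eigenvector entry, so that the contributions on a dimension sum to lambda_k.

```r
set.seed(1)
n <- 30

# Standardize columns using the 1/n variance, as in PCA
std <- function(X) {
  X <- scale(X, center = TRUE, scale = FALSE)
  sweep(X, 2, sqrt(colMeans(X^2)), "/")
}
first_eig <- function(Z) {
  eigen(crossprod(Z) / nrow(Z), symmetric = TRUE, only.values = TRUE)$values[1]
}

# Two hypothetical quantitative groups observed on the same n rows
groups <- list(g1 = std(matrix(rnorm(n * 3), n)),
               g2 = std(matrix(rnorm(n * 4), n)))

# MFA weighting: divide each group by the square root of its first eigenvalue
weighted <- lapply(groups, function(Z) Z / sqrt(first_eig(Z)))

# Global PCA of all the weighted variables taken together
Zall <- do.call(cbind, weighted)
e <- eigen(crossprod(Zall) / n, symmetric = TRUE)
lambda <- e$values

# Absolute contribution of each column to each dimension: lambda_k * v_jk^2
contrib <- sweep(e$vectors^2, 2, lambda, "*")

# Group coordinates: sums of the contributions of the columns of each group
grp <- rep(names(weighted), sapply(weighted, ncol))
group_coord <- rowsum(contrib, grp)
round(group_coord[, 1:2], 3)
```

Because each weighted group has a first eigenvalue of 1, every group coordinate lies between 0 and 1, and the group coordinates on a dimension add up to its eigenvalue.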

Value

eig

a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance.

ind

a list containing the results for the individuals (observations):

  • $coord: factor coordinates (scores) of the individuals,

  • $contrib: absolute contributions of the individuals,

  • $contrib.pct: relative contributions of the individuals,

  • $cos2: squared cosines of the individuals.

quanti

a list containing the results for the quantitative variables:

  • $coord: factor coordinates (scores) of the quantitative variables,

  • $contrib: absolute contributions of the quantitative variables,

  • $contrib.pct: relative contributions of the quantitative variables (in percentage),

  • $cos2: squared cosines of the quantitative variables.

levels

a list containing the results for the levels of the qualitative variables:

  • $coord: factor coordinates (scores) of the levels,

  • $contrib: absolute contributions of the levels,

  • $contrib.pct: relative contributions of the levels (in percentage),

  • $cos2: squared cosines of the levels.

quali

a list containing the results for the qualitative variables:

  • $contrib: absolute contributions of the qualitative variables (sum of absolute contributions of the levels of the qualitative variable),

  • $contrib.pct: relative contributions (in percentage) of the qualitative variables (sum of relative contributions of the levels of the qualitative variable).
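Aggregating the contributions of the levels into the contribution of their qualitative variable amounts to a grouped sum. The sketch below assumes levels named "variable=level", as produced by rename.level = TRUE; the contribution values are made up for illustration.

```r
# Hypothetical relative contributions (in %) of levels named "variable=level"
lev_contrib <- c("soil=dry" = 12.5, "soil=normal" = 3.0, "soil=wet" = 9.5,
                 "color=white" = 20.0, "color=red" = 7.5)

# Recover the variable name by dropping "=level", then sum within each variable
var_of_level <- sub("=.*$", "", names(lev_contrib))
quali_contrib <- tapply(lev_contrib, var_of_level, sum)
quali_contrib
# color = 27.5, soil = 25
```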

sqload

a matrix of dimension (p, ndim) containing the squared loadings of the quantitative and qualitative variables.

coef

the coefficients of the linear combinations used to construct the principal components of MFAmix, and to predict coordinates (scores) of new observations in the function predict.MFAmix.

eig.separate

a matrix containing the first ndim eigenvalues of the separate analyses of each group.

separate.analyses

the results of the separate analyses of each group.

groups

a list containing the results for the groups:

  • $Lg: Lg coefficients between groups,

  • $RV: RV coefficients between groups,

  • $contrib: contributions of the groups (sum of the contributions of the variables belonging to the group),

  • $contrib.pct: relative contributions of the groups (in percentage).
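For two groups of quantitative variables, the Lg and RV coefficients can be sketched with the usual MFA definitions: Lg is the sum of squared covariances between the variables of the two groups, each group being weighted by the inverse of its first eigenvalue, and RV is Lg normalized to lie between 0 and 1. This is an illustration on hypothetical data; the exact computation performed by MFAmix for mixed groups is not reproduced here.

```r
set.seed(3)
n <- 40

std <- function(X) {
  X <- scale(X, center = TRUE, scale = FALSE)
  sweep(X, 2, sqrt(colMeans(X^2)), "/")
}
lambda1 <- function(X) {
  eigen(crossprod(X) / nrow(X), symmetric = TRUE, only.values = TRUE)$values[1]
}

# Lg: sum of squared covariances, each group weighted by 1 / lambda1
Lg <- function(A, B) {
  sum((crossprod(A, B) / nrow(A))^2) / (lambda1(A) * lambda1(B))
}
# RV: Lg normalized by the "self" Lg of each group
RV <- function(A, B) Lg(A, B) / sqrt(Lg(A, A) * Lg(B, B))

# Two hypothetical groups; X2 shares some signal with the first column of X1
X1 <- std(matrix(rnorm(n * 3), n))
X2 <- std(cbind(X1[, 1] + rnorm(n, sd = 0.5), matrix(rnorm(n * 2), n)))

c(Lg = Lg(X1, X2), RV = RV(X1, X2))
```

Since each group is divided by its own first eigenvalue, Lg(A, A) is at least 1 and grows with the dimensionality of the group; RV removes this effect.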

partial.axes

a matrix containing the coordinates of the partial axes.

ind.partial

a list of G matrices containing the coordinates of the partial individuals.

listvar.group

lists the variables in each group. It is useful for checking the agreement between the vector groups and the vector name.groups.

global.pca

an object of class PCAmix containing the results of MFAmix considered as a unique PCAmix.

Author(s)

Amaury Labenne [email protected], Marie Chavent, Vanessa Kuentz, Benoit Liquet, Jerome Saracco

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

Escofier, B. and Pages, J. (1994). Multiple factor analysis (AFMULT package). Computational Statistics & Data Analysis, 18(1):121-140.

Le, S., Josse, J., and Husson, F. (2008). FactoMineR: an R package for multivariate analysis. Journal of Statistical Software, 25(1):1-18.

See Also

print.MFAmix, summary.MFAmix, predict.MFAmix, plot.MFAmix

Examples

data(gironde)

class.var<-c(rep(1,9),rep(2,5),rep(3,9),rep(4,4))
names <- c("employment","housing","services","environment")

dat<-cbind(gironde$employment[1:20,],gironde$housing[1:20,],
      gironde$services[1:20,],gironde$environment[1:20,])
      
res<-MFAmix(data=dat,groups=class.var,
      name.groups=names, rename.level=TRUE, ndim=3,graph=FALSE)
      
summary(res)

Principal component analysis of mixed data

Description

Performs principal component analysis of a set of individuals (observations) described by a mixture of qualitative and quantitative variables. PCAmix includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases.

Usage

PCAmix(
  X.quanti = NULL,
  X.quali = NULL,
  ndim = 5,
  rename.level = FALSE,
  weight.col.quanti = NULL,
  weight.col.quali = NULL,
  graph = TRUE
)

Arguments

X.quanti

a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

X.quali

a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns).

ndim

number of dimensions kept in the results (by default 5).

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents different variables from having levels with identical names.

weight.col.quanti

vector of weights for the quantitative variables.

weight.col.quali

vector of the weights for the qualitative variables.

graph

boolean, if TRUE the following graphics are displayed for the first two dimensions of PCAmix: component map of the individuals, plot of the squared loadings of all the variables (quantitative and qualitative), plot of the correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available).

Details

If X.quali is not specified (i.e. NULL), only quantitative variables are available and standard PCA is performed. If X.quanti is NULL, only qualitative variables are available and standard MCA is performed.

Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
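The replacement rule for missing values can be illustrated in base R. This minimal sketch only shows the rule stated above on hypothetical vectors; it is not the package's internal code, and how the imputed values then enter the centering and weighting inside PCAmix is not shown.

```r
# Hypothetical quantitative variable and factor, each with a missing value
x <- c(2, NA, 5, 7)
f <- factor(c("a", NA, "b", "a"))

# Quantitative: replace NA by the mean of the observed values
x_imp <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Qualitative: indicator matrix with one column per level;
# a missing value yields a row of zeros
G <- sapply(levels(f), function(l) as.numeric(!is.na(f) & f == l))
G
```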

PCAmix returns the squared loadings in sqload. For a qualitative variable, the squared loadings are the correlation ratios between the variable and the principal components. For a quantitative variable, they are the squared correlations between the variable and the principal components.
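Both kinds of squared loadings can be computed directly from a component score vector. In this base-R sketch the component, the quantitative variable x and the factor q are simulated and hypothetical; the correlation ratio is computed as the between-level variance of the component divided by its total variance.

```r
set.seed(4)
n <- 50
score <- rnorm(n)                                          # hypothetical component
x <- score + rnorm(n)                                      # quantitative variable
q <- factor(ifelse(score + rnorm(n) > 0, "high", "low"))   # qualitative variable

# Quantitative variable: squared correlation with the component
sqload_quanti <- cor(x, score)^2

# Qualitative variable: correlation ratio (between-level / total variance)
eta2 <- function(y, g) {
  m <- ave(y, g)                     # each value replaced by its level mean
  sum((m - mean(y))^2) / sum((y - mean(y))^2)
}
sqload_quali <- eta2(score, q)

c(quanti = sqload_quanti, quali = sqload_quali)
```

With two levels, the correlation ratio coincides with the squared correlation between the component and the 0/1 indicator of one level, which is a handy sanity check.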

Note that when all the p variables are qualitative, the factor coordinates (scores) of the n observations are equal to the factor coordinates (scores) of standard MCA multiplied by the square root of p, and the eigenvalues are equal to the usual eigenvalues of MCA multiplied by p. When all the variables are quantitative, PCAmix gives exactly the same results as standard PCA.
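Writing lambda_k for the k-th eigenvalue and f_k for the vector of scores on dimension k, the relationship for purely qualitative data reads as follows. The two formulas are consistent with each other: the variance of the scores on a dimension equals its eigenvalue, and multiplying the scores by the square root of p multiplies their variance by p.

```latex
\lambda_k^{\mathrm{PCAmix}} = p\,\lambda_k^{\mathrm{MCA}},
\qquad
\mathbf{f}_k^{\mathrm{PCAmix}} = \sqrt{p}\,\mathbf{f}_k^{\mathrm{MCA}}
```

For instance, with p = 4 qualitative variables, an MCA eigenvalue of 0.5 corresponds to a PCAmix eigenvalue of 2.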

Value

eig

a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance.

ind

a list containing the results for the individuals (observations):

  • $coord: factor coordinates (scores) of the individuals,

  • $contrib: absolute contributions of the individuals,

  • $contrib.pct: relative contributions of the individuals,

  • $cos2: squared cosines of the individuals.
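The results for the individuals can be reproduced on a purely quantitative example, which PCAmix treats as standard PCA. This sketch assumes uniform row weights 1/n and standardized variables, and uses the usual definitions (absolute contribution = weight times squared score, cos2 = squared score divided by the squared distance to the origin); scaling details inside the package may differ.

```r
set.seed(5)
n <- 20

# Standardized data (1/n variance) and its PCA via an eigendecomposition
Z <- scale(matrix(rnorm(n * 4), n), center = TRUE, scale = FALSE)
Z <- sweep(Z, 2, sqrt(colMeans(Z^2)), "/")
e <- eigen(crossprod(Z) / n, symmetric = TRUE)
lambda <- e$values
scores <- Z %*% e$vectors                       # factor coordinates of individuals

w <- rep(1 / n, n)                              # uniform row weights
contrib <- w * scores^2                         # absolute contributions
contrib.pct <- 100 * sweep(contrib, 2, lambda, "/")   # relative contributions (%)
cos2 <- scores^2 / rowSums(Z^2)                 # squared cosines

round(head(cos2, 3), 3)
```

The absolute contributions on each dimension sum to its eigenvalue, and, since all dimensions are kept here, the squared cosines of each individual sum to 1.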

quanti

a list containing the results for the quantitative variables:

  • $coord: factor coordinates (scores) of the quantitative variables,

  • $contrib: absolute contributions of the quantitative variables,

  • $contrib.pct: relative contributions of the quantitative variables (in percentage),

  • $cos2: squared cosines of the quantitative variables.

levels

a list containing the results for the levels of the qualitative variables:

  • $coord: factor coordinates (scores) of the levels,

  • $contrib: absolute contributions of the levels,

  • $contrib.pct: relative contributions of the levels (in percentage),

  • $cos2: squared cosines of the levels.

quali

a list containing the results for the qualitative variables:

  • $contrib: absolute contributions of the qualitative variables (sum of absolute contributions of the levels of the qualitative variable),

  • $contrib.pct: relative contributions (in percentage) of the qualitative variables (sum of relative contributions of the levels of the qualitative variable).

sqload

a matrix of dimension (p, ndim) containing the squared loadings of the quantitative and qualitative variables.

coef

the coefficients of the linear combinations used to construct the principal components of PCAmix, and to predict coordinates (scores) of new observations in the function predict.PCAmix.

M

the vector of the weights of the columns used in the Generalized Singular Value Decomposition.
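A Generalized Singular Value Decomposition of a matrix Z with row weights N and column weights M can be obtained from an ordinary SVD of diag(sqrt(N)) Z diag(sqrt(M)). The sketch below shows this standard construction in base R; the matrix and the weight values are hypothetical, and the way PCAmix builds Z and M from mixed data is not reproduced.

```r
set.seed(6)
n <- 15; p <- 4
Z <- matrix(rnorm(n * p), n, p)
N <- rep(1 / n, n)              # row weights (hypothetical: uniform)
M <- c(1, 1, 2, 0.5)            # column weights (hypothetical values)

# Ordinary SVD of the weighted matrix diag(sqrt(N)) Z diag(sqrt(M))
s <- svd(sqrt(N) * (Z %*% diag(sqrt(M))))

# Back-transform to the GSVD: Z = U diag(d) t(V),
# with t(U) diag(N) U = I and t(V) diag(M) V = I
U <- s$u / sqrt(N)
V <- s$v / sqrt(M)
d <- s$d

scores <- U %*% diag(d)         # factor coordinates of the rows
```

The squared singular values d^2 play the role of the eigenvalues.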

Author(s)

Marie Chavent [email protected], Amaury Labenne.

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

print.PCAmix, summary.PCAmix, predict.PCAmix, plot.PCAmix

Examples

#PCAMIX:
data(wine)
str(wine)
X.quanti <- splitmix(wine)$X.quanti
X.quali <- splitmix(wine)$X.quali
pca<-PCAmix(X.quanti[,1:27],X.quali,ndim=4)
pca<-PCAmix(X.quanti[,1:27],X.quali,ndim=4,graph=FALSE)
pca$eig
pca$ind$coord

#PCA:
data(decathlon)
quali<-decathlon[,13]
pca<-PCAmix(decathlon[,1:10])
pca<-PCAmix(decathlon[,1:10], graph=FALSE)
plot(pca,choice="ind",coloring.ind=quali,cex=0.8,
     posleg="topright",main="Scores")
plot(pca, choice="sqload",main="Squared correlations")
plot(pca, choice="cor",main="Correlation circle")
pca$quanti$coord

#MCA
data(flower)
mca <- PCAmix(X.quali=flower[,1:4], rename.level=TRUE, graph=FALSE)
plot(mca,choice="ind", main="Scores")
plot(mca,choice="sqload", main="Correlation ratios")
plot(mca,choice="levels", main="Levels")
mca$levels$coord

#Missing values
data(vnf)
PCAmix(X.quali=vnf,rename.level=TRUE)
vnf2<-na.omit(vnf)
PCAmix(X.quali=vnf2,rename.level=TRUE)

Varimax rotation in PCAmix

Description

Orthogonal rotation in PCAmix by maximization of the varimax function expressed in terms of PCAmix squared loadings (correlation ratios for qualitative variables and squared correlations for quantitative variables). PCArot includes the ordinary varimax rotation in Principal Component Analysis (PCA) and a varimax-type rotation in Multiple Correspondence Analysis (MCA) as special cases.

Usage

PCArot(obj, dim, itermax = 100, graph = TRUE)

Arguments

obj

an object of class PCAmix.

dim

number of rotated Principal Components.

itermax

maximum number of iterations in the Kaiser's practical optimization algorithm based on successive pairwise planar rotations.

graph

boolean, if TRUE the following graphs are displayed for the first two dimensions after rotation: plot of the individuals (factor coordinates), plot of the variables (squared loadings), correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available).

Details

If X.quali is not specified (i.e. NULL) in the previous PCAmix step, only quantitative variables are available and the standard varimax rotation of PCA is performed. If X.quanti is NULL, only qualitative variables are available and a varimax-type rotation in MCA is performed. In the results below, p1 denotes the number of quantitative variables, p2 the number of qualitative variables and m the total number of levels of the p2 qualitative variables.
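For the purely quantitative case, the classical varimax rotation of PCA loadings is available in base R as stats::varimax. The sketch below applies it to the correlation loadings of the first two components of simulated data; it is a stand-in for illustration, not PCArot itself. stats::varimax uses an SVD-based iteration rather than Kaiser's pairwise planar rotations, and normalize = FALSE is set to switch off Kaiser normalization, so the numbers may not match PCArot exactly.

```r
set.seed(7)
n <- 100
f1 <- rnorm(n); f2 <- rnorm(n)
X <- cbind(f1 + rnorm(n, sd = 0.4), f1 + rnorm(n, sd = 0.4),
           f2 + rnorm(n, sd = 0.4), f2 + rnorm(n, sd = 0.4),
           f1 + f2 + rnorm(n, sd = 0.4))

# Loadings of the first two PCs: correlations between variables and components
e <- eigen(cor(X), symmetric = TRUE)
A <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Classical varimax without Kaiser normalization
rot <- stats::varimax(A, normalize = FALSE)
A_rot <- unclass(rot$loadings)   # rotated loadings
Tmat <- rot$rotmat               # orthogonal rotation matrix

# Variances of the rotated dimensions (column sums of squared loadings)
colSums(A_rot^2)
```

The column sums of the squared rotated loadings play the role of the variances after rotation; their total equals the variance carried by the two unrotated components, since an orthogonal rotation only redistributes it.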

Value

eig

variances of the dim rotated dimensions.

ind$coord

an n by dim quantitative matrix which contains the coordinates (scores) of the n individuals on the dim rotated principal components.

quanti$coord

a p1 by dim quantitative matrix which contains the coordinates (loadings) of the p1 quantitative variables after rotation. The coordinates of the quantitative variables after rotation are correlations with the rotated principal components.

levels$coord

an m by dim quantitative matrix which contains the coordinates of the m levels on the dim rotated principal components.

quali$coord

a p2 by dim quantitative matrix which contains the coordinates of the p2 qualitative variables on the dim rotated principal components. The coordinates of the qualitative variables after rotation are correlation ratios with the rotated principal components.

coef

coefficients of the linear combinations used to construct the rotated principal components of PCAmix.

theta

angle of rotation if dim is equal to 2.

T

matrix of rotation.

Author(s)

Marie Chavent [email protected], Vanessa Kuentz, Benoit Liquet, Jerome Saracco

References

Chavent, M., Kuentz, V., Saracco, J. (2011), Orthogonal Rotation in PCAMIX. Advances in Classification and Data Analysis, Vol. 6, pp. 131-146.

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

Kiers, H.A.L., (1991), Simple structure in Component Analysis Techniques for mixtures of qualitative and quantitative variables, Psychometrika, 56, 197-212.

See Also

plot.PCAmix, summary.PCAmix, PCAmix, predict.PCAmix

Examples

#PCAMIX:
data(wine)
pca<-PCAmix(X.quanti=wine[,c(3:29)],X.quali=wine[,1:2],ndim=4,graph=FALSE)
pca

rot<-PCArot(pca,3)
rot
rot$eig #percentages of variances after rotation

plot(rot,choice="ind",coloring.ind=wine[,1],
	    posleg="bottomleft", main="Rotated scores")
plot(rot,choice="sqload",main="Squared loadings after rotation")
plot(rot,choice="levels",main="Levels after rotation")
plot(rot,choice="cor",main="Correlation circle after rotation")



#PCA:
data(decathlon)
quali<-decathlon[,13]
pca<-PCAmix(decathlon[,1:10], graph=FALSE)

rot<-PCArot(pca,3)
plot(rot,choice="ind",coloring.ind=quali,cex=0.8,
	    posleg="topright",main="Scores after rotation")
plot(rot, choice="sqload", main="Squared correlations after rotation")
plot(rot, choice="cor", main="Correlation circle after rotation")

#MCA
data(flower)
mca <- PCAmix(X.quali=flower[,1:4],rename.level=TRUE,graph=FALSE)

rot<-PCArot(mca,2)
plot(rot,choice="ind",main="Scores after rotation")
plot(rot, choice="sqload", main="Correlation ratios after rotation")
plot(rot, choice="levels", main="Levels after rotation")

Graphical outputs of MFAmix

Description

Displays the graphical outputs of MFAmix. Individuals (observations), quantitative variables and levels of the qualitative variables are plotted as points using their factor coordinates (scores) in MFAmix. All the variables (quantitative and qualitative) are plotted on the same graph as points using their squared loadings. The groups of variables are plotted using their contributions to the component coordinates. Partial axes and partial individuals of the separate analyses can also be plotted.

Usage

## S3 method for class 'MFAmix'
plot(
  x,
  axes = c(1, 2),
  choice = "ind",
  label = TRUE,
  coloring.var = "not",
  coloring.ind = NULL,
  nb.partial.axes = 3,
  col.ind = NULL,
  col.groups = NULL,
  partial = NULL,
  lim.cos2.plot = 0,
  lim.contrib.plot = 0,
  xlim = NULL,
  ylim = NULL,
  cex = 1,
  main = NULL,
  leg = TRUE,
  posleg = "topleft",
  cex.leg = 0.8,
  col.groups.sup = NULL,
  posleg.sup = "topright",
  nb.paxes.sup = 3,
  ...
)

Arguments

x

an object of class MFAmix obtained with the function MFAmix.

axes

a length 2 vector specifying the components to plot.

choice

the graph to plot:

  • "ind" for the individuals,

  • "cor" for the correlation circle of the quantitative variables,

  • "levels" for the levels of the qualitative variables,

  • "sqload" for the plot of the squared loadings of all the variables,

  • "groups" for the plot of the contributions of the groups,

  • "axes" for the correlation circle of the partial axes.

label

boolean, if FALSE the labels of the points are not plotted.

coloring.var

a value to choose among:

  • "type": the variables in the plot of the squared loadings are colored according to their type (quantitative or qualitative),

  • "groups": the variables are colored according to their group,

  • NULL: variables are not colored.

coloring.ind

a qualitative variable such as a character vector or a factor of size n (the number of individuals). The individuals are colored according to the levels of this variable. If NULL, the individuals are not colored.

nb.partial.axes

if choice="axes", the maximum number of partial axes related to each group to plot on the correlation circle. By default equal to 3.

col.ind

a vector of colors, of size the number of levels of coloring.ind. If NULL, colors are chosen automatically.

col.groups

a vector of colors, of size the number of groups. If NULL, colors are chosen automatically.

partial

a character vector with the row names of the individuals for which the partial individuals should be drawn. By default partial = NULL and no partial points are drawn. Partial points are colored according to col.groups.

lim.cos2.plot

a value between 0 and 1. Points with squared cosines below this value are not plotted.

lim.contrib.plot

a value between 0 and 100. Points with relative contributions (in percentage) below this value are not plotted.

xlim

a numeric vector of length 2, giving the range of the x coordinates. If NULL (the default), the range is defined automatically (recommended).

ylim

a numeric vector of length 2, giving the range of the y coordinates. If NULL (the default), the range is defined automatically (recommended).

cex

character expansion factor (cf. function par in the graphics package).

main

a string corresponding to the title of the graph to draw.

leg

boolean, if TRUE, a legend is displayed.

posleg

position of the legend.

cex.leg

a numerical value giving the amount by which the legend should be magnified. Default is 0.8.

col.groups.sup

a vector of colors, of size the number of supplementary groups. If NULL, colors are chosen automatically.

posleg.sup

position of the legend for the supplementary groups.

nb.paxes.sup

if choice="axes", the maximum number of partial axes of supplementary groups plotted on the correlation circle. By default equal to 3.

...

arguments to be passed to methods, such as graphical parameters.

Details

The observations can be colored according to the levels of a qualitative variable. The observations, the quantitative variables and the levels can be selected according to their squared cosine (lim.cos2.plot) or their relative contribution (lim.contrib.plot) to the component map. Only points with a squared cosine or relative contribution greater than the given threshold are plotted. Note that the relative contribution of a point to the component map (a plane) is the sum of its absolute contributions to each dimension, divided by the sum of the corresponding eigenvalues.
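The selection rules can be illustrated with made-up numbers. The relative contribution of a point to the plane (1, 2) is the sum of its absolute contributions on the two dimensions divided by the sum of the two eigenvalues (times 100 to express it in percent, as lim.contrib.plot expects); following the usual convention, its squared cosine on the plane is taken as the sum of its squared cosines on the two dimensions. The values and thresholds below are hypothetical, and this is not the plotting code of the package.

```r
# Hypothetical absolute contributions of 5 points on dimensions 1 and 2
contrib <- cbind(dim1 = c(0.40, 0.10, 0.05, 0.25, 0.20),
                 dim2 = c(0.05, 0.30, 0.10, 0.15, 0.10))
lambda <- colSums(contrib)            # eigenvalues: 1.00 and 0.70
# Hypothetical squared cosines of the same points
cos2 <- cbind(dim1 = c(0.60, 0.10, 0.05, 0.30, 0.35),
              dim2 = c(0.10, 0.50, 0.05, 0.25, 0.05))

# Relative contribution (in %) of each point to the plane (1, 2)
contrib_plane <- 100 * rowSums(contrib) / sum(lambda)
# Squared cosine of each point on the plane (1, 2)
cos2_plane <- rowSums(cos2)

# Points kept with lim.contrib.plot = 25 and with lim.cos2.plot = 0.5
which(contrib_plane >= 25)
# [1] 1
which(cos2_plane >= 0.5)
# [1] 1 2 4
```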

Author(s)

, [email protected], Amaury Labenne

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

summary.PCAmix,PCAmix,PCArot

Examples

data(gironde)
class.var<-c(rep(1,9),rep(2,5),rep(3,9),rep(4,4))
names <- c("employment","housing","services","environment")
dat <- cbind(gironde$employment[1:20,],gironde$housing[1:20,],
           gironde$services[1:20,],gironde$environment[1:20,])
res <- MFAmix(data=dat,groups=class.var,
            name.groups=names, rename.level=TRUE, ndim=3,graph=FALSE)

#---- quantitative variables
plot(res,choice="cor",cex=0.6)
plot(res,choice="cor",cex=0.6,coloring.var="groups")
plot(res,choice="cor",cex=0.6,coloring.var="groups",
     col.groups=c("red","yellow","pink","brown"),leg=TRUE)

#----partial axes
plot(res,choice="axes",cex=0.6)
plot(res,choice="axes",cex=0.6,coloring.var="groups")
plot(res,choice="axes",cex=0.6,coloring.var="groups",
     col.groups=c("red","yellow","pink","brown"),leg=TRUE)

#----groups
plot(res,choice="groups",cex=0.6)   #no colors for groups
plot(res,choice="groups",cex=0.6,coloring.var="groups") 
plot(res,choice="groups",cex=0.6,coloring.var="groups",
     col.groups=c("red","yellow","pink","blue")) 
#----squared loadings
plot(res,choice="sqload",cex=0.8)    #no colors for groups
plot(res,choice="sqload",cex=0.8,coloring.var="groups",
     posleg="topright") 
plot(res,choice="sqload",cex=0.6,coloring.var="groups",
     col.groups=c("red","yellow","pink","blue"),ylim=c(0,1)) 
plot(res,choice="sqload",cex=0.8,coloring.var="type",
     cex.leg=0.9,posleg="topright")  

#----individuals 
plot(res,choice="ind",cex=0.6) 

#----individuals with squared cosine greater than 0.5
plot(res,choice="ind",cex=0.6,lim.cos2.plot=0.5)  

#----individuals colored with a qualitative variable
nbchem <- gironde$services$chemist[1:20]
plot(res,choice="ind",cex=0.6,coloring.ind=nbchem,
     posleg="topright")   
plot(res,choice="ind",coloring.ind=nbchem,
     col.ind=c("pink","brown","darkblue"),label=FALSE,posleg="topright")     

#----partial individuals colored by groups
plot(res,choice="ind",partial=c("AUBIAC","ARCACHON"),
    cex=0.6,posleg="bottomright")

#----levels of qualitative variables
plot(res,choice="levels",cex=0.8)
plot(res,choice="levels",cex=0.8,coloring.var="groups")

#levels with squared cosine greater than 0.6
plot(res,choice="levels",cex=0.8, lim.cos2.plot=0.6)

#supplementary groups
data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <- splitmix(wine)$X.quanti[,28:29]
X.quali.sup <- splitmix(wine)$X.quali[,2,drop=FALSE]
data <- cbind(X.quanti,X.quali)
data.sup <- cbind(X.quanti.sup,X.quali.sup)

groups <-c(1,2,2,3,3,1)
name.groups <- c("G1","G2","G3")
groups.sup <- c(1,1,2)
name.groups.sup <- c("Gsup1","Gsup2")
mfa <- MFAmix(data,groups,name.groups,ndim=4,rename.level=TRUE,graph=FALSE)
mfa.sup <- supvar(mfa,data.sup,groups.sup,name.groups.sup,rename.level=TRUE)
plot(mfa.sup,choice="sqload",coloring.var="groups")
plot(mfa.sup,choice="axes",coloring.var="groups")
plot(mfa.sup,choice="groups",coloring.var="groups")
plot(mfa.sup,choice="levels",coloring.var="groups")
plot(mfa.sup,choice="levels")
plot(mfa.sup,choice="cor",coloring.var = "groups")

Graphical outputs of PCAmix and PCArot

Description

Displays the graphical outputs of PCAmix and PCArot. The individuals (observations), the quantitative variables and the levels of the qualitative variables are plotted as points using their factor coordinates (scores). All the variables (quantitative and qualitative) are plotted as points on the same graph using their squared loadings.

Usage

## S3 method for class 'PCAmix'
plot(
  x,
  axes = c(1, 2),
  choice = "ind",
  label = TRUE,
  coloring.ind = NULL,
  col.ind = NULL,
  coloring.var = FALSE,
  lim.cos2.plot = 0,
  lim.contrib.plot = 0,
  posleg = "topleft",
  xlim = NULL,
  ylim = NULL,
  cex = 1,
  leg = TRUE,
  main = NULL,
  cex.leg = 1,
  ...
)

Arguments

x

an object of class PCAmix obtained with the function PCAmix or PCArot.

axes

a length 2 vector specifying the components to plot.

choice

the graph to plot:

  • "ind" for the individuals component map,

  • "cor" for the correlation circle if quantitative variables are available in the data,

  • "levels" for the levels components map (if qualitative variables are available in the data),

  • "sqload" for the plot of the squared loadings of all the variables.

label

boolean, if FALSE the labels of the points are not plotted.

coloring.ind

a qualitative variable such as a character vector or a factor of size n (the number of individuals). The individuals are colored according to the levels of this variable. If NULL, the individuals are not colored.

col.ind

a vector of colors, of size the number of levels of coloring.ind. If NULL, colors are chosen automatically.

coloring.var

boolean, if TRUE, the variables in the plot of the squared loadings are colored according to their type (quantitative or qualitative).

lim.cos2.plot

a value between 0 and 1. Points with squared cosines below this value are not plotted.

lim.contrib.plot

a value between 0 and 100. Points with relative contributions (in percentage) below this value are not plotted.

posleg

position of the legend.

xlim

a numeric vector of length 2, giving the range of the x coordinates. If NULL (the default), the range is defined automatically (recommended).

ylim

a numeric vector of length 2, giving the range of the y coordinates. If NULL (the default), the range is defined automatically (recommended).

cex

character expansion factor (cf. function par in the graphics package).

leg

boolean, if TRUE, a legend is displayed.

main

a string corresponding to the title of the graph to draw.

cex.leg

a numerical value giving the amount by which the legend should be magnified. Default is 1.

...

arguments to be passed to methods, such as graphical parameters.

Details

The observations can be colored according to the levels of a qualitative variable. The observations, the quantitative variables and the levels can be selected according to their squared cosine (lim.cos2.plot) or their relative contribution (lim.contrib.plot) to the component map: only points whose squared cosine or relative contribution exceeds the given threshold are plotted. Note that the relative contribution of a point to the component map (a plane) is the sum of its absolute contributions to each dimension, divided by the sum of the corresponding eigenvalues.
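The quantities used by lim.cos2.plot and lim.contrib.plot can be sketched in base R for the simplest case: a standard PCA of standardized quantitative data in which every individual has weight 1/n. This is an illustration of the definitions above under that assumption, not the internal code of plot.PCAmix; the toy data and the thresholds 0.5 and 5 are arbitrary.

```r
# Sketch (assumption: standard PCA, uniform weights 1/n, toy data)
set.seed(1)
X <- matrix(rnorm(20 * 4), nrow = 20)   # 20 individuals, 4 variables
n <- nrow(X)

# Standardize with the population standard deviation (divisor n)
Z <- scale(X, center = TRUE, scale = FALSE)
Z <- sweep(Z, 2, sqrt(colMeans(Z^2)), "/")

# Principal component scores and eigenvalues from the SVD of Z
sv <- svd(Z)
scores <- sv$u %*% diag(sv$d)           # n x p matrix of scores
lambda <- sv$d^2 / n                    # eigenvalues with weights 1/n

axes <- c(1, 2)
# Absolute contribution of individual i to dimension k: (1/n) * score_ik^2
abs.ctr <- scores[, axes]^2 / n
# Relative contribution (%) to the plane: sum of the absolute contributions
# to both dimensions, divided by the sum of the two eigenvalues
ctr.plane <- 100 * rowSums(abs.ctr) / sum(lambda[axes])

# Squared cosine with the plane: share of the squared distance to the origin
cos2.plane <- rowSums(scores[, axes]^2) / rowSums(Z^2)

# Selections analogous to lim.cos2.plot = 0.5 and lim.contrib.plot = 5
keep.cos2 <- cos2.plane > 0.5
keep.ctr  <- ctr.plane > 5
```

Because the absolute contributions to a dimension sum to its eigenvalue, the relative contributions to the plane sum to 100 over all individuals.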

Author(s)

Marie Chavent, Amaury Labenne

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

summary.PCAmix, PCAmix, PCArot

Examples

data(gironde)
base <- gironde$housing[1:20,]
X.quanti <-splitmix(base)$X.quanti
X.quali <- splitmix(base)$X.quali
res<-PCAmix(X.quanti, X.quali, rename.level=TRUE, ndim=3,graph=FALSE)

#----quantitative variables on the correlation circle
plot(res,choice="cor",cex=0.8)

#----individuals component map
plot(res,choice="ind",cex=0.8)

#----individuals colored with the qualitative variable "houses"
houses <- X.quali$houses
plot(res,choice="ind",cex=0.6,coloring.ind=houses) 

#----individuals selected according to their cos2
plot(res,choice="ind",cex=0.6,lim.cos2.plot=0.8)
#----all the variables plotted with the squared loadings
plot(res,choice="sqload",cex=0.8)

#----variables colored according to their type (quanti or quali)
plot(res,choice="sqload",cex=0.8,coloring.var=TRUE) 

#----levels component map
plot(res,choice="levels",cex=0.8)

#----example with supplementary variables
data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <-splitmix(wine)$X.quanti[,28:29]
X.quali.sup <-splitmix(wine)$X.quali[,2,drop=FALSE]
pca<-PCAmix(X.quanti,X.quali,ndim=4,graph=FALSE)
pca2 <- supvar(pca,X.quanti.sup,X.quali.sup)
plot(pca2,choice="levels")
plot(pca2,choice="cor")
plot(pca2,choice="sqload")

Prediction of new scores in MFAmix

Description

This function computes the scores of new observations on the principal components of MFAmix. In other words, it projects the new observations onto the principal components of an MFAmix analysis previously performed on a separate dataset. Note that the new observations must be described by the same variables as those used in MFAmix, and the groups of variables must also be identical.

Usage

## S3 method for class 'MFAmix'
predict(object, data, ...)

Arguments

object

an object of class MFAmix obtained with the function MFAmix.

data

a data frame describing the new observations on all the variables. This data frame is split into G groups according to the vector groups used in MFAmix.

...

further arguments passed to or from other methods. They are ignored in this function.

Value

Returns the matrix of the scores of the new observations on the principal components or on the rotated principal components of MFAmix.

Author(s)

Marie Chavent, Amaury Labenne.

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

MFAmix

Examples

data(gironde)
class.var<-c(rep(1,9),rep(2,5),rep(3,9),rep(4,4))
names<-c("employment","housing","services","environment")
dat<-cbind(gironde$employment,gironde$housing,
           gironde$services,gironde$environment)
n <- nrow(dat)
set.seed(10)
sub <- sample(1:n,520)

res<-MFAmix(data=dat[sub,],groups=class.var,
            name.groups=names, rename.level=TRUE, 
            ndim=3,graph=FALSE)

#Predict scores of new data
pred<-predict(res,data=dat[-sub,])
plot(res,choice="ind",cex=0.6,lim.cos2.plot=0.7)  
points(pred[1:5,c(1,2)],col=2,pch=16,cex=0.6)
text(pred[1:5,c(1,2)], labels = rownames(dat[-sub,])[1:5],
     col=2,pos=3,cex=0.6)

Prediction of new scores in PCAmix or PCArot

Description

This function computes the scores of new observations on the principal components of PCAmix. If the components have been rotated, it computes the scores of the new observations on the rotated principal components. In other words, it projects the new observations onto the principal components of PCAmix (or PCArot) previously obtained on a separate dataset. Note that the new observations must be described by the same variables as those used in PCAmix (or PCArot).
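For purely quantitative data PCAmix reduces to a standard PCA, so the principle of the projection can be sketched in base R: new rows are centered and scaled with the means and standard deviations of the training data, then multiplied by the training loadings. This is an illustration on toy data, not the code of predict.PCAmix, which may scale the scores differently.

```r
# Sketch (assumption: quantitative data only, standard PCA, toy data)
set.seed(2)
train <- matrix(rnorm(30 * 3), ncol = 3)
new   <- matrix(rnorm(5 * 3), ncol = 3)

# Statistics stored from the training set (population sd, divisor n)
g <- colMeans(train)
s <- sqrt(colMeans(sweep(train, 2, g)^2))
Z.train <- sweep(sweep(train, 2, g), 2, s, "/")

# Loadings (right singular vectors) of the training PCA
V <- svd(Z.train)$v
scores.train <- Z.train %*% V

# New rows are standardized with the *training* g and s, then projected
Z.new <- sweep(sweep(new, 2, g), 2, s, "/")
scores.new <- Z.new %*% V
```

Reusing g and s from the training set, rather than recomputing them on the new rows, is what places the new observations in the same coordinate system as the training individuals.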

Usage

## S3 method for class 'PCAmix'
predict(object, X.quanti = NULL, X.quali = NULL, ...)

Arguments

object

an object of class PCAmix obtained with the function PCAmix or PCArot.

X.quanti

a numeric data matrix or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

X.quali

a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns).

...

further arguments passed to or from other methods. They are ignored in this function.

Value

Returns the matrix of the scores of the new observations on the principal components or on the rotated principal components of PCAmix.

Author(s)

Marie Chavent, Amaury Labenne.

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

PCAmix, PCArot

Examples

# quantitative data
data(decathlon)
n <- nrow(decathlon)
train <- sample(1:n,20)
pca <- PCAmix(decathlon[train,1:10], graph=FALSE)
predict(pca, decathlon[-train,1:10])
rot <- PCArot(pca,dim=4)
predict(rot,decathlon[-train,1:10])

# qualitative data
data(flower)
n <- nrow(flower)
train <- sample(1:n,10)
mca <- PCAmix(X.quali=flower[train,1:3], rename.level=TRUE, graph=FALSE)
predict(mca, X.quali=flower[-train,1:3])

# quantitative and qualitative data
data(wine)
X.quanti <- splitmix(wine)$X.quanti
X.quali <- splitmix(wine)$X.quali
n <- nrow(wine)
train <- sample(1:n, 10)
pca <-PCAmix(X.quanti[train,1:10], X.quali[train,], ndim=4)
pred <- predict(pca, X.quanti[-train,1:10], X.quali[-train,])
plot(pca,axes=c(1,2))
points(pred[,c(1,2)],col=2,pch=16)
text(pred[,c(1,2)], labels = rownames(X.quanti[-train,1:27]), col=2,pos=3)

Print a 'MFAmix' object

Description

This is a method for the function print for objects of the class MFAmix.

Usage

## S3 method for class 'MFAmix'
print(x, ...)

Arguments

x

an object of class MFAmix generated by the function MFAmix.

...

further arguments to be passed to or from other methods. They are ignored in this function.

See Also

MFAmix


Print a 'PCAmix' object

Description

This is a method for the function print for objects of the class PCAmix.

Usage

## S3 method for class 'PCAmix'
print(x, ...)

Arguments

x

an object of class PCAmix generated by the functions PCAmix and PCArot.

...

further arguments to be passed to or from other methods. They are ignored in this function.

See Also

PCAmix, PCArot


Protein data

Description

The data measure the amount of protein consumed for nine food groups in 25 European countries. The nine food groups are red meat (RedMeat), white meat (WhiteMeat), eggs (Eggs), milk (Milk), fish (Fish), cereal (Cereal), starch (Starch), nuts (Nuts), and fruits and vegetables (FruitVeg).

Format

A data frame with 25 rows (the European countries) and 9 columns (the food groups)

Source

Originated by A. Weber and cited in Hand et al., A Handbook of Small Data Sets, (1994, p. 297).


Recoding of the data matrices

Description

Recoding of the quantitative and of the qualitative data matrix.

Usage

recod(X.quanti, X.quali,rename.level=FALSE)

Arguments

X.quanti

a numerical data matrix.

X.quali

a categorical data matrix.

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name".

Value

X

X.quanti and X.quali concatenated in a single matrix.

Y

X.quanti with missing values replaced by the column means, concatenated with the indicator matrix of X.quali in which missing values are replaced by zeros.

Z

X.quanti standardized (centered and scaled by the standard deviations), concatenated with the indicator matrix of X.quali centered and scaled by the square roots of the relative frequencies of the categories.

W

X.quanti standardized (centered and scaled by the standard deviations), concatenated with the centered indicator matrix of X.quali.

n

the number of observations.

p

the total number of variables

p1

the number of quantitative variables

p2

the number of qualitative variables

g

the means of the columns of Y

s

the standard deviations of the columns of Y

G

The indicator matrix of X.quali with missing values replaced by 0.

Gcod

The indicator matrix G scaled by the square roots of the relative frequencies of the categories.
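The weighting of the qualitative part described for Z, W and Gcod can be illustrated on a toy indicator matrix. This base R sketch assumes a single qualitative variable with three levels and no missing values; it mirrors the definitions above rather than the code of recod.

```r
# Toy indicator matrix G: 6 individuals, one qualitative variable, 3 levels
G <- cbind(a = c(1, 0, 0, 1, 0, 1),
           b = c(0, 1, 0, 0, 1, 0),
           c = c(0, 0, 1, 0, 0, 0))

f <- colMeans(G)                              # relative frequencies of the levels
Gcod    <- sweep(G, 2, sqrt(f), "/")          # G scaled by sqrt(f)
W.quali <- sweep(G, 2, f)                     # G centered (qualitative part of W)
Z.quali <- sweep(W.quali, 2, sqrt(f), "/")    # centered, then scaled by sqrt(f) (part of Z)
```

With this weighting each column of Z.quali has mean square 1 - f, so a variable with m levels contributes a total of m - 1 (here 2).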


Recoding of the qualitative data matrix

Description

Recoding of the qualitative data matrix.

Usage

recodqual(X,rename.level=FALSE)

Arguments

X

the qualitative data matrix.

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name".

Value

G

The indicator matrix of X with missing values replaced by 0.

Examples

data(vnf)
X <- vnf[1:10,9:12]
tab.disjonctif.NA(X)
recodqual(X)

Recoding of the quantitative data matrix

Description

Recoding of the quantitative data matrix.

Usage

recodquant(X)

Arguments

X

the quantitative data matrix.

Value

Z

the standardized quantitative data matrix (centered and scaled by the standard deviations).

g

the means of the columns of X

s

the standard deviations of the columns of X (population version with 1/n)

Xcod

The quantitative matrix X with missing values replaced with the column mean values.
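The three steps listed above (mean imputation, population standard deviation, standardization) can be sketched in base R on a toy data frame. This is an illustration of the definitions, not the code of recodquant; computing s on the imputed matrix is an assumption of this sketch.

```r
X <- data.frame(x1 = c(1, 2, NA, 4), x2 = c(10, NA, 30, 40))

# Xcod: replace missing values with the column means (computed without NAs)
g <- colMeans(X, na.rm = TRUE)
Xcod <- as.matrix(X)
for (j in seq_len(ncol(Xcod))) {
  Xcod[is.na(Xcod[, j]), j] <- g[j]
}

# s: population standard deviation (divisor n, not n - 1), on the imputed data
n <- nrow(Xcod)
s <- sqrt(colSums(sweep(Xcod, 2, g)^2) / n)

# Z: centered and scaled columns
Z <- sweep(sweep(Xcod, 2, g), 2, s, "/")
```

Imputing with the mean of the observed values leaves the column means unchanged, so g is also the mean of each column of Xcod.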

Examples

data(decathlon)
X <- decathlon[1:5,1:5]
X[1,2] <- NA
X[2,3] <-NA
rec <- recodquant(X)

splitgroups

Description

If the p variables of a data matrix of dimension (n,p) are divided into G groups, this function splits the data matrix into G datasets according to group membership.

Usage

splitgroups(data, groups, name.groups)

Arguments

data

a data matrix with n rows and p columns.

groups

a vector of size p whose values indicate the group to which each variable belongs.

name.groups

a vector of size G containing the name of each group to create. The names are given in increasing order of the group numbers.

Value

data.groups

a list of G data matrices: one matrix for each group.

listvar.groups

The list of the variables in each group.
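The splitting described above can be sketched in base R with split.default(), which splits the columns of a data frame according to a factor. The toy data and group labels below are assumptions for illustration; this is not the code of splitgroups.

```r
# Toy data: 5 variables assigned to 3 groups
data <- data.frame(a = 1:4, b = 5:8, c = letters[1:4],
                   d = 9:12, e = LETTERS[1:4])
groups <- c(1, 1, 2, 3, 2)
name.groups <- c("G1", "G2", "G3")

# Factor levels are sorted, so the list follows increasing group numbers,
# which is why name.groups is given in that order
data.groups <- split.default(data, factor(groups))
names(data.groups) <- name.groups

# Names of the variables in each group
listvar.groups <- lapply(data.groups, colnames)
```

Here data.groups$G2 holds columns c and e, and listvar.groups gives the variable names group by group.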

Examples

data(decathlon)
split <- splitgroups(decathlon,groups=c(rep(1,10),2,2,3),
          name.groups=c("Epreuve","Classement","Competition"))
split$data.groups$Epreuve

splitmix

Description

Splits a mixed data matrix into two datasets: one with the quantitative variables and one with the qualitative variables. Columns of class "integer" are considered quantitative; for such a column to be treated as qualitative, it must be of class character or factor.

Usage

splitmix(data)

Arguments

data

a data matrix or a data.frame with a mixture of quantitative and qualitative variables.

Value

X.quanti

a data matrix containing only the quantitative variables.

X.quali

A data.frame containing only the qualitative variables.
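The rule stated in the description (numeric and integer columns are quantitative, character and factor columns are qualitative) can be sketched in base R with is.numeric(). The toy data frame is an assumption; this is not the code of splitmix.

```r
data <- data.frame(height = c(1.7, 1.8, 1.6),
                   count  = c(2L, 3L, 1L),               # integer: quantitative
                   colour = c("red", "blue", "red"),     # character: qualitative
                   size   = factor(c("S", "M", "L")))    # factor: qualitative

# is.numeric() is TRUE for both double and integer columns
is.quanti <- vapply(data, is.numeric, logical(1))
X.quanti <- as.matrix(data[, is.quanti, drop = FALSE])
X.quali  <- data[, !is.quanti, drop = FALSE]
```

To have count treated as qualitative, it would have to be converted first, for example with factor(data$count).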

Examples

data(decathlon)
data.split <- splitmix(decathlon)
data.split$X.quanti
data.split$X.quali

Summary of a 'MFAmix' object

Description

This is a method for the function summary for objects of the class MFAmix.

Usage

## S3 method for class 'MFAmix'
summary(object, ...)

Arguments

object

an object of class MFAmix obtained with the function MFAmix.

...

further arguments passed to or from other methods.

Value

Returns the total number of observations, the number of quantitative variables, and the number of qualitative variables with their total number of levels. The same values are also given for each group.

See Also

plot.MFAmix, MFAmix


Summary of a 'PCAmix' object

Description

This is a method for the function summary for objects of the class PCAmix.

Usage

## S3 method for class 'PCAmix'
summary(object, ...)

Arguments

object

an object of class PCAmix obtained with the function PCAmix or PCArot.

...

further arguments passed to or from other methods.

Value

Returns the matrix of squared loadings. For quantitative variables (resp. qualitative), squared loadings are the squared correlations (resp. the correlation ratios) with the scores or with the rotated (standardized) scores.
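The two kinds of squared loadings can be sketched in base R for a single component: a squared correlation for a quantitative variable and a correlation ratio for a qualitative one. The scores and variables below are simulated assumptions; the sketch illustrates the definitions, not the code of summary.PCAmix.

```r
set.seed(3)
score <- rnorm(12)                                   # scores on one component
x <- score + rnorm(12, sd = 0.5)                     # a quantitative variable
f <- factor(rep(c("low", "mid", "high"), each = 4))  # a qualitative variable

# Quantitative variable: squared correlation with the scores
sqload.quanti <- cor(x, score)^2

# Qualitative variable: correlation ratio eta^2, i.e. the between-level
# variance of the scores divided by their total variance
eta2 <- function(y, g) {
  m   <- tapply(y, g, mean)
  n.k <- tapply(y, g, length)
  sum(n.k * (m - mean(y))^2) / sum((y - mean(y))^2)
}
sqload.quali <- eta2(score, f)
```

Both values lie between 0 and 1, and the correlation ratio equals the R-squared of a one-way analysis of variance of the scores on the qualitative variable.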

See Also

plot.PCAmix, PCAmix, PCArot

Examples

data(wine)
X.quanti <- wine[,c(3:29)] 
X.quali <- wine[,c(1,2)] 
pca<-PCAmix(X.quanti,X.quali,ndim=4, graph=FALSE)
summary(pca)

rot<-PCArot(pca,3,graph=FALSE)
summary(rot)

Supplementary variables projection

Description

supvar is a generic function for adding supplementary variables to PCAmix or MFAmix. It invokes one of two methods, depending on the class of its first argument.

Usage

supvar(obj, ...)

Arguments

obj

an object of class PCAmix or MFAmix.

...

further arguments passed to or from other methods.

Details

This generic function has two methods: supvar.PCAmix and supvar.MFAmix.
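The dispatch mechanism behind such a generic is standard S3 dispatch on the class of the first argument. The sketch below uses a hypothetical generic, describe(), and empty toy objects; it only shows how UseMethod() selects a method and is not the code of supvar.

```r
# Hypothetical generic, for illustration only (not part of PCAmixdata)
describe <- function(obj, ...) UseMethod("describe")

# One method per class, named generic.class
describe.PCAmix <- function(obj, ...) "method for class PCAmix"
describe.MFAmix <- function(obj, ...) "method for class MFAmix"

# Toy objects carrying only a class attribute
pca.like <- structure(list(), class = "PCAmix")
mfa.like <- structure(list(), class = "MFAmix")

describe(pca.like)
describe(mfa.like)
```

The call describe(pca.like) runs describe.PCAmix and describe(mfa.like) runs describe.MFAmix, in the same way that supvar(obj, ...) reaches supvar.PCAmix or supvar.MFAmix.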


Supplementary variables in MFAmix

Description

Computes the coordinates of supplementary variables and groups on the components of an object of class MFAmix.

Usage

## S3 method for class 'MFAmix'
supvar(obj, data.sup, groups.sup, name.groups.sup, rename.level = FALSE, ...)

Arguments

obj

an object of class MFAmix.

data.sup

a data matrix or data frame of supplementary variables, which may mix quantitative and qualitative variables.

groups.sup

a vector which gives the groups of the columns in data.sup.

name.groups.sup

a vector which gives the names of the supplementary groups.

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents levels of different variables from having identical names.

...

further arguments passed to or from other methods.

Examples

data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <- splitmix(wine)$X.quanti[,28:29]
X.quali.sup <- splitmix(wine)$X.quali[,2,drop=FALSE]
data <- cbind(X.quanti,X.quali)
data.sup <- cbind(X.quanti.sup,X.quali.sup)
groups <-c(1,2,2,3,3,1)
name.groups <- c("G1","G2","G3")
groups.sup <- c(1,1,2)
name.groups.sup <- c("Gsup1","Gsup2")
mfa <- MFAmix(data,groups,name.groups,ndim=4,rename.level=TRUE,graph=FALSE)
mfa.sup <- supvar(mfa,data.sup,groups.sup,name.groups.sup,rename.level=TRUE)

Supplementary variables in PCAmix

Description

Computes the coordinates of supplementary variables on the components of an object of class PCAmix.

Usage

## S3 method for class 'PCAmix'
supvar(obj, X.quanti.sup = NULL, X.quali.sup = NULL, rename.level = FALSE, ...)

Arguments

obj

an object of class PCAmix.

X.quanti.sup

a numeric matrix of data.

X.quali.sup

a categorical matrix of data.

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: "variable_name=level_name". This prevents levels of different variables from having identical names.

...

further arguments passed to or from other methods.

See Also

PCAmix

Examples

data(wine)
X.quanti <- splitmix(wine)$X.quanti[,1:5]
X.quali <- splitmix(wine)$X.quali[,1,drop=FALSE]
X.quanti.sup <-splitmix(wine)$X.quanti[,28:29]
X.quali.sup <-splitmix(wine)$X.quali[,2,drop=FALSE]
pca<-PCAmix(X.quanti,X.quali,ndim=4,graph=FALSE)
pcasup <- supvar(pca,X.quanti.sup,X.quali.sup)

Build an indicator matrix

Description

This function builds the indicator matrix of a qualitative data matrix. Missing observations are indicated as NAs.

Usage

tab.disjonctif.NA(tab, rename.level = FALSE)

Arguments

tab

a categorical data matrix.

rename.level

boolean, if TRUE all the levels of the qualitative variables are renamed as follows: variable_name=level_name.

Details

This function is based on the code of the function tab.disjonctif implemented in the package FactoMineR, but handles missing values differently. Here, when the category of an observation is missing, the corresponding entries of the indicator matrix are set to NA. In the function tab.disjonctif of the package FactoMineR, a new column is created in that case.

Value

Returns the indicator matrix with NA for missing observations.
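The behaviour described above can be sketched with a small hypothetical helper, indicator.NA(), written in base R for illustration (it is not the code of tab.disjonctif.NA). Each qualitative column is expanded into one 0/1 column per observed level, and a missing value yields NA across all columns of that variable. The last line sets those NAs to 0, which corresponds to the matrix G described for recodqual.

```r
# Hypothetical helper, for illustration only
indicator.NA <- function(tab, rename.level = FALSE) {
  blocks <- lapply(names(tab), function(v) {
    f <- factor(tab[[v]])                    # NA is not a level
    # f == l is NA where f is NA, so missing rows stay NA
    G <- sapply(levels(f), function(l) as.numeric(f == l))
    colnames(G) <- if (rename.level) paste(v, levels(f), sep = "=") else levels(f)
    G
  })
  do.call(cbind, blocks)
}

tab <- data.frame(colour = factor(c("red", "blue", NA, "red")),
                  size   = factor(c("S", "L", "L", NA)))
G  <- indicator.NA(tab)
G0 <- replace(G, is.na(G), 0)                # NAs set to 0
```

With rename.level = TRUE the columns would be named "colour=blue", "colour=red", "size=L" and "size=S", which keeps level names unique across variables.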

Examples

data(vnf)
X <- vnf[1:10,9:12]
tab.disjonctif.NA(X)

User satisfaction survey with 1232 individuals and 14 questions

Description

A user satisfaction survey of pleasure craft operators on the “Canal des Deux Mers”, located in the South of France, was carried out by the public corporation “Voies Navigables de France” (VNF), which is responsible for managing and developing the largest network of navigable waterways in Europe.

Usage

data(vnf)

Format

A data frame with 1232 observations and 14 qualitative variables.

Source

Josse, J., Chavent, M., Liquet, B. and Husson, F. (2012). Handling missing values with Regularized Iterative Multiple Correspondence Analysis. Journal of Classification, Vol. 29, pp. 91-116.


Wine

Description

The data used here refer to 21 wines from the Val de Loire.

Usage

data(wine)

Format

A data frame with 21 rows (the number of wines) and 31 columns: the first column corresponds to the label of origin, the second column corresponds to the soil, and the others correspond to sensory descriptors.

Source

Centre de recherche INRA d'Angers

Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software. 25(1). pp. 1-18.