Title: | Clustering of Variables |
---|---|
Description: | Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both. |
Authors: | Marie Chavent [aut, cre], Vanessa Kuentz [aut], Benoit Liquet [aut], Jerome Saracco [aut] |
Maintainer: | Marie Chavent <[email protected]> |
License: | GPL (>=2.0) |
Version: | 1.1 |
Built: | 2025-02-03 02:37:43 UTC |
Source: | https://github.com/chavent/clustofvar |
Draw a bootstrap sample from X.quanti and a bootstrap sample from X.quali
bootvar(X.quanti = NULL, X.quali = NULL)
X.quanti |
a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |
X.quali |
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). |
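The resampling idea can be sketched in base R. This is only an illustration: `boot_rows` is a hypothetical helper, not the package's `bootvar`, and it assumes that the same resampled row indices are applied to both tables so that observations stay aligned.

```r
# Hypothetical helper (not part of ClustOfVar): draw n row indices with
# replacement and apply the SAME indices to both tables (an assumption).
boot_rows <- function(X.quanti = NULL, X.quali = NULL) {
  n <- if (!is.null(X.quanti)) NROW(X.quanti) else NROW(X.quali)
  idx <- sample.int(n, size = n, replace = TRUE)
  list(
    X.quanti = if (!is.null(X.quanti)) as.data.frame(X.quanti)[idx, , drop = FALSE],
    X.quali  = if (!is.null(X.quali)) as.data.frame(X.quali)[idx, , drop = FALSE],
    idx      = idx
  )
}

# Toy data, invented for the illustration
set.seed(1)
quanti <- data.frame(a = rnorm(8), b = rnorm(8))
quali  <- data.frame(g = factor(rep(c("x", "y"), 4)))
boot   <- boot_rows(quanti, quali)
```

Each bootstrap table has as many rows as the original, and some rows typically appear more than once.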
Calculates the measure of aggregation of two clusters of variables. This measure of aggregation is equal to the decrease in homogeneity for the clusters being merged.
clust_diss(A, B)
A |
a centered and reduced data matrix obtained with the recod function for the first cluster |
B |
a centered and reduced data matrix obtained with the recod function for the second cluster |
The aggregation measure between the two clusters
data(decathlon)
A <- PCAmixdata::recod(X.quanti = decathlon[1:10, 1:5], X.quali = NULL)$Z
B <- PCAmixdata::recod(X.quanti = decathlon[1:10, 6:10], X.quali = NULL)$Z
clust_diss(A, B)
Dissimilarity between two clusters of variables when only the covariance/correlation matrix is known.
clust_diss2(x, A, B)
x |
a covariance/correlation matrix |
A |
indices of cluster A |
B |
indices of cluster B |
The dissimilarity between the two clusters
data(decathlon)
x <- cor(decathlon[, 1:10])
A <- c(1, 3, 4)
B <- c(2, 7, 10)
clust_diss2(x, A, B)
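For quantitative variables, the aggregation measure described above (the decrease in homogeneity when two clusters are merged) can be sketched from a correlation matrix alone. The sketch below takes the homogeneity of a cluster to be the first eigenvalue of its correlation sub-matrix; `diss_sketch` and `first_eigenvalue` are hypothetical names, and the scaling used inside the package may differ.

```r
# Homogeneity of a cluster of quantitative variables, taken here as the
# largest eigenvalue of the cluster's correlation sub-matrix (assumption).
first_eigenvalue <- function(R) {
  eigen(R, symmetric = TRUE, only.values = TRUE)$values[1]
}

# Decrease in homogeneity when clusters A and B are merged:
# H(A) + H(B) - H(A union B)
diss_sketch <- function(x, A, B) {
  AB <- c(A, B)
  first_eigenvalue(x[A, A, drop = FALSE]) +
    first_eigenvalue(x[B, B, drop = FALSE]) -
    first_eigenvalue(x[AB, AB, drop = FALSE])
}

# Toy data, invented for the illustration
set.seed(42)
X <- matrix(rnorm(200 * 6), ncol = 6)
X[, 2] <- X[, 1] + 0.5 * X[, 2]   # make variables 1 and 2 correlated
R <- cor(X)
d <- diss_sketch(R, A = c(1, 3), B = c(2, 4))
```

For two single variables the measure reduces to one minus the absolute correlation, so strongly correlated variables are cheap to merge.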
Calculates the synthetic variable of a cluster of variables. The variables can be quantitative or qualitative. The synthetic variable is the first principal component of PCAmix. The variance of the synthetic variable is the first eigenvalue. It is equal to the sum of squared correlations or correlation ratios to the synthetic variable. It measures the homogeneity of the cluster.
clusterscore(Z)
Z |
a centered and reduced data matrix obtained with the recod function |
f |
the synthetic variables i.e. the scores on the first principal component of PCAmix |
sv |
the standard deviation of f i.e. the first singular value |
v |
the standardized loadings |
data(decathlon)
A <- 1:5
Z <- PCAmixdata::recod(X.quanti = decathlon[1:10, A], X.quali = NULL)$Z
clusterscore(Z)
Z %*% as.matrix(clusterscore(Z)$v)
clusterscore(Z)$f
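The link between the synthetic variable, the first eigenvalue and the squared correlations can be checked numerically for purely quantitative data. The following is a base R sketch under that restriction; `clusterscore_sketch` is a hypothetical name and is not the package function, and the standardization with a 1/n variance is an assumption about how the recoded matrix is built.

```r
# First principal component of a standardized matrix Z (columns with mean 0
# and variance 1, the variance being computed with denominator n).
clusterscore_sketch <- function(Z) {
  n <- nrow(Z)
  s <- svd(Z / sqrt(n), nu = 1, nv = 1)
  list(
    f  = drop(Z %*% s$v),  # scores on the first principal component
    sv = s$d[1],           # first singular value = standard deviation of f
    v  = drop(s$v)         # loadings defining the linear combination
  )
}

# Toy data, invented for the illustration
set.seed(7)
X <- matrix(rnorm(50 * 4), ncol = 4)
X[, 2] <- X[, 1] + 0.3 * X[, 2]
n <- nrow(X)
Z <- scale(X) * sqrt(n / (n - 1))  # switch from sd with n - 1 to sd with n
res <- clusterscore_sketch(Z)

# Homogeneity: the first eigenvalue equals the sum of squared correlations
# between the variables and the synthetic variable.
homogeneity <- sum(cor(X, res$f)^2)
```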
Cuts a hierarchical tree of variables resulting from hclustvar into several clusters by specifying the desired number of clusters.
cutreevar(obj, k = NULL, matsim = FALSE)
obj |
an object of class 'hclustvar'. |
k |
an integer scalar with the desired number of clusters. |
matsim |
boolean; if TRUE, the matrices of similarities between variables in the same cluster are calculated. |
var |
a list of matrices of squared loadings i.e. for each cluster of variables, the squared loadings on first principal component of PCAmix. For quantitative variables (resp. qualitative), squared loadings are the squared correlations (resp. the correlation ratios) with the first PC (the cluster center). |
sim |
a list of matrices of similarities
i.e. for each cluster, similarities between their variables. The
similarity between two variables is defined as a square cosine: the square
of the Pearson correlation when the two variables are quantitative; the
correlation ratio when one variable is quantitative and the other one is
qualitative; the square of the canonical correlation between two sets of
dummy variables, when the two variables are qualitative. |
cluster |
a vector of integers indicating the cluster to which each variable is allocated. |
wss |
the within-cluster sum of squares for each cluster: the sum of the correlation ratio (for qualitative variables) and the squared correlation (for quantitative variables) between the variables and the center of the cluster. |
E |
the percentage of homogeneity accounted for by the partition in k clusters. |
size |
the number of variables in each cluster. |
scores |
an n by k numerical matrix which contains the k cluster centers. The center of a cluster is a synthetic variable: the first principal component calculated by PCAmix. The k columns of |
coef |
a list of the coefficients of the linear combinations defining the synthetic variable of each cluster. |
data(decathlon)
tree <- hclustvar(decathlon[, 1:10])
plot(tree)
# choice of the number of clusters
stability(tree, B = 40)
part <- cutreevar(tree, 4)
print(part)
summary(part)
The data used here refer to athletes' performance during two sporting events.
data(decathlon)
A data frame with 41 rows and 13 columns: the first ten columns correspond to the performance of the athletes in the 10 events of the decathlon. Columns 11 and 12 correspond respectively to the rank and the points obtained. The last column is a categorical variable corresponding to the sporting event (2004 Olympic Games or 2004 Decastar).
The references below.
Department of Applied Mathematics, Agrocampus Rennes.
Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software. 25(1). pp. 1-18.
Data referring to 27 breeds of dogs.
A data frame with 27 rows (the breeds of dogs) and 7 columns: their size, weight and speed with 3 categories (small, medium, large), their intelligence (low, medium, high), their affectivity and aggressiveness with 2 categories (low, high), and their function (utility, company, hunting).
Originated by A. Brefort (1982) and cited in Saporta G. (2011).
8 characteristics for 18 popular flowers.
data(flower)
A data frame with 18 observations on 8 variables:
[ , "V1"] | factor | winters |
[ , "V2"] | factor | shadow |
[ , "V3"] | factor | tubers |
[ , "V4"] | factor | color |
[ , "V5"] | ordered | soil |
[ , "V6"] | ordered | preference |
[ , "V7"] | numeric | height |
[ , "V8"] | numeric | distance |
winters: binary; indicates whether the plant may be left in the garden when it freezes.
shadow: binary; shows whether the plant needs to stand in the shadow.
tubers: asymmetric binary; distinguishes between plants with tubers and plants that grow in any other way.
color: nominal; specifies the flower's color (1 = white, 2 = yellow, 3 = pink, 4 = red, 5 = blue).
soil: ordinal; indicates whether the plant grows in dry (1), normal (2), or wet (3) soil.
preference: ordinal; gives someone's preference ranking, going from 1 to 18.
height: interval scaled; the plant's height in centimeters.
distance: interval scaled; the distance in centimeters that should be left between the plants.
The reference below.
Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996): Clustering in an Object-Oriented Environment. Journal of Statistical Software, 1. http://www.stat.ucla.edu/journals/jss/
Nearest neighbor of variables
getnnsvar(diss, flag)
diss |
a dissimilarity matrix between variables |
flag |
a vector of size |
A list of 4 datasets characterizing the living conditions of 542 cities in Gironde. The four datasets correspond to four themes related to living conditions. Each dataset contains a different number of variables (quantitative and/or qualitative). The first three datasets come from the 2009 population census conducted in Gironde by INSEE (Institut National de la Statistique et des Etudes Economiques). The fourth comes from an IGN (Institut National de l'Information Geographique et forestiere) database.
data(gironde)
A list of 4 data frames.
gironde$employment |
This data frame contains the description of 542 cities by 9 quantitative variables. These variables are related to employment conditions like, for instance, the average income (income), the percentage of farmers (farmer). |
gironde$housing |
This data frame contains the description of 542 cities by 5 variables (2 qualitative variables and 3 quantitative variables). These variables are related to housing conditions like, for instance, the population density (density), the percentage of council housing within the cities (council). |
gironde$services |
This data frame contains the description of 542 cities by 9 qualitative variables. These variables are related to the number of services within the cities, like, for instance, the number of bakeries (baker) or the number of post offices (postoffice). |
gironde$environment |
This data frame contains the description of 542 cities by 4 quantitative variables. These variables are related to the natural environment of the cities, like, for instance the percentage of agricultural land (agricul) or the percentage of buildings (building). |
www.INSEE.fr
www.ign.fr
http://siddt.grenoble.cemagref.fr/
Multivariate analysis of mixed data: The PCAmixdata R package, M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco, arXiv:1411.4911 [stat.CO]
Ascendant hierarchical clustering of a set of variables. Variables can be quantitative, qualitative or a mixture of both. The aggregation criterion is the decrease in homogeneity for the clusters being merged. The homogeneity of a cluster is the sum of the correlation ratio (for qualitative variables) and the squared correlation (for quantitative variables) between the variables and the center of the cluster which is the first principal component of PCAmix. PCAmix is defined for a mixture of qualitative and quantitative variables and includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
hclustvar(X.quanti = NULL, X.quali = NULL, init = NULL)
X.quanti |
a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |
X.quali |
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). |
init |
an initial partition (a vector of integers indicating the cluster to which each variable is allocated). |
If the quantitative and qualitative data are in the same data frame, the function PCAmixdata::splitmix can be used to automatically extract the qualitative and the quantitative data into two separate data frames.
height |
a set of p-1 non-decreasing real values: the values of the aggregation criterion. |
clusmat |
a p by p matrix with group memberships where each column k corresponds to the elements of the partition in k clusters. |
merge |
a p-1 by 2 matrix. Row i of |
Chavent, M., Liquet, B., Kuentz, V., Saracco, J. (2012), ClustOfVar: An R Package for the Clustering of Variables. Journal of Statistical Software, Vol. 50, pp. 1-16.
cutreevar, plot.hclustvar, stability
# quantitative variables
data(decathlon)
tree <- hclustvar(X.quanti = decathlon[, 1:10], init = NULL)
plot(tree)
# qualitative variables with missing values
data(vnf)
tree_NA <- hclustvar(X.quali = vnf)
plot(tree_NA)
vnf2 <- na.omit(vnf)
tree <- hclustvar(X.quali = vnf2)
plot(tree)
# mixture of quantitative and qualitative variables
data(wine)
X.quanti <- PCAmixdata::splitmix(wine)$X.quanti
X.quali <- PCAmixdata::splitmix(wine)$X.quali
tree <- hclustvar(X.quanti, X.quali)
plot(tree)
Ascendant hierarchical clustering of a set of variables from a covariance/correlation matrix.
hclustvar2(x, init = NULL)
x |
a covariance or correlation matrix. |
init |
an initial partition (a vector of integers indicating the cluster to which each variable is allocated). |
height |
a set of p-1 non-decreasing real values: the values of the aggregation criterion. |
clusmat |
a p by p matrix with group memberships where each column k corresponds to the elements of the partition in k clusters. |
merge |
a p-1 by 2 matrix. Row i of |
cutreevar, plot.hclustvar, stability
data(decathlon)
x <- cor(decathlon[, 1:10])
tree <- hclustvar2(x, init = NULL)
plot(tree, hang = -1, xlab = "", sub = "")
Iterative relocation algorithm of k-means type which performs a partitioning of a set of variables. Variables can be quantitative, qualitative or a mixture of both. The center of a cluster of variables is a synthetic variable but is not a 'mean' as in classical k-means. This synthetic variable is the first principal component calculated by PCAmix. PCAmix is defined for a mixture of qualitative and quantitative variables and includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. The homogeneity of a cluster of variables is defined as the sum of the correlation ratio (for qualitative variables) and the squared correlation (for quantitative variables) between the variables and the center of the cluster, which is in all cases a numerical variable. Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
kmeansvar(X.quanti = NULL, X.quali = NULL, init, iter.max = 150, nstart = 1, matsim = FALSE)
X.quanti |
a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |
X.quali |
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). |
init |
either the number of clusters or an initial partition (a vector
of integers indicating the cluster to which each variable is allocated).
If |
iter.max |
the maximum number of iterations allowed. |
nstart |
if |
matsim |
boolean; if 'TRUE', the matrices of similarities between variables in the same cluster are calculated. |
If the quantitative and qualitative data are in the same data frame, the function splitmix can be used to automatically extract the qualitative and the quantitative data into two separate data frames.
var |
a list of matrices of squared loadings i.e. for each cluster of variables, the squared loadings on first principal component of PCAmix. For quantitative variables (resp. qualitative), squared loadings are the squared correlations (resp. the correlation ratios) with the first PC (the cluster center). |
sim |
a list of matrices of similarities
i.e. for each cluster, similarities between their variables. The
similarity between two variables is defined as a square cosine: the square
of the Pearson correlation when the two variables are quantitative; the
correlation ratio when one variable is quantitative and the other one is
qualitative; the square of the canonical correlation between two sets of
dummy variables, when the two variables are qualitative. |
cluster |
a vector of integers indicating the cluster to which each variable is allocated. |
wss |
the within-cluster sum of squares for each cluster: the sum of the correlation ratio (for qualitative variables) and the squared correlation (for quantitative variables) between the variables and the center of the cluster. |
E |
the percentage of homogeneity accounted for by the partition in k clusters. |
size |
the number of variables in each cluster. |
scores |
an n by k numerical matrix which contains the k cluster centers. The center of a cluster is a synthetic variable: the first principal component calculated by PCAmix. The k columns of |
coef |
a list of the coefficients of the linear combinations defining the synthetic variable of each cluster. |
Chavent, M., Liquet, B., Kuentz, V., Saracco, J. (2012), ClustOfVar: An R Package for the Clustering of Variables. Journal of Statistical Software, Vol. 50, pp. 1-16.
splitmix, summary.clustvar, predict.clustvar
data(decathlon)
# choice of the number of clusters
tree <- hclustvar(X.quanti = decathlon[, 1:10])
stab <- stability(tree, B = 60)
# a random set of variables is chosen as the initial cluster centers, nstart = 10 times
part1 <- kmeansvar(X.quanti = decathlon[, 1:10], init = 5, nstart = 10)
summary(part1)
# the partition from the hierarchical clustering is chosen as the initial partition
part_init <- cutreevar(tree, 5)$cluster
part2 <- kmeansvar(X.quanti = decathlon[, 1:10], init = part_init, matsim = TRUE)
summary(part2)
part2$sim
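The relocation scheme described for kmeansvar can be sketched in base R for quantitative variables only. This is a simplified illustration, not the package's algorithm: `kmeansvar_sketch` and `first_pc` are hypothetical names, the initial partition is a fixed cyclic assignment rather than a random choice, there are no multiple starts, and qualitative variables and missing values are not handled.

```r
# Scores on the first principal component of a standardized matrix
first_pc <- function(Z) drop(Z %*% svd(Z, nu = 0, nv = 1)$v)

kmeansvar_sketch <- function(X, k, iter.max = 150) {
  Z <- scale(X)
  p <- ncol(Z)
  cluster <- rep_len(seq_len(k), p)  # hypothetical deterministic start
  for (iter in seq_len(iter.max)) {
    # Representation step: the center of each cluster is its first PC
    centers <- sapply(seq_len(k), function(j) {
      idx <- which(cluster == j)
      if (length(idx) == 0) rep(NA_real_, nrow(Z)) else first_pc(Z[, idx, drop = FALSE])
    })
    # Allocation step: each variable joins the center with which it has
    # the highest squared correlation (empty clusters are never chosen)
    sqcor <- cor(Z, centers)^2
    sqcor[is.na(sqcor)] <- -1
    new_cluster <- apply(sqcor, 1, which.max)
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  list(cluster = cluster, iter = iter)
}

# Toy data, invented for the illustration: two blocks of three variables,
# each block driven by its own latent factor
set.seed(1)
n  <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(f1 + 0.3 * rnorm(n), f1 + 0.3 * rnorm(n), f1 + 0.3 * rnorm(n),
           f2 + 0.3 * rnorm(n), f2 + 0.3 * rnorm(n), f2 + 0.3 * rnorm(n))
part <- kmeansvar_sketch(X, k = 2)
part$cluster
```

On this toy data the two blocks of variables are recovered even though the starting partition mixes them.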
Returns the similarity between two quantitative variables, two qualitative variables or a quantitative variable and a qualitative variable. The similarity between two variables is defined as a square cosine: the square of the Pearson correlation when the two variables are quantitative; the correlation ratio when one variable is quantitative and the other one is qualitative; the square of the canonical correlation between two sets of dummy variables, when the two variables are qualitative.
mixedVarSim(X1, X2)
X1 |
a vector or a factor |
X2 |
a vector or a factor |
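The three cases of the similarity defined above can be written out in base R. This is a sketch for illustration only: `sim_sketch` is a hypothetical name, it relies on `stats::cancor` for the qualitative case, and it is not claimed to reproduce the package's numerical details.

```r
sim_sketch <- function(x1, x2) {
  # Dummy coding of a factor, dropping the first level to avoid redundancy
  dummies <- function(f) model.matrix(~ f)[, -1, drop = FALSE]

  # Two quantitative variables: squared Pearson correlation
  if (is.numeric(x1) && is.numeric(x2)) return(cor(x1, x2)^2)

  # One quantitative, one qualitative: correlation ratio
  # (between-group sum of squares divided by total sum of squares)
  if (is.numeric(x1) || is.numeric(x2)) {
    num <- if (is.numeric(x1)) x1 else x2
    fac <- factor(if (is.numeric(x1)) x2 else x1)
    group_means <- ave(num, fac)
    return(sum((group_means - mean(num))^2) / sum((num - mean(num))^2))
  }

  # Two qualitative variables: squared first canonical correlation
  # between the two sets of dummy variables
  cancor(dummies(factor(x1)), dummies(factor(x2)))$cor[1]^2
}

# Toy data, invented for the illustration
set.seed(3)
x <- rnorm(30)
y <- x + rnorm(30)
g <- factor(rep(c("a", "b", "c"), 10))
h <- factor(rep(c("u", "v"), 15))

sim_sketch(x, y)  # quantitative / quantitative
sim_sketch(x, g)  # quantitative / qualitative
sim_sketch(g, h)  # qualitative / qualitative
```

In all three cases the value lies between 0 and 1, which is what makes the similarities comparable across variable types.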
Plot of the index of stability of the partitions against the number of clusters.
## S3 method for class 'clustab'
plot(x, nmin = NULL, nmax = NULL, ...)
x |
an object of class |
nmin |
the minimum number of clusters in the plot. |
nmax |
the maximum number of clusters in the plot. |
... |
further arguments passed to or from other methods. |
data(decathlon)
tree <- hclustvar(X.quanti = decathlon[, 1:10])
stab <- stability(tree, B = 20)
plot(stab, nmax = 7)
Plots a dotchart of the "loadings" of the variables in each cluster. The loading of a numerical variable is the correlation between this variable and the synthetic variable of its cluster. The loading of a level of a categorical variable is the mean value of the synthetic variable of the cluster over the observations having this level.
## S3 method for class 'clustvar'
plot(x, ...)
x |
an object of class |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
coord.quanti |
coordinates of the quantitative variables belonging to cluster k on the synthetic variable associated with the same cluster k |
coord.levels |
coordinates of the levels of the categorical variables belonging to cluster k on the synthetic variable associated with the same cluster k |
data(wine)
X.quanti <- PCAmixdata::splitmix(wine)$X.quanti
X.quali <- PCAmixdata::splitmix(wine)$X.quali
tree <- hclustvar(X.quanti, X.quali)
tree.cut <- cutreevar(tree, 6)
# plot of scores on synthetic variables
res.plot <- plot(tree.cut)
res.plot$coord.quanti
res.plot$coord.levels
Dendrogram of the hierarchy of variables resulting from hclustvar and aggregation levels plot.
## S3 method for class 'hclustvar'
plot(x, type = "tree", sub = "", ...)
x |
an object of class |
type |
if type="tree" plot of the dendrogram and if type="index" aggregation levels plot. |
sub |
a sub title for the plot. |
... |
further arguments passed to or from other methods. |
data(wine)
X.quanti <- PCAmixdata::splitmix(wine)$X.quanti
X.quali <- PCAmixdata::splitmix(wine)$X.quali
tree <- hclustvar(X.quanti, X.quali)
plot(tree)
# aggregation levels plot
plot(tree, type = "index")
A partition of variables obtained with kmeansvar or with cutreevar is given as input. Each cluster of this partition is associated with a synthetic variable which is a linear combination of the variables of the cluster. The coefficients of these k linear combinations (one for each cluster) are used here to calculate the scores of new objects described in a new dataset (with the same variables). The output is the matrix of the scores of these new objects on the k synthetic variables.
## S3 method for class 'clustvar'
predict(object, X.quanti = NULL, X.quali = NULL, ...)
object |
an object of class clustvar |
X.quanti |
numeric matrix of data for the new objects |
X.quali |
a categorical matrix of data for the new objects |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Returns the matrix of the scores of the new objects on the k synthetic variables of the k-clusters partition given as input.
data(wine)
n <- nrow(wine)
sub <- 10:20
data.sub <- wine[sub, ]
# learning sample
X.quanti <- wine[sub, c(3:29)]
X.quali <- wine[sub, c(1, 2)]
part <- kmeansvar(X.quanti, X.quali, init = 5)
# new objects to score
X.quanti.t <- wine[-sub, c(3:29)]
X.quali.t <- wine[-sub, c(1, 2)]
new <- predict(part, X.quanti.t, X.quali.t)
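The idea of scoring new objects with coefficients learned on a training sample can be sketched in base R for one cluster of quantitative variables. The helper names `fit_synthetic` and `predict_synthetic` are hypothetical, and the layout of the coefficient vector (a constant followed by one coefficient per variable) is an assumption made for the illustration, not a description of the package's `coef` element.

```r
# Learn the linear combination defining the synthetic variable of a cluster:
# standardize with the training means and standard deviations, then take
# the first principal component.
fit_synthetic <- function(X) {
  mu   <- colMeans(X)
  sdev <- apply(X, 2, sd)
  Z    <- scale(X, center = mu, scale = sdev)
  v    <- svd(Z, nu = 0, nv = 1)$v[, 1]
  beta <- v / sdev
  c(const = -sum(beta * mu), beta)  # constant first, then coefficients
}

# Apply the learned coefficients to raw (unstandardized) new data
predict_synthetic <- function(coef, X.new) {
  drop(coef[1] + as.matrix(X.new) %*% coef[-1])
}

# Toy data, invented for the illustration
set.seed(11)
train <- matrix(rnorm(40 * 3, mean = 5, sd = 2), ncol = 3)
test  <- matrix(rnorm(10 * 3, mean = 5, sd = 2), ncol = 3)

coefs      <- fit_synthetic(train)
new_scores <- predict_synthetic(coefs, test)
```

Folding the training means and standard deviations into the coefficients means the new data never have to be standardized separately.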
The data measure the amount of protein consumed for nine food groups in 25 European countries. The nine food groups are red meat (RedMeat), white meat (WhiteMeat), eggs (Eggs), milk (Milk), fish (Fish), cereal (Cereal), starch (Starch), nuts (Nuts), and fruits and vegetables (FruitVeg).
A data frame with 25 rows (the European countries) and 9 columns (the food groups).
Originated by A. Weber and cited in Hand et al., A Handbook of Small Data Sets, (1994, p. 297).
Returns the Rand index, the corrected Rand index or the asymmetrical Rand index. The asymmetrical Rand index (corrected or not) measures the inclusion of a partition P into a partition Q, with the number of clusters in P greater than the number of clusters in Q.
rand(P, Q, symmetric = TRUE, adj = TRUE)
P |
a factor, e.g., the first partition. |
Q |
a factor, e.g., the second partition. |
symmetric |
a boolean. If FALSE the asymmetrical Rand index is calculated. |
adj |
a boolean. If TRUE the corrected index is calculated. |
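The symmetric indices can be computed from the contingency table of the two partitions. The sketch below uses the standard pair-counting formulas for the Rand index and its chance-corrected version (taken here to be the Hubert and Arabie correction, which is an assumption about what "corrected" means in this package); `rand_sketch` is a hypothetical name and the asymmetrical variant is not covered.

```r
rand_sketch <- function(P, Q, adj = TRUE) {
  tab   <- table(P, Q)
  pairs <- function(m) sum(choose(m, 2))
  both  <- pairs(tab)            # pairs grouped together in P and in Q
  in_P  <- pairs(rowSums(tab))   # pairs grouped together in P
  in_Q  <- pairs(colSums(tab))   # pairs grouped together in Q
  total <- choose(sum(tab), 2)   # all pairs of objects
  if (!adj) {
    # agreements = pairs together in both + pairs separated in both
    return((total + 2 * both - in_P - in_Q) / total)
  }
  expected <- in_P * in_Q / total
  (both - expected) / ((in_P + in_Q) / 2 - expected)
}

rand_sketch(c(1, 1, 2, 2), c(1, 1, 2, 2))               # identical partitions
rand_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2), adj = FALSE)  # raw index
rand_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))               # corrected index
```

The corrected index can be negative when two partitions agree less than would be expected by chance, whereas the raw index always lies between 0 and 1.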
This function selects in each cluster a given number of variables having the highest squared loadings. The squared loading of a variable in a cluster is its squared correlation (for numerical variable) and its correlation ratio (for categorical variable) with the first PC of PCAmix applied to the variables of the cluster.
selvar(part, nsel)
part |
an object of class |
nsel |
the number of variables selected in each cluster. |
If the number of variables in a cluster is smaller than nsel, all the variables of the cluster are selected.
Returns a list where each element contains the nsel selected variables.
data(decathlon)
tree <- hclustvar(decathlon[, 1:10])
part <- cutreevar(tree, 4)
part$var
selvar(part, 2)
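The selection rule can be sketched in base R for quantitative variables, starting from a data matrix and a vector of cluster memberships rather than from a clustvar object. `selvar_sketch` is a hypothetical name and the data are invented for the illustration.

```r
selvar_sketch <- function(X, cluster, nsel) {
  lapply(split(seq_len(ncol(X)), cluster), function(idx) {
    Z <- scale(X[, idx, drop = FALSE])
    # Cluster center: first principal component of the cluster's variables
    center <- drop(Z %*% svd(Z, nu = 0, nv = 1)$v)
    # Squared loadings: squared correlations with the center
    sqload <- drop(cor(Z, center)^2)
    names(sqload) <- colnames(X)[idx]
    # Keep the nsel largest (all of them if the cluster is smaller)
    head(sort(sqload, decreasing = TRUE), nsel)
  })
}

# Toy data, invented for the illustration
set.seed(5)
f <- rnorm(60)
X <- cbind(v1 = f + 0.2 * rnorm(60),
           v2 = f + 0.5 * rnorm(60),
           v3 = f + 1.0 * rnorm(60),
           v4 = rnorm(60))
sel <- selvar_sketch(X, cluster = c(1, 1, 1, 2), nsel = 2)
sel
```

The second cluster contains a single variable, so it is returned whole even though nsel is 2.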
Evaluates the stability of partitions obtained from a hierarchy of p variables. This hierarchy is built with hclustvar, and the stability of the partitions of 2 to p-1 clusters is evaluated with a bootstrap approach. The bootstrap approach is as follows: hclustvar is applied to B bootstrap samples of the n rows. The partitions of 2 to p-1 clusters obtained from the B bootstrap hierarchies are compared with the partitions from the initial hierarchy. The mean of the corrected Rand indices is plotted against the number of clusters. This graphical representation helps in determining a suitable number of clusters.
stability(tree, B = 100, graph = TRUE)
tree |
an object of class |
B |
the number of bootstrap samples. |
graph |
boolean, if 'TRUE' a graph is displayed. |
matCR |
matrix of corrected Rand indices. |
meanCR |
vector of mean corrected Rand indices. |
data(decathlon)
tree <- hclustvar(X.quanti = decathlon[, 1:10])
stab <- stability(tree, B = 20)
plot(stab, nmax = 7)
boxplot(stab$matCR[, 1:7])
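The bootstrap procedure can be sketched in base R. To keep the example self-contained, the hierarchy of variables is built with `hclust` (average linkage) on the dissimilarity 1 minus the squared correlation, which is a stand-in for hclustvar and not the same criterion; `stability_sketch`, `var_tree` and `adj_rand` are hypothetical names, and the output element names simply mirror those listed above.

```r
# Corrected (adjusted) Rand index between two partitions
adj_rand <- function(P, Q) {
  tab   <- table(P, Q)
  pairs <- function(m) sum(choose(m, 2))
  both  <- pairs(tab)
  in_P  <- pairs(rowSums(tab))
  in_Q  <- pairs(colSums(tab))
  expected <- in_P * in_Q / choose(sum(tab), 2)
  (both - expected) / ((in_P + in_Q) / 2 - expected)
}

# Stand-in hierarchy of quantitative variables (NOT hclustvar's criterion)
var_tree <- function(X) hclust(as.dist(1 - cor(X)^2), method = "average")

stability_sketch <- function(X, B = 20) {
  p   <- ncol(X)
  ks  <- 2:(p - 1)
  ref <- cutree(var_tree(X), k = ks)  # partitions of the initial hierarchy
  matCR <- matrix(NA_real_, nrow = B, ncol = length(ks),
                  dimnames = list(NULL, paste0("P", ks)))
  for (b in seq_len(B)) {
    # Bootstrap sample of the n rows, then a new hierarchy of the variables
    Xb   <- X[sample(nrow(X), replace = TRUE), , drop = FALSE]
    boot <- cutree(var_tree(Xb), k = ks)
    for (j in seq_along(ks)) matCR[b, j] <- adj_rand(ref[, j], boot[, j])
  }
  list(matCR = matCR, meanCR = colMeans(matCR))
}

# Toy data, invented for the illustration
set.seed(123)
X <- matrix(rnorm(60 * 6), ncol = 6)
X[, 2] <- X[, 1] + 0.3 * X[, 2]
X[, 4] <- X[, 3] + 0.3 * X[, 4]
stab <- stability_sketch(X, B = 10)
stab$meanCR
```

A number of clusters whose mean corrected Rand index is close to 1 corresponds to a partition that is reproduced by most bootstrap hierarchies.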
This is a method for the function summary for objects of the class clustab.
## S3 method for class 'clustab'
summary(object, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. |
This is a method for the function summary for objects of the class clustvar.
## S3 method for class 'clustvar'
summary(object, ...)
object |
an object of class |
... |
further arguments passed to or from other methods. |
Returns a list of matrices of squared loadings, i.e. for each cluster of variables, the squared loadings on the first principal component of PCAmix. For quantitative variables (resp. qualitative), squared loadings are the squared correlations (resp. the correlation ratios) with the first PC (the cluster center). If the partition of variables has been obtained with kmeansvar, the number of iterations until convergence is also indicated.
data(decathlon)
part <- kmeansvar(X.quanti = decathlon[, 1:10], init = 5)
summary(part)
A user satisfaction survey of pleasure craft operators on the “Canal des Deux Mers”, located in the South of France, was carried out by the public corporation “Voies Navigables de France” (VNF), which is responsible for managing and developing the largest network of navigable waterways in Europe.
data(vnf)
A data frame with 1232 observations and 14 qualitative variables.
Josse, J., Chavent, M., Liquet, B. and Husson, F. (2012). Handling missing values with Regularized Iterative Multiple Correspondence Analysis. Journal of Classification, Vol. 29, pp. 91-116.
The data used here refer to 21 wines of Val de Loire.
data(wine)
A data frame with 21 rows (the number of wines) and 31 columns: the first column corresponds to the label of origin, the second column corresponds to the soil, and the others correspond to sensory descriptors.
Centre de recherche INRA d'Angers
Le, S., Josse, J. & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software. 25(1). pp. 1-18.