Error in NbClust: not enough objects to cluster

  • Hannah H.  · Tech Community  · 7 years ago

    Manning

    Error in hclust(md, method = "average") : must have n >= 2 objects to cluster

    This is my code:

    mydata = read.table("PLR_2016_WM_55_5_Familienstand_aufbereitet.csv", skip = 0, sep = ";", header = TRUE)
    
    mydata <- mydata[-1] # Drop the first column (int)
    data.transformed <- t(mydata) # Transformation of matrix
    data.scale <- scale(data.transformed) # Scaling of table
    data.dist <- dist(data.scale) # Calculates distances between points
    
    fit.average <- hclust(data.dist, method = "average")
    plot(fit.average, hang = -1, cex = .8, main = "Average Linkage Clustering")
    
    library(NbClust)
    nc <- NbClust(data.scale, distance="euclidean", 
              min.nc=2, max.nc=15, method="average") 
    

    I found a similar question here

    1 answer  |  last active 4 years ago
  •   Marco Sandri    7 years ago

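    There are a few problems in your dataset. The code below drops the last four rows and converts the first column to numeric: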

    mydata <- read.table("PLR_2016_WM_55_5_Familienstand_aufbereitet.csv", skip = 0, sep = ";", header = TRUE)
    mydata <- mydata[1:(nrow(mydata)-4),]
    mydata[,1] <- as.numeric(mydata[,1])
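
    A quick way to spot this kind of problem before clustering is to look at the column types and the last few rows. The sketch below is only an illustration on a small made-up data frame, not the CSV from the question; the column names and the footer row are hypothetical.

```r
# Toy stand-in for the CSV (hypothetical names and values, not the real file).
# The last row mimics a footer line that is not data.
toy <- data.frame(
  id = c("1", "2", "3", "Source: example"),
  a  = c("10", "20", "30", ""),
  b  = c("5", "7", "9", ""),
  stringsAsFactors = FALSE
)

str(toy)      # every column is read as character because of the footer row
tail(toy, 2)  # the trailing non-data row shows up here

# Drop the footer row, then convert every column to numeric.
clean <- toy[seq_len(nrow(toy) - 1), ]
clean[] <- lapply(clean, as.numeric)

# Clustering input should be all numeric with no missing values.
stopifnot(all(sapply(clean, is.numeric)), !anyNA(clean))
```

    With the real file, the number of trailing rows to drop (four in the answer) has to be read off from tail(mydata).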
    

    data.transformed <- t(mydata) # Transformation of matrix
    data.scale <- scale(data.transformed) # Scaling of table
    

    data.scale is not a full-rank matrix:

    dim(data.scale)
    # [1]  72 447
    qr(data.scale)$rank
    # [1] 71
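
    One plausible explanation for a rank of 71 with 72 rows, not stated in the original answer: scale() centers every column, so the rows of the result sum to zero and one row is a linear combination of the others. A minimal sketch on random toy data (not the data from the question):

```r
set.seed(1)
# Toy matrix (random values): 6 rows, 20 columns, more columns than rows.
x <- matrix(rnorm(6 * 20), nrow = 6, ncol = 20)
x.scaled <- scale(x)  # centers and scales each column

# After centering, every column sums to (numerically) zero ...
max(abs(colSums(x.scaled)))

# ... so the rows are linearly dependent and the rank is at most nrow - 1.
qr(x)$rank
qr(x.scaled)$rank
```

    If this is what happens with the real data, dropping any single row restores full row rank; the answer drops row 72.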
    

    So we remove one row from it (the last, row 72) and transpose the result:

    data.scale <- t(data.scale[-72,])
    

    The dataset is now ready for NbClust:

    library(NbClust)
    nc <- NbClust(data=data.scale, distance="euclidean", 
              min.nc=2, max.nc=15, method="average") 
    

    The output is:

    [1] "Frey index : No clustering structure in this data set"
    *** : The Hubert index is a graphical method of determining the number of clusters.
                    In the plot of Hubert index, we seek a significant knee that corresponds to a 
                    significant increase of the value of the measure i.e the significant peak in Hubert
                    index second differences plot. 
    
    *** : The D index is a graphical method of determining the number of clusters. 
                    In the plot of D index, we seek a significant knee (the significant peak in Dindex
                    second differences plot) that corresponds to a significant increase of the value of
                    the measure. 
    
    ******************************************************************* 
    * Among all indices:                                                
    * 8 proposed 2 as the best number of clusters 
    * 4 proposed 3 as the best number of clusters 
    * 8 proposed 4 as the best number of clusters 
    * 1 proposed 5 as the best number of clusters 
    * 1 proposed 8 as the best number of clusters 
    * 1 proposed 11 as the best number of clusters 
    
                       ***** Conclusion *****                            
    
    * According to the majority rule, the best number of clusters is  2 
    
    ******************************************************************* 
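
    Once a number of clusters has been chosen, the cluster assignments can be obtained by cutting an average-linkage tree at that number. The following is a sketch using base R only, on made-up toy data rather than the data from the question:

```r
set.seed(42)
# Toy data (made up): two well-separated groups of 10 points in 2 dimensions.
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# Same distance and linkage as in the NbClust call above.
fit <- hclust(dist(scale(toy)), method = "average")

# Cut the tree into the suggested number of clusters (2 here).
groups <- cutree(fit, k = 2)
table(groups)
```

    With the real data, dist(data.scale) would take the place of dist(scale(toy)). The object returned by NbClust also has a Best.partition element that holds the partition for the number of clusters it selected.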
    
