
Error message in R: Error in `[.data.frame`(m, labs) : undefined columns selected

  • Anna Carolina de Roldão  · 2 years ago

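    I am trying to fit a regression tree to a dataset using the train function. The dataset contained numeric variables, which I converted to categorical ones in an attempt to get rid of the error message. I am also using the trainControl function, again in an attempt to resolve the error. Any help is appreciated.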

    library(caret)
    library(rpart)
    library(mlbench)
    data(Dataset)
    set.seed(1)
    ctrl <- trainControl(method = "cv", savePredictions = TRUE)
    model_T <- train(VALUE~REF_DATE+Sex+`Age at admission`+`Years since admission`+`Income type`+Statistics+UOM, data = Dataset, method = 'rpart2', trControl = ctrl)
    model_T
    

    Structure of the dataset:

    spec_tbl_df [46,464 x 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
    $ REF_DATE             : Factor w/ 11 levels "2006","2007",..: 1 2 3 4 5 6 7 8 9 10 ...
    $ Sex                  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
    $ Age at admission     : Factor w/ 4 levels "1","2","3","4": 4 4 4 4 4 4 4 4 4 4 ...
    $ Years since admission: Factor w/ 11 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
    $ Income type          : Factor w/ 6 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ...
    $ Statistics           : Factor w/ 4 levels "1","2","3","4": 3 3 3 3 3 3 3 3 3 3 ...
    $ UOM                  : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
    $ VALUE                : num [1:46464] 154640 145895 151290 155340 169745 ...
    
    1 answer

  • akrun  · 2 years ago

    The problem is caused by the spaces in the column names. The call below reproduces the error:

    library(caret)
    library(rpart)
    library(mlbench)
    ctrl <- trainControl(method = "cv",
                         savePredictions =TRUE)
    model_T <- train(VALUE~REF_DATE+Sex+`Age at admission`+`Years since admission`+`Income type`+Statistics+UOM, 
                     data = Dataset, method = 'rpart2', trControl = ctrl)
    #Error in `[.data.frame`(m, labs) : undefined columns selected 
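
A quick way to see which columns are likely to trigger this is to list the names that are not syntactically valid in R. The following is only a minimal sketch on a small made-up data frame (`df` and its values are hypothetical, not the asker's data); it relies on the base R function `make.names()`, which leaves a valid name unchanged.

```r
# Toy data frame (hypothetical, not the asker's data). check.names = FALSE
# keeps the spaces in the column names instead of converting them to dots.
df <- data.frame(
  VALUE = c(1.5, 2.5, 3.5),
  `Age at admission` = factor(c(1, 2, 1)),
  `Income type` = factor(c(3, 3, 4)),
  Sex = factor(c(1, 2, 2)),
  check.names = FALSE
)

# A name is syntactic if make.names() returns it unchanged,
# so any name that differs needs backticks in a formula.
non_syntactic <- names(df)[names(df) != make.names(names(df))]
print(non_syntactic)
```

Any name reported here has to be written in backticks inside a formula, which makes it a candidate for renaming before modelling.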
    

    With a dataset whose names are clean, i.e. with the spaces replaced by underscores, the call should work. Here we use clean_names from the janitor package to do that:

    library(janitor)
    Dataset2 <- clean_names(Dataset)
    names(Dataset2)
    #[1] "value"                 "ref_date"              "sex"                   "age_at_admission"      "years_since_admission" "income_type"           "statistics"            "uom"    
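
If adding janitor as a dependency is not an option, a similar renaming can be approximated in base R. This is a rough sketch under the assumption that whitespace is the only problem in the names; `clean_names` handles more cases than this. The data frame `df` and the object names `clean` and `dotted` are hypothetical.

```r
# Toy data frame with spaces in the column names (hypothetical values).
df <- data.frame(
  VALUE = c(1.5, 2.5),
  `Age at admission` = factor(c(1, 2)),
  `Years since admission` = factor(c(1, 1)),
  check.names = FALSE
)

# Replace each run of whitespace with an underscore and lower-case the result.
clean <- df
names(clean) <- tolower(gsub("\\s+", "_", names(df)))
print(names(clean))

# Alternative: make.names() replaces each space with a dot and keeps the case.
dotted <- make.names(names(df))
print(dotted)
```

Either variant yields names that can be used in a formula without backticks. The formula passed to train then has to use the new names.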
    

    Now fit the model:

    model_T2 <- train(value~ref_date+sex+ age_at_admission+years_since_admission+income_type+statistics+uom, 
                      data = Dataset2, method = 'rpart2', trControl = ctrl)
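
Because the cleaned names are also lower-cased, a formula typed by hand can easily fall out of sync with the data. One way to avoid that, sketched here on a small stand-in data frame (`Dataset2_demo` is hypothetical, as are the object names), is to build the formula from the column names with base R's `reformulate()` and confirm that every variable it mentions exists.

```r
# Stand-in for the cleaned dataset (hypothetical values, fewer columns).
Dataset2_demo <- data.frame(
  value = c(0.1, 0.2, 0.3),
  ref_date = factor(c(2006, 2006, 2007)),
  sex = factor(c(1, 2, 1)),
  age_at_admission = factor(c(4, 4, 3))
)

# Use every column except the response as a predictor.
predictors <- setdiff(names(Dataset2_demo), "value")
model_formula <- reformulate(predictors, response = "value")
print(model_formula)

# Stop early if the formula refers to a column that is not in the data.
missing_vars <- setdiff(all.vars(model_formula), names(Dataset2_demo))
stopifnot(length(missing_vars) == 0)
```

The resulting `model_formula` could then be passed as the first argument to train in place of the hand-written formula.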
    

    Output:

    > model_T2
    CART 
    
    200 samples
      7 predictor
    
    No pre-processing
    Resampling: Cross-Validated (10 fold) 
    Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
    Resampling results across tuning parameters:
    
      maxdepth  RMSE       Rsquared    MAE      
      1         0.9669617  0.03721968  0.7642369
      2         0.9674085  0.02626375  0.7656366
      6         1.0268165  0.03139845  0.8033324
    
    RMSE was used to select the optimal model using the smallest value.
    The final value used for the model was maxdepth = 1.
    

    Data:

    library(tibble)  # provides tibble()
    set.seed(123)
    Dataset <- tibble(VALUE = rnorm(200), REF_DATE = factor(rep(c(2006, 2007), each = 100)), Sex = factor(sample(1:4, size = 200, replace = TRUE)),
                      `Age at admission` = factor(sample(1:4, size = 200, replace = TRUE)),
                      `Years since admission` = factor(sample(1:11, size = 200, replace = TRUE)), 
                      `Income type` = factor(sample(1:6, size = 200, replace = TRUE)),
                      Statistics = factor(sample(1:4, size = 200, replace = TRUE)),
                      UOM = factor(sample(1:2, size = 200, replace = TRUE))
                      )