The issue is related to the spaces in the column names:
library(caret)
library(rpart)
library(mlbench)
ctrl <- trainControl(method = "cv",
                     savePredictions = TRUE)
model_T <- train(VALUE ~ REF_DATE + Sex + `Age at admission` + `Years since admission` + `Income type` + Statistics + UOM,
                 data = Dataset, method = 'rpart2', trControl = ctrl)
#Error in `[.data.frame`(m, labs) : undefined columns selected
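To see which columns are affected, one option is to compare each name with what base R's make.names() would turn it into. This is only a minimal sketch: `cols` is a hypothetical stand-in for names(Dataset), and `bad` is just an illustrative variable name.

```r
# Hypothetical stand-in for names(Dataset)
cols <- c("VALUE", "REF_DATE", "Sex", "Age at admission",
          "Years since admission", "Income type", "Statistics", "UOM")

# make.names() returns a syntactically valid version of each name;
# any name that differs from its make.names() form is non-syntactic
bad <- cols[cols != make.names(cols)]
print(bad)
#[1] "Age at admission"      "Years since admission" "Income type"
```

The three names reported are exactly the ones that need backticks in the formula.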
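If we use a dataset with clean names, i.e. with the spaces replaced by underscores, it should work. Here we use clean_names from janitor to do this (it also converts the names to lower case):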
library(janitor)
Dataset2 <- clean_names(Dataset)
names(Dataset2)
#[1] "value" "ref_date" "sex" "age_at_admission" "years_since_admission" "income_type" "statistics" "uom"
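If adding janitor is not an option, base R's make.names() can produce syntactic names as well. The following is a rough sketch on a small hypothetical data frame (`toy` is not part of the original data); note that it replaces spaces with dots and keeps the original case, so the resulting names differ from those of clean_names.

```r
# Hypothetical two-column data frame with one non-syntactic name
toy <- data.frame(1:3, c("a", "b", "c"))
names(toy) <- c("VALUE", "Age at admission")

# Replace invalid characters (here, spaces) with dots;
# unique = TRUE also guards against duplicated names
names(toy) <- make.names(names(toy), unique = TRUE)
names(toy)
#[1] "VALUE"            "Age.at.admission"
```

With this approach the formula would refer to Age.at.admission rather than age_at_admission.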
Now create the model:
model_T2 <- train(value ~ ref_date + sex + age_at_admission + years_since_admission + income_type + statistics + uom,
                  data = Dataset2, method = 'rpart2', trControl = ctrl)
Output:
> model_T2
CART
200 samples
7 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
  maxdepth  RMSE       Rsquared    MAE
  1         0.9669617  0.03721968  0.7642369
  2         0.9674085  0.02626375  0.7656366
  6         1.0268165  0.03139845  0.8033324
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was maxdepth = 1.
Data
library(tibble)  # provides tibble()
set.seed(123)
Dataset <- tibble(VALUE = rnorm(200), REF_DATE = factor(rep(c(2006, 2007), each = 100)), Sex = factor(sample(1:4, size = 200, replace = TRUE)),
`Age at admission` = factor(sample(1:4, size = 200, replace = TRUE)),
`Years since admission` = factor(sample(1:11, size = 200, replace = TRUE)),
`Income type` = factor(sample(1:6, size = 200, replace = TRUE)),
Statistics = factor(sample(1:4, size = 200, replace = TRUE)),
UOM = factor(sample(1:2, size = 200, replace = TRUE))
)