
R - Random forest - error when applying a confusion matrix to test data

confusion-matrix r-caret random-forest r

Raj · 7 years ago

bulbasaur[1:5,]
   Appt_date count no_of_reps PerReCount
1 2016-01-01     2          1   2.000000
2 2016-01-04   174         58   3.000000
3 2016-01-05   206         59   3.491525
4 2016-01-06   203         61   3.327869
5 2016-01-07   236         64   3.687500

The code I wrote is:

install.packages("caret")
library(caret)

leaf <- bulbasaur
ctrl = trainControl(method="repeatedcv", number=100, repeats=50, selectionFunction = "oneSE")
in_train = createDataPartition(leaf$PerReCount, p=.75, list=FALSE)

#random forest
trf = train(PerReCount ~ ., data=leaf, method="rf", metric="RMSE",trControl=ctrl, subset = in_train)


#boosting
tgbm = train(PerReCount ~ ., data=leaf, method="gbm", metric="RMSE",
             trControl=ctrl, subset = in_train, verbose=FALSE)

resampls = resamples(list(RF = trf, GBM = tgbm))
difValues = diff(resampls)
summary(difValues)



######Using it on test matrix
test = leaf[-in_train,]
test$pred.leaf.rf = predict(trf, test, "raw")
confusionMatrix(test$pred.leaf.rf, test$PerReCount)

Error in confusionMatrix.default(test$pred.leaf.rf, test$PerReCount) : 
  the data cannot have more levels than the reference

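I have tried a few changes, such as leaf$PerReCount <- as.factor(leaf$PerReCount) together with type = "class", but the accuracy was far too low, and I do not want to switch from regression to classification. How can I fix this without converting to factors, solve it some other way, or get an accuracy figure without using a confusion matrix? Thank you.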

1 answer | 7 years ago

Raj · 7 years ago

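The point @Damiano raised is correct: a regression model does not produce a confusion matrix, because its output is a continuous value rather than a yes-or-no class. I solved the problem by computing the RMSE on the test set instead: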

piko.chu = predict(trf, test)
RMSE.forest <- sqrt(mean((piko.chu-test$PerReCount)^2))
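
For readers who want more than a single number, the sketch below computes a few common regression metrics in base R. It is an illustration only: the vectors `obs` and `pred` are made-up stand-ins for `test$PerReCount` and `predict(trf, test)`, and the tolerance `tol` and the `within_tol` figure are hypothetical choices for an "accuracy-like" share, not something the original post used. With caret loaded, `postResample(pred, obs)` should report RMSE, Rsquared and MAE in one call.

```r
# Toy stand-ins for test$PerReCount and predict(trf, test); values are made up.
obs  <- c(2.0, 3.0, 3.5, 3.3, 3.7)
pred <- c(2.4, 2.9, 3.4, 3.6, 3.5)

# Prediction errors on the held-out rows
err <- pred - obs

rmse <- sqrt(mean(err^2))   # same formula as RMSE.forest above
mae  <- mean(abs(err))      # mean absolute error
r2   <- cor(pred, obs)^2    # squared correlation between predictions and observations

# Hypothetical accuracy-like figure: share of predictions within +/- tol of the truth
tol <- 0.25
within_tol <- mean(abs(err) <= tol)

print(c(RMSE = rmse, MAE = mae, Rsquared = r2, within_tol = within_tol))
```

RMSE and MAE are in the units of PerReCount, so they are best read against the spread of the observed values, whereas the within-tolerance share depends entirely on the chosen `tol`.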
