
Caret leapForward for linear regression (feature selection): changing nvmax

feature-selection r-caret parameters r

user23148 · Tech Community · 7 years ago

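I have been using the leapForward method from the leaps package through caret, and found that it only provides 5 variables. According to the leaps package, you can change nvmax to however many subsets you like.

I can't figure out where to put this in the caret wrapper. I tried putting it in the train statement and also creating an expand.grid, and neither seems to work. Any help would be greatly appreciated!

My code: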

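library(caret)
# Read the data; column 19 is the response, the remaining columns are predictors
data <- read.csv(file = "C:/mydata.csv", header = TRUE, sep = ",")
fitControl <- trainControl(method = "LOOCV")  # leave-one-out CV, spelled as in the caret docs
x <- data[, -19]
y <- data[, 19]
lmFit <- train(x = x, y = y, method = "leapForward", trControl = fitControl)
summary(lmFit)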

1 answer | latest 7 years ago

Gilles San Martin · 7 years ago

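By default, caret chooses the candidate values of the tuning parameter itself and tries only a few of them (nvmax = 2, 3, 4 in the run below). You can specify your own grid of values with the tuneGrid option.

Here is a reproducible example with the BloodBrain dataset. NB: I had to transform the predictors with a principal component analysis (PCA) to avoid multicollinearity problems.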

library(caret)
data(BloodBrain, package = "caret")
dim(bbbDescr)
#> [1] 208 134
X <- princomp(bbbDescr)$scores[,1:131]
Y <- logBBB
fitControl <- trainControl(method = "cv")

Default: caret picks the candidate values of nvmax itself

lmFit <- train(y = Y, x = X,'leapForward', trControl = fitControl)
lmFit
#> Linear Regression with Forward Selection 
#> 
#> 208 samples
#> 131 predictors
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 187, 188, 187, 187, 187, 187, ... 
#> Resampling results across tuning parameters:
#> 
#>   nvmax  RMSE       Rsquared   MAE      
#>   2      0.6682545  0.2928583  0.5286758
#>   3      0.7008359  0.2652202  0.5527730
#>   4      0.6781190  0.3026475  0.5215527
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final value used for the model was nvmax = 2.
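
The default run above tried only nvmax = 2, 3 and 4. If a full custom grid is more than you need, train() also has a tuneLength argument that asks for more candidate values. The sketch below is not from the original answer: it uses simulated data (the names X_sim, Y_sim and fit_len are made up for illustration), and it assumes, from my reading of caret's leapForward model definition, that the default grid is nvmax = 2:(tuneLength + 1). Check fit_len$results on your own installation rather than relying on that.

```r
library(caret)  # the leaps package must be installed as well

set.seed(1)
# Simulated data, for illustration only: 100 rows, 10 predictors,
# of which only V1 and V2 actually drive the response
X_sim <- matrix(rnorm(100 * 10), ncol = 10,
                dimnames = list(NULL, paste0("V", 1:10)))
Y_sim <- X_sim[, 1] - 2 * X_sim[, 2] + rnorm(100)

# Ask caret for 6 candidate values of nvmax instead of the default 3
fit_len <- train(x = X_sim, y = Y_sim, method = "leapForward",
                 trControl = trainControl(method = "cv", number = 5),
                 tuneLength = 6)

fit_len$results$nvmax  # the nvmax values caret actually evaluated
fit_len$bestTune       # the value selected by cross-validated RMSE
```

tuneLength only controls how many values are tried; when you want specific values (for example odd numbers up to 29), tuneGrid is the option to use.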

Grid search over values of your choice.
Note: expand.grid is not strictly needed here. It is useful when you combine several tuning parameters.

lmFit <- train(y = Y, x = X,'leapForward', trControl = fitControl, 
               tuneGrid = expand.grid(nvmax = seq(1, 30, 2)))
lmFit
#> Linear Regression with Forward Selection 
#> 
#> 208 samples
#> 131 predictors
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 188, 188, 188, 186, 187, 187, ... 
#> Resampling results across tuning parameters:
#> 
#>   nvmax  RMSE       Rsquared    MAE      
#>    1     0.7649633  0.07840817  0.5919515
#>    3     0.6952295  0.27147443  0.5250173
#>    5     0.6482456  0.35953363  0.4828406
#>    7     0.6509919  0.37800159  0.4865292
#>    9     0.6721529  0.35899937  0.5104467
#>   11     0.6541945  0.39316037  0.4979497
#>   13     0.6355383  0.42654189  0.4794705
#>   15     0.6493433  0.41823974  0.4911399
#>   17     0.6645519  0.37338055  0.5105887
#>   19     0.6575950  0.39628133  0.5084652
#>   21     0.6663806  0.39156852  0.5124487
#>   23     0.6744933  0.38746853  0.5143484
#>   25     0.6709936  0.39228681  0.5025907
#>   27     0.6919163  0.36565876  0.5209107
#>   29     0.7015347  0.35397968  0.5272448
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final value used for the model was nvmax = 13.
plot(lmFit)
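
The note about expand.grid can be checked with base R alone. This is a small illustrative sketch, not part of the original answer. The object names are made up, and the alpha/lambda pair is only an example of a model with two tuning parameters (caret's glmnet method uses those two names).

```r
# One tuning parameter: expand.grid() and data.frame() hold the same values,
# so either can be passed to tuneGrid
grid_a <- expand.grid(nvmax = seq(1, 30, 2))
grid_b <- data.frame(nvmax = seq(1, 30, 2))
identical(grid_a$nvmax, grid_b$nvmax)
#> [1] TRUE

# Several tuning parameters: expand.grid() builds every combination,
# here 3 values of alpha x 2 values of lambda
grid_two <- expand.grid(alpha = c(0, 0.5, 1), lambda = c(0.01, 0.1))
nrow(grid_two)
#> [1] 6
```

With a single parameter such as nvmax, tuneGrid = data.frame(nvmax = seq(1, 30, 2)) would therefore be an equivalent way to write the call above.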

Created on 2018-03-08 by the reprex package (v0.2.0).