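I am using lime. LIME is a method for building a local linear model from a complex model. The R package returns the coefficients of the linear model and the linear model's prediction. I am trying to rebuild the prediction from the coefficients to better understand how lime works, but my reconstruction does not give the same result.
My code is: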
library(MASS)
library(lime)
library(caret)
library(dplyr)
library(tidyr) # provides gather(), used further down
data(biopsy)
# First we'll clean up the data a bit
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
names(biopsy) <- c('clump thickness', 'uniformity of cell size',
'uniformity of cell shape', 'marginal adhesion',
'single epithelial cell size', 'bare nuclei',
'bland chromatin', 'normal nucleoli', 'mitoses',
'class')
set.seed(4)
test_set <- sample(seq_len(nrow(biopsy)), 4)
data_train = biopsy[-test_set,] %>% dplyr::select(-class)
class_train = biopsy[-test_set,] %>% .[["class"]] %>% factor
data_test = biopsy[test_set,] %>% dplyr::select(-class)
class_test = biopsy[test_set,] %>% .[["class"]] %>% factor
model = train(data_train, class_train, method="rf") # Random Forest
explainer <- lime(data_train, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(data_test[1,], explainer, n_labels = 1, n_features = 4)
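For the test case data_test[1,] I want to look at the local linear model (fitted by ridge regression). I store its summary in model_expl; the intercept is of particular interest. Information about the features in the model, such as the coefficients, is stored in feature_expl.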
model_expl = explanation %>%
  dplyr::select(-starts_with("feature")) %>%
  filter(case == .$case[1]) %>%
  unique %>%
  mutate_if(is.numeric, as.character) %>%
  mutate_all(as.character) %>%
  gather(key, value)
feature_expl = explanation %>%
  dplyr::select(case, starts_with("feature")) %>%
  filter(case == .$case[1])
Printing both objects gives:
1 model_type classification
2 case 416
3 label benign
4 label_prob 0.552
5 model_r2 0.475778176360649
6 model_intercept 0.104316310033944
7 model_predicti… 0.715122989626457
8 data list(`clump thickness` = 3, `uniformity of cell size` = 3, `uniformity of cell shape` = 2, …
9 prediction list(benign = 0.552, malignant = 0.448)
case feature feature_value feature_weight feature_desc
1 416 mitoses 1 0.0253919 mitoses <= 3.25
2 416 bare nuclei 3 0.2476868 bare nuclei <= 3.25
3 416 uniformity of cell size 3 0.1792691 uniformity of cell size <= 3.25
4 416 uniformity of cell shape 2 0.1584589 uniformity of cell shape <= 3.25
From the explanation I get a model_prediction of 0.715906270331288. The intercept is 0.114219195393416.
I tried to rebuild the local approximation:
sum(feature_expl$feature_value * feature_expl$feature_weight) + 0.114219195393416
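but I get 2.155599 instead of 0.715906270331288. I have read that I need to scale, but I cannot find out how to scale correctly. What do I need to do to reconstruct the local prediction?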
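
One check that can be done by hand from the printed output, offered as a minimal sketch and not as a confirmed description of lime's internals: feature_desc shows that with bin_continuous = TRUE each feature is reported as a bin (for example mitoses <= 3.25). Assuming the local model is fitted on 0/1 indicators for those bins, and that the explained case has value 1 for each of its own bins, the prediction would be the intercept plus the plain sum of the weights, without multiplying by feature_value. The numbers below are copied from the printed output (they differ slightly from the values quoted in the text, presumably because of a different run); `indicator` and `reconstructed` are names introduced here only for illustration.

```r
# Values copied from the printed output (model_intercept and feature_weight)
intercept <- 0.104316310033944
feature_weight <- c(0.0253919,  # mitoses <= 3.25
                    0.2476868,  # bare nuclei <= 3.25
                    0.1792691,  # uniformity of cell size <= 3.25
                    0.1584589)  # uniformity of cell shape <= 3.25

# ASSUMPTION: each binned feature enters the local model as an indicator
# that equals 1 for the case being explained, so feature_value is not used.
indicator <- rep(1, length(feature_weight))

# Linear model: intercept + sum over features of (indicator * weight)
reconstructed <- intercept + sum(indicator * feature_weight)
print(reconstructed)
```

This gives about 0.715123, which agrees with the printed model_prediction of 0.715122989626457 up to the rounding of the displayed weights. With the live object, the same check would be explanation$model_intercept[1] + sum(explanation$feature_weight), using the columns shown in the output.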