Reconstructing the local prediction of LIME

machine-learning r

Make42 · 6 years ago

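I am using lime. LIME is a method for building local linear models from a complex model. The R package returns the coefficients of the linear model and the prediction of the linear model. I am trying to reconstruct the prediction from the coefficients to better understand how LIME works, but my reconstruction does not give the same result.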

My code is:

library(MASS)
library(lime)
library(caret)
library(dplyr)
library(tidyr)  # provides gather(), used further below
data(biopsy)

# First we'll clean up the data a bit
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
names(biopsy) <- c('clump thickness', 'uniformity of cell size', 
                   'uniformity of cell shape', 'marginal adhesion',
                   'single epithelial cell size', 'bare nuclei', 
                   'bland chromatin', 'normal nucleoli', 'mitoses',
                   'class')

set.seed(4)
test_set <- sample(seq_len(nrow(biopsy)), 4)
data_train = biopsy[-test_set,] %>% dplyr::select(-class)
class_train = biopsy[-test_set,] %>% .[["class"]] %>% factor
data_test = biopsy[test_set,] %>% dplyr::select(-class)
class_test = biopsy[test_set,] %>% .[["class"]] %>% factor
model = train(data_train, class_train, method="rf") # Random Forest

explainer <- lime(data_train, model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(data_test[1,], explainer, n_labels = 1, n_features = 4)

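For the test observation data_test[1,] I want to look at the local linear model (fitted by ridge regression). I store its model-level information in model_expl; the intercept is of particular interest. The information about the features (such as the coefficients) is stored in feature_expl.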

model_expl = explanation %>%
  dplyr::select(-starts_with("feature")) %>%
  filter(case == .$case[1]) %>%
  unique %>%
  mutate_if(is.numeric, as.character) %>%
  mutate_all(as.character) %>%
  gather(key, value)

feature_expl = explanation %>%
  dplyr::select(case, starts_with("feature")) %>%
  filter(case == .$case[1])

Printing these gives:

1 model_type      classification
2 case            416
3 label           benign
4 label_prob      0.552
5 model_r2        0.475778176360649
6 model_intercept 0.104316310033944
7 model_predicti… 0.715122989626457
8 data            list(`clump thickness` = 3, `uniformity of cell size` = 3, `uniformity of cell shape` = 2, …
9 prediction      list(benign = 0.552, malignant = 0.448)


  case                  feature feature_value feature_weight                     feature_desc
1  416                  mitoses             1      0.0253919                  mitoses <= 3.25
2  416              bare nuclei             3      0.2476868              bare nuclei <= 3.25
3  416  uniformity of cell size             3      0.1792691  uniformity of cell size <= 3.25
4  416 uniformity of cell shape             2      0.1584589 uniformity of cell shape <= 3.25

From the explanation I get a model_prediction of 0.715906270331288. With the intercept 0.114219195393416 I tried to reconstruct the local approximation:

sum(feature_expl$feature_value * feature_expl$feature_weight) + 0.114219195393416

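But I get 2.155599 instead of 0.715906270331288. I have read that I need to scale, but I cannot find out how to scale correctly. What do I need to do to reconstruct the local prediction?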
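
A possible explanation, sketched under an assumption about how lime handles binned features: with bin_continuous = TRUE, the local model appears to be fitted on 0/1 indicators (whether a perturbed sample falls into the same bin as the explained observation) rather than on the raw feature values. For the explained case itself every indicator would then be 1, so the prediction reduces to the intercept plus the sum of the feature weights. The snippet below checks this against the values in the printed output above (intercept 0.104316310033944, not the intercept used in the attempt). The indicator vector is introduced here for illustration only.

```r
# Values copied from the printed output above (case 416).
intercept <- 0.104316310033944
feature_weight <- c(
  mitoses                    = 0.0253919,
  `bare nuclei`              = 0.2476868,
  `uniformity of cell size`  = 0.1792691,
  `uniformity of cell shape` = 0.1584589
)

# Assumption: with bin_continuous = TRUE each selected feature enters the
# local model as a 0/1 indicator that equals 1 for the explained case,
# so the raw feature_value column is not what the weights multiply.
indicator <- rep(1, length(feature_weight))

# Local linear prediction: intercept plus the weighted indicators.
reconstructed <- intercept + sum(indicator * feature_weight)
print(reconstructed)
```

With the printed weights this comes out at about 0.715123, which agrees with the printed model_prediction of 0.715122989626457 up to the rounding of the displayed weights. If the assumption holds, no extra scaling step is needed for binned features.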
