
Reconstructing the local prediction of lime

  • Make42  · Tech community  · 6 years ago

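    I am using lime, a method for building local linear models from complex models. The R package gives me the coefficients of the local linear model and that model's prediction. I am trying to reconstruct the prediction from the coefficients to better understand how lime works, but the reconstruction does not give the same result.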
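To make the general idea concrete, here is a toy base-R sketch of the LIME recipe: perturb around one observation, query the black box, weight the samples by proximity, and fit a weighted linear surrogate. It is an illustration under stated assumptions, not the lime package's code: `black_box`, the kernel and the kernel width of 0.1 are all hypothetical, and the package additionally bins continuous features, selects features and fits a ridge model.

```r
# Toy LIME-style explanation in base R (illustrative only).
# Hypothetical black-box model standing in for the random forest.
black_box <- function(x1, x2) x1^2 + 3 * x2

set.seed(42)
x0 <- c(1, 2)   # the observation to explain (x1 = 1, x2 = 2)
n  <- 500

# 1. Perturb: sample points in a neighbourhood of x0
x1 <- rnorm(n, mean = x0[1], sd = 0.1)
x2 <- rnorm(n, mean = x0[2], sd = 0.1)

# 2. Query the black box at the perturbed points
y <- black_box(x1, x2)

# 3. Weight each sample by its proximity to x0 (hypothetical exponential kernel)
d <- sqrt((x1 - x0[1])^2 + (x2 - x0[2])^2)
w <- exp(-d^2 / 0.1^2)

# 4. Fit a weighted linear surrogate; its slopes approximate the local behaviour
surrogate <- lm(y ~ x1 + x2, weights = w)
coef(surrogate)

# The surrogate's prediction at x0 is intercept + slopes . x0
sum(coef(surrogate) * c(1, x0))
```

Because `black_box` is x1^2 + 3*x2, its local gradient at (1, 2) is (2, 3), so the two slopes should come out close to those values.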

    My code is:

    library(MASS)
    library(lime)
    library(caret)
    library(dplyr)
    data(biopsy)
    
    # First we'll clean up the data a bit
    biopsy$ID <- NULL
    biopsy <- na.omit(biopsy)
    names(biopsy) <- c('clump thickness', 'uniformity of cell size', 
                       'uniformity of cell shape', 'marginal adhesion',
                       'single epithelial cell size', 'bare nuclei', 
                       'bland chromatin', 'normal nucleoli', 'mitoses',
                       'class')
    
    # Hold out 4 random rows as test cases
    set.seed(4)
    test_set <- sample(seq_len(nrow(biopsy)), 4)
    data_train = biopsy[-test_set,] %>% dplyr::select(-class)
    class_train = biopsy[-test_set,] %>% .[["class"]] %>% factor
    data_test = biopsy[test_set,] %>% dplyr::select(-class)
    class_test = biopsy[test_set,] %>% .[["class"]] %>% factor
    model = train(data_train, class_train, method="rf") # Random Forest via caret
    
    # Build the explainer on the training data; continuous features are binned
    # with equal-width (not quantile) bins
    explainer <- lime(data_train, model, bin_continuous = TRUE, quantile_bins = FALSE)
    # Explain the first test case using the 4 most important features
    explanation <- explain(data_test[1,], explainer, n_labels = 1, n_features = 4)
    

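    For the test observation data_test[1,] I want to look at the local linear model (fitted by ridge regression). I store it in model_expl; the intercept is of particular interest. Information about the features in the model (such as the coefficients) is stored in feature_expl.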
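As a rough illustration of what a weighted ridge fit computes, here is a closed-form weighted ridge regression in base R with an unpenalised intercept. `weighted_ridge` is a hypothetical helper using the textbook formulation, not lime's internal code; in particular its `lambda` is not on the same scale as whatever solver the package uses. With `lambda = 0` it reduces to weighted least squares, which makes it easy to check against `lm()`.

```r
# Weighted ridge regression, closed form, intercept not penalised.
# X: n x p numeric matrix, y: length-n response, w: length-n weights.
weighted_ridge <- function(X, y, w, lambda) {
  w <- w / sum(w)                    # normalise weights to sum to 1
  x_bar <- colSums(X * w)            # weighted column means (w recycles down each column)
  y_bar <- sum(w * y)                # weighted mean of the response
  Xc <- sweep(X, 2, x_bar)           # centre so the intercept is left unpenalised
  yc <- y - y_bar
  XtW <- t(Xc * w)                   # p x n matrix t(Xc) %*% diag(w)
  beta <- solve(XtW %*% Xc + lambda * diag(ncol(X)), XtW %*% yc)
  list(intercept = y_bar - sum(x_bar * beta), beta = drop(beta))
}

# Usage on simulated data
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
y <- drop(1 + X %*% c(2, -1) + rnorm(100, sd = 0.1))
w <- runif(100)
fit0 <- weighted_ridge(X, y, w, lambda = 0)   # plain weighted least squares
fit1 <- weighted_ridge(X, y, w, lambda = 1)   # coefficients shrunk towards 0
```

Whatever the solver, such a model predicts `intercept + sum(beta * x)` for a row `x`, where `x` must be in the same coding the model was trained on.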

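    library(tidyr)  # gather() comes from tidyr, not dplyr
    
    # Model-level fields (intercept, prediction, r2, ...) of the first case as key/value pairs
    model_expl = explanation %>%
      dplyr::select(-starts_with("feature")) %>%
      filter(case == .$case[1]) %>%
      unique %>%
      mutate_all(as.character) %>%
      gather(key, value)
    
    # Feature-level rows (name, value, weight, description) of the same case
    feature_expl = explanation %>%
      dplyr::select(case, starts_with("feature")) %>%
      filter(case == .$case[1])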
    

    Printing them gives:

    1 model_type      classification
    2 case            416
    3 label           benign
    4 label_prob      0.552
    5 model_r2        0.475778176360649
    6 model_intercept 0.104316310033944
    7 model_predicti… 0.715122989626457
    8 data            list(`clump thickness` = 3, `uniformity of cell size` = 3, `uniformity of cell shape` = 2, …
    9 prediction      list(benign = 0.552, malignant = 0.448)
    
    
      case                  feature feature_value feature_weight                     feature_desc
    1  416                  mitoses             1      0.0253919                  mitoses <= 3.25
    2  416              bare nuclei             3      0.2476868              bare nuclei <= 3.25
    3  416  uniformity of cell size             3      0.1792691  uniformity of cell size <= 3.25
    4  416 uniformity of cell shape             2      0.1584589 uniformity of cell shape <= 3.25
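The 3.25 in feature_desc can be traced back to the `bin_continuous = TRUE, quantile_bins = FALSE` settings. A minimal sketch, assuming lime cuts each continuous feature into equal-width bins over its observed range with its default of four bins; for a feature ranging from 1 to 10 that puts the first cut at 3.25. The second half assumes the local model sees each feature as a 0/1 indicator of whether a perturbed value falls in the same bin as the explained case; `bin_of` and the perturbed values are hypothetical.

```r
# Assumption: equal-width bins over the observed range (quantile_bins = FALSE)
# and n_bins = 4, applied to a feature that ranges from 1 to 10.
n_bins <- 4
cuts <- seq(1, 10, length.out = n_bins + 1)
cuts   # 1.00 3.25 5.50 7.75 10.00

# Assign values to right-closed bins, so the first bin is "<= 3.25"
bin_of <- function(v) cut(v, breaks = cuts, include.lowest = TRUE)

# Binary coding of perturbed values: 1 if in the same bin as the case, else 0
case_value <- 3                 # e.g. 'bare nuclei' of the explained case
perturbed  <- c(1, 3, 4, 9)     # hypothetical perturbed values
indicator  <- as.integer(bin_of(perturbed) == bin_of(case_value))
indicator  # 1 1 0 0

# The explained case always falls in its own bin, so it is coded as 1
as.integer(bin_of(case_value) == bin_of(case_value))  # 1
```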
    

    From the explanation I get a model_prediction of 0.715906270331288 and an intercept of 0.114219195393416. I try to reconstruct the local approximation with:

    sum(feature_expl$feature_value * feature_expl$feature_weight) + 0.114219195393416
    

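    but I get 2.155599 instead of 0.715906270331288. I read that I need to scale the features, but I cannot find out how to scale them correctly. What do I need to do to reconstruct the local prediction?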
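A minimal sketch of one possible reconstruction, assuming that with `bin_continuous = TRUE` the local ridge model is fitted on 0/1 "same bin as the case" indicators rather than on the raw feature_value column. Under that assumption the explained case is a vector of ones, so its local prediction is just the intercept plus the sum of feature_weight. The numbers are copied from the printed model_expl and feature_expl output for case 416.

```r
# Values copied from the printed output for case 416
intercept        <- 0.104316310033944
model_prediction <- 0.715122989626457
feature_weight   <- c(0.0253919, 0.2476868, 0.1792691, 0.1584589)

# Assumption: every selected feature enters the local model as a 0/1
# "same bin as the case" indicator, so the explained case is all ones.
x_case <- rep(1, length(feature_weight))

reconstructed <- intercept + sum(x_case * feature_weight)
reconstructed                                                  # 0.715123
all.equal(reconstructed, model_prediction, tolerance = 1e-6)  # TRUE
```

The small remaining difference comes from the weights being displayed with only seven decimals.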
