
sklearn: How to get prediction scores from a random forest?

  • shakedzy  · Tech Community  · 6 years ago

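    I'm using sklearn's RandomForestClassifier for a model I've built. When I use it to make predictions, is there a way to get the confidence level of each prediction (i.e., the number of trees that predicted that class)?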

    2 answers  |  last activity 6 years ago
  •   shakedzy    6 years ago

    Apparently, RandomForestClassifier already provides this through predict_proba:

    forest.predict_proba(X)
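    A minimal sketch of what predict_proba returns. The make_classification toy data and n_estimators=50 are assumptions for illustration, not taken from the question. The result has one column per class, in the order of forest.classes_. According to the scikit-learn documentation, it is the mean of the individual trees' class probabilities, where a single tree's probability is the class fraction in the leaf the sample lands in. The largest value in each row can serve as the confidence of the predicted label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy binary data, assumed here for illustration only
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

proba = forest.predict_proba(X)  # shape (n_samples, n_classes)
print(forest.classes_)           # column order of proba

# predict_proba is the average of the per-tree class probabilities
per_tree_mean = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
print(np.allclose(proba, per_tree_mean))

# Predicted label and its probability, used as a confidence score
predicted = forest.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)
print(predicted[:5], confidence[:5])
```

    With the default, fully grown trees, every leaf is usually pure, so each tree contributes a one-hot probability and this average equals the fraction of trees voting for each class. If you limit max_depth or min_samples_leaf, it becomes a soft average of leaf frequencies instead.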
    
  •   kazAnova    6 years ago

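    There is no built-in method that returns the raw vote count. You have to take each tree out of the forest (they are stored in forest.estimators_), make a single-tree prediction with it, and then count how many trees gave the same answer as the forest.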

    Here is an example:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # modelling data
    X = np.array([[1, 2, 3, 4], [1, 3, 1, 2], [4, 6, 1, 2], [3, 3, 4, 3], [1, 1, 2, 1]])
    # target variable
    y = np.array([1, 0, 1, 1, 0])
    # random forest model
    forest = RandomForestClassifier(n_estimators=10, random_state=1)
    forest.fit(X, y)
    # the forest's predictions
    full_predictions = forest.predict(X)
    print(full_predictions)
    # [1 0 1 1 0]

    # one counter per row: how many trees gave the same class as full_predictions
    counts_of_same_predictions = [0] * len(y)
    # predict with each tree and count agreement with the forest
    for tree_in_forest in forest.estimators_:
        # the trees are trained on class indices (0, 1, ...), so map them back to the labels
        single_tree_predictions = forest.classes_[tree_in_forest.predict(X).astype(int)]
        for j in range(len(single_tree_predictions)):
            if single_tree_predictions[j] == full_predictions[j]:
                # increment the count for that row
                counts_of_same_predictions[j] += 1

    print('counts of same classifications', counts_of_same_predictions)
    # counts of same classifications [6, 7, 8, 8, 8]
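    The same count can be vectorized with NumPy. This is a minimal sketch, not the answerer's code. It reuses the toy X from the answer but switches to string labels (an assumption for illustration) to show why each tree's output is mapped through forest.classes_: scikit-learn trains the trees in forest.estimators_ on class indices (0, 1, ...), not on the original labels. It also computes the fraction of trees voting for each class. Here that should match forest.predict_proba, because the fully grown trees have pure leaves.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 2, 3, 4], [1, 3, 1, 2], [4, 6, 1, 2], [3, 3, 4, 3], [1, 1, 2, 1]])
# String labels, assumed here to show the index-to-label mapping
y = np.array(["yes", "no", "yes", "yes", "no"])

forest = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)
full_predictions = forest.predict(X)

# Each tree predicts a class index (as a float); map it back to the real label
tree_votes = np.stack([
    forest.classes_[tree.predict(X).astype(int)] for tree in forest.estimators_
])  # shape (n_trees, n_samples)

# Number of trees agreeing with the forest, per sample
counts_of_same_predictions = (tree_votes == full_predictions).sum(axis=0)

# Fraction of trees voting for each class, columns ordered like forest.classes_
vote_fraction = np.stack(
    [(tree_votes == c).mean(axis=0) for c in forest.classes_], axis=1
)

print(counts_of_same_predictions)
print(vote_fraction)
print(forest.predict_proba(X))
```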