
Finding the random state of an sklearn decision tree classifier

decision-tree scikit-learn python

sedavidw · 11 years ago

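I have some data that I'm fitting with an sklearn DecisionTreeClassifier. Because the classifier uses some randomness, I run it several times and save the best model. However, I want to be able to retrain on the data and get the same results on a different machine.

Is there a way to find out what the initial random_state was for each classifier after I've trained the model?

Edit: sklearn models have a method called get_params() that shows what the inputs were. But for random_state it still says None. However, according to the documentation, in that case it uses numpy to produce a random number. I'm trying to figure out what that random number was.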

2 answers

Fred Foo · 11 years ago

You have to pass an explicit random state to the decision tree constructor:

>>> DecisionTreeClassifier(random_state=42).get_params()['random_state']
42
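To tie this back to the "run it several times and keep the best" workflow in the question, here is a minimal sketch. It assumes synthetic data from make_classification and a hypothetical range of 20 candidate seeds; neither comes from the original post. The point is only that when you choose the seeds yourself, the winning one is known and can be recorded.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_seed, best_score = None, -1.0
for seed in range(20):  # candidate seeds picked by us, so each one is known
    clf = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    score = clf.fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_seed, best_score = seed, score

# Refitting with the recorded seed should rebuild the same tree.
first = DecisionTreeClassifier(max_features="sqrt", random_state=best_seed).fit(X_train, y_train)
again = DecisionTreeClassifier(max_features="sqrt", random_state=best_seed).fit(X_train, y_train)
same_predictions = np.array_equal(first.predict(X_val), again.predict(X_val))
print(best_seed, best_score, same_predictions)
```

Saving best_seed next to the model is then enough to retrain later. Getting identical trees on another machine most likely also requires the same scikit-learn and numpy versions there.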

Leaving it at the default value of None means that the fit method will use numpy.random's singleton random state, which is not predictable and not the same across runs.
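If you really do want to keep random_state=None, one possible workaround is to snapshot the singleton's state before calling fit and restore it before refitting. The sketch below assumes synthetic data and relies on np.random.get_state() and np.random.set_state(), which read and write that global generator. The snapshot has to be taken before training; it cannot be recovered from a model that is already fitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Snapshot the global numpy generator *before* fitting.
saved_state = np.random.get_state()
first = DecisionTreeClassifier(max_features="sqrt").fit(X, y)

# Rewind the global generator to the snapshot and fit again.
np.random.set_state(saved_state)
second = DecisionTreeClassifier(max_features="sqrt").fit(X, y)

# Compare the split features and thresholds of the two fitted trees.
same_tree = (
    np.array_equal(first.tree_.feature, second.tree_.feature)
    and np.array_equal(first.tree_.threshold, second.tree_.threshold)
)
print(same_tree)
```

The state is a tuple rather than a single integer, so it would need to be pickled to move it between machines. Passing an explicit integer seed is much simpler.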

ali_m · 10 years ago

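I'd suggest you would probably be best off using random forests for this purpose: a random forest contains a number of trees, each built on a subset of your predictors. You can then see the random states that were used in the model simply by looking at RandomForestVariableName.estimators_

I'll use my own code here as an example: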

import csv

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Raw string so the backslashes in the Windows path are not read as escapes
with open(r'C:\Users\Saskia Hill\Desktop\Exported\FinalSpreadsheet.csv', newline='') as csvfile:
    titanic_reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    row = next(titanic_reader)  # the first row holds the column names
    feature_names = np.array(row)

    # Load dataset, and target classes
    titanic_X, titanic_y = [], []
    for row in titanic_reader:
        titanic_X.append(row)
        titanic_y.append(row[11])  # The target values are your class labels

    titanic_X = np.array(titanic_X)
    titanic_y = np.array(titanic_y)
    print(titanic_X, titanic_y)

print(feature_names, titanic_X[0], titanic_y[0])
titanic_X = titanic_X[:, [2,3,4,5,6,7,8,9,10]]  # these are your predictors/features
feature_names = feature_names[[2,3,4,5,6,7,8,9,10]]

# The original call used max_features='auto' (as the pasted output shows);
# newer scikit-learn releases spell the same setting 'sqrt'.
rfclf = RandomForestClassifier(criterion='entropy', min_samples_leaf=1, max_features='sqrt', max_leaf_nodes=None, verbose=0)

rfclf = rfclf.fit(titanic_X, titanic_y)

rfclf.estimators_     # the output for this is pasted below:

[DecisionTreeClassifier(compute_importances=None, criterion='entropy',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_density=None, min_samples_leaf=1, min_samples_split=2,
        random_state=1490702865, splitter='best'),
DecisionTreeClassifier(compute_importances=None, criterion='entropy',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_density=None, min_samples_leaf=1, min_samples_split=2,
        random_state=174216030, splitter='best') ......

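So random forests introduce the randomness into the decision trees themselves, without requiring any adjustment to the initial data the trees use, and they act as a form of cross-validation that gives you more confidence in the accuracy of your results (particularly if, like me, you have a small dataset).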
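For reference, a small sketch of reading those per-tree seeds programmatically. It assumes synthetic data and a hypothetical helper named tree_seeds; neither is from my code above. Note that a single tree refitted with one of these seeds will not generally match the tree inside the forest, because the forest also bootstraps the rows. What is reproducible is the whole forest, provided you give the forest itself a fixed random_state.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=200, n_features=9, random_state=0)


def tree_seeds(forest_seed):
    """Fit a small forest and return the random_state given to each tree."""
    forest = RandomForestClassifier(
        n_estimators=5, criterion="entropy", random_state=forest_seed
    )
    forest.fit(X, y)
    return [est.random_state for est in forest.estimators_]


seeds_a = tree_seeds(42)
seeds_b = tree_seeds(42)
print(seeds_a)
print(seeds_a == seeds_b)  # same forest seed, same per-tree seeds
```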
