
What should be passed to clf.predict()?

  • Bartek Malysz  · 6 years ago

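    I recently started playing with decision trees and wanted to train my own simple model on some artificial data. I wanted to use this model to predict on some more mock data, just to get a feel for how it works, but then I got stuck. Once your model is trained, how do you pass data to predict()?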

    http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

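    The documentation states: clf.predict(X)

    Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]

    But when I try to pass an np.array, np.ndarray, list, tuple, or DataFrame, it just throws an error. Can you help me understand why?

    Here is the code: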

    from IPython.core.display import display, HTML
    display(HTML("<style>.container { width:100% !important; }</style>"))
    
    import graphviz
    import pandas as pd
    import numpy as np
    import random
    from sklearn import tree
    
    pd.options.display.max_seq_items=5000
    pd.options.display.max_rows=20
    pd.options.display.max_columns=150
    
    length = 50000
    
    miles_commuting = [random.choice([2,3,4,5,7,10,20,25,30]) for x in range(length)]
    salary = [random.choice([1300,1600,1800,1900,2300,2500,2700,3300,4000]) for x in range(length)]
    full_time = [random.choice([1,0,1,1,0,1]) for x in range(length)]
    
    DataFrame = pd.DataFrame({'CommuteInMiles':miles_commuting,'Salary':salary,'FullTimeEmployee':full_time})
    
    DataFrame['Moving'] = np.where((DataFrame.CommuteInMiles > 20) & (DataFrame.Salary > 2000) & (DataFrame.FullTimeEmployee == 1),1,0)
    DataFrame['TargetLabel'] = np.where((DataFrame.Moving == 1),'Considering move','Not moving')
    
    target = DataFrame.loc[:,'Moving']
    data = DataFrame.loc[:,['CommuteInMiles','Salary','FullTimeEmployee']]
    target_names = DataFrame.TargetLabel
    features = data.columns.values
    
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(data, target)
    
    clf.predict(?????) #### <===== What should go here?
    
    clf.predict([30,4000,1])
    

    ValueError: Expected 2D array, got 1D array instead: array=[3.e+01 4.e+03 1.e+00]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

    clf.predict(np.array(30,4000,1))
    

    ValueError: only 2 non-keyword arguments accepted

    1 answer
  • Vivek Kumar  · 6 years ago

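    Look at what you passed to fit(): data is a 2-D DataFrame with the columns ['CommuteInMiles','Salary','FullTimeEmployee']. predict() needs 2-D input of the same form, with one row per sample and those three features in each row. Your current call passes a 1-D list: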

    clf.predict([30,4000,1])
    

    Wrap the sample in a second pair of brackets to make it 2-D:

    clf.predict([[30,4000,1]])     #<== Observe the two square brackets
    

    You can also pass several samples at once:

    X_test = [[30,4000,1],
              [35,15000,0],
              [40,2000,1],]
    clf.predict(X_test)
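
    To make the shape requirement concrete, here is a minimal, self-contained sketch. It is not the code from the question: the seven training rows are hand-picked hypothetical values, labelled with the same rule the question uses, and random_state=0 is added only to keep the run repeatable. The test samples are wrapped in DataFrames with the training column names, which is one way to hand predict() 2-D input.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical hand-made training rows, using the question's three features.
columns = ['CommuteInMiles', 'Salary', 'FullTimeEmployee']
train = pd.DataFrame(
    [[30, 4000, 1], [25, 3300, 1], [30, 2700, 1],
     [5, 4000, 1], [30, 1300, 1], [30, 4000, 0], [2, 1300, 0]],
    columns=columns,
)

# Same labelling rule as in the question: long commute, salary above 2000, full time.
moving = ((train.CommuteInMiles > 20) & (train.Salary > 2000)
          & (train.FullTimeEmployee == 1)).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(train, moving)

# One sample: still 2-D, shape (1, 3), i.e. one row with three features.
one_row = pd.DataFrame([[30, 4000, 1]], columns=columns)
pred_one = clf.predict(one_row)

# Several samples: shape (2, 3), giving one prediction per row.
many = pd.DataFrame([[30, 4000, 1], [2, 1300, 0]], columns=columns)
pred_many = clf.predict(many)

print(pred_one, pred_many)
```

    The returned array always has one entry per input row, so a single sample still comes back as an array of length 1.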
    

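    The last error, from clf.predict(np.array(30,4000,1)), has nothing to do with predict(). It comes from calling np.array() incorrectly.

    According to the documentation, the signature of np.array is: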

    (object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
    

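    Only the first parameter, object, holds the data. When you call np.array(30,4000,1), each value is treated as a separate argument (object=30, dtype=4000, copy=1) rather than as one sequence, and np.array accepts only two positional arguments, hence the error. Pass the values as a single list instead: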

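    np.array([30,4000,1])

    This is still a 1-D array, so it needs the same 2-D treatment as the list before it goes to predict().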
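
    A minimal sketch of turning a single sample into the 2-D shape that predict() expects, using only NumPy. The variable names are illustrative; reshape(1, -1) is the call suggested by the scikit-learn error message for a single sample.

```python
import numpy as np

# Values passed as one list: a 1-D array of shape (3,).
sample = np.array([30, 4000, 1])

# reshape(1, -1) means "one row, as many columns as needed": shape (1, 3).
as_row = sample.reshape(1, -1)

# Nesting the list gives the same 2-D array directly.
nested = np.array([[30, 4000, 1]])

print(sample.shape, as_row.shape, nested.shape)
```

    Either as_row or nested can then be passed to clf.predict().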