我最近开始玩决策树,我想用一些人工数据训练我自己的简单模型。我想用这个模型来预测更多的模拟数据,只是想了解它是如何工作的,但后来我陷入了困境。一旦您的模型经过训练,您如何将数据传递给predict()?
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
文档状态:
clf.预测(x)
参数:
x:形状的数组或稀疏矩阵=[n_samples,n_features]
但当试图传递np.array、np.ndarray、list、tuple或dataframe时,只会抛出一个错误。你能帮我理解为什么吗?
代码如下:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
import graphviz
import pandas as pd
import numpy as np
import random
from sklearn import tree
pd.options.display.max_seq_items=5000
pd.options.display.max_rows=20
pd.options.display.max_columns=150
lenght = 50000
miles_commuting = [random.choice([2,3,4,5,7,10,20,25,30]) for x in range(lenght)]
salary = [random.choice([1300,1600,1800,1900,2300,2500,2700,3300,4000]) for x in range(lenght)]
full_time = [random.choice([1,0,1,1,0,1]) for x in range(lenght)]
DataFrame = pd.DataFrame({'CommuteInMiles':miles_commuting,'Salary':salary,'FullTimeEmployee':full_time})
DataFrame['Moving'] = np.where((DataFrame.CommuteInMiles > 20) & (DataFrame.Salary > 2000) & (DataFrame.FullTimeEmployee == 1),1,0)
DataFrame['TargetLabel'] = np.where((DataFrame.Moving == 1),'Considering move','Not moving')
target = DataFrame.loc[:,'Moving']
data = DataFrame.loc[:,['CommuteInMiles','Salary','FullTimeEmployee']]
target_names = DataFrame.TargetLabel
features = data.columns.values
clf = tree.DecisionTreeClassifier()
clf = clf.fit(data, target)
clf.predict(?????) #### <===== What should go here?
clf.predict([30,4000,1])
ValueError:应为二维数组,但得到的却是一维数组:
数组=[3.E+01 4.E+03 1.E+00]。
使用array重新调整数据的形状。如果数据有单个功能或数组,则重新调整形状(-1,1)。如果数据包含单个示例,则重新调整形状(1,-1)。
clf.predict(np.array(30,4000,1))
值错误:只接受2个非关键字参数