Using KNN to predict values in another dataframe (Python 3.6) [closed]

  •  0
  • user026  · Tech Community  · 6 years ago

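    I created a dataframe containing geological data from a well log, then added a new column that assigns a name to each row based on its properties. In other words, every row now has a rock name.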

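    My question: I have already trained a classifier on my first dataframe using all the data I have, and now I want to predict the labels (rock names) for a new dataframe that has the same columns (properties) as the first one. But I don't know how to do it. Here is my code so far: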

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    
    data = pd.read_excel('wellA.xlsx')            #size (20956,26)
    well1 = pd.concat([data['GR'], data['NPHI'], data['RHOB'], data['SW'], 
    data['VSH'], data['rock_name']], axis=1, keys = 
    ['GR','NPHI','RHOB','SW','VSH','rock_name'])
    well1 = well1.drop(well1.index[0:15167])
    well1.dropna(axis=0, inplace=True)
    
    knn = KNeighborsClassifier(n_neighbors = 9)
    d = {'Claystone': 1, 'Calcareous Claystone': 2, 'Sandy Claystone': 3, 
    'Limestone': 4, 'Muddy Limestone': 5, 'Muddy Sandstone': 6, 'Sandstone': 7}
    well1['Label'] = well1['rock_name'].map(d)         #size (5412,7)
    
    X = well1[well1.columns[:5]]         #size (5412, 5)
    y = well1.rock_name                  #size (5412,)
    X_train, X_test, y_train, y_test = train_test_split (X, y, random_state = 0)
    #sizes: X_train(4059,5), X_test(1353,5) , y_train(4059,), y_test(1353,)
    knn.fit(X_train, y_train)      
    knn.score(X_test, y_test) 
    
    data2 = pd.read_excel('wellB.xlsx')        #size (29070, 12)
    well2 = pd.concat([data2['GR'], data2['NPHI'], data2['RHOB'], data2['SW'], 
    data2['VSH']], axis=1, keys = ['GR','NPHI','RHOB','SW','VSH'])
    well2.dropna(axis=0, inplace=True)         #size (2124, 5)
    
    # values of the properties
    gammaray = well2['GR'].values                             
    neutron = well2['NPHI'].values
    density = well2['RHOB'].values
    swat = well2['SW'].values
    vshale = well2['VSH'].values
    
    rock_name_pred = knn.predict([[gammaray, neutron, density, swat, vshale]])
    

    Then I get the following error:

    Traceback (most recent call last):

      File "C:\Users\laguiar\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
        execfile(filename, namespace)

      File "C:\Users\laguiar\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)

      File "C:/Users/laguiar/Desktop/Projeto Norne/exemploKNN.py", line 41, in <module>
        rock_name_pred = knn.predict([[gammaray, neutron, density, swat, vshale]])

      File "C:\Users\laguiar\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\neighbors\classification.py", line 143, in predict
        X = check_array(X, accept_sparse='csr')

      File "C:\Users\laguiar\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 451, in check_array
        % (array.ndim, estimator_name))

    ValueError: Found array with dim 3. Estimator expected <= 2.
    
    1 answer
  •  0
  •   Luc Blassel    6 years ago

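    The error says that knn expects an array with at most 2 dimensions. In your script, however, properties such as gammaray are 1D numpy arrays.
    When you write [[gammaray, neutron, density, swat, vshale]] in your knn.predict call, the double brackets add two dimensions, so you end up with a 3D array of shape (1, 5, n_samples).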
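    Dropping one pair of brackets gives a 2D array, but with one row per property, shape (5, n_samples), whereas the model was fitted on one row per sample with five features. Try calling the predict method with the arrays zipped together so that each row holds one sample:
    rock_name_pred = knn.predict(list(zip(gammaray, neutron, density, swat, vshale)))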
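The effect of the brackets on the array shape can be checked directly with numpy. The sketch below uses made-up values for the five log curves (three samples each, not the asker's data) and only illustrates the shapes; `np.column_stack` is one possible way to get one row per sample.

```python
import numpy as np

# Hypothetical stand-ins for the five log curves (3 samples each).
gammaray = np.array([80.0, 95.0, 40.0])
neutron = np.array([0.30, 0.35, 0.12])
density = np.array([2.40, 2.35, 2.65])
swat = np.array([0.90, 1.00, 0.40])
vshale = np.array([0.60, 0.80, 0.10])

# Double brackets: (1, 5, n_samples), a 3D array that predict() rejects.
nested = np.array([[gammaray, neutron, density, swat, vshale]])

# Single brackets: 2D, but one row per property instead of one per sample.
flat = np.array([gammaray, neutron, density, swat, vshale])

# Columns stacked side by side: one row per sample, five features each.
stacked = np.column_stack([gammaray, neutron, density, swat, vshale])

print(nested.shape, flat.shape, stacked.shape)
```

Only the last layout, (n_samples, 5), matches what the classifier saw during fit.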

    Alternatively, you can call the predict method directly on the dataframe, just as you did with the fit method:
    rock_name_pred = knn.predict(well2)
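As a minimal end-to-end sketch of the dataframe approach, the example below trains on one dataframe and predicts labels for a second one with the same feature columns. The data is randomly generated and the labelling rule (VSH above 0.5 means Claystone) is purely hypothetical, standing in for the real wellA.xlsx and wellB.xlsx files.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ['GR', 'NPHI', 'RHOB', 'SW', 'VSH']
rng = np.random.default_rng(0)

# Synthetic "training well" with a made-up labelling rule (assumption).
train = pd.DataFrame(rng.random((60, 5)), columns=FEATURES)
train['rock_name'] = np.where(train['VSH'] > 0.5, 'Claystone', 'Sandstone')

knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(train[FEATURES], train['rock_name'])

# Synthetic "new well": same feature columns, no labels.
new_well = pd.DataFrame(rng.random((10, 5)), columns=FEATURES)
new_well = new_well.dropna()

# Pass the 2D feature table itself; the result has one label per row.
new_well['rock_name_pred'] = knn.predict(new_well[FEATURES])
print(new_well.head())
```

Selecting the columns by name in both fit and predict keeps the feature order identical between the two wells, and storing the predictions as a new column keeps each label aligned with its row.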