代码之家  ›  专栏  ›  技术社区  ›  Ben P

从KNN返回最近邻居列表

  •  1
  • Ben P  · 技术社区  · 6 年前

    我试图使用KNN模型来显示与品牌X最接近的相关品牌。我已经阅读了我的数据并将其转换,因此格式如下:

              User1     User2     User3     User4     User5
    Brand1    1         0         0         0         1
    Brand2    0         0         0         1         1
    Brand3    0         0         1         1         1
    Brand4    1         1         1         0         1
    Brand5    0         0         0         1         1
    

    然后我定义了我的模型:

    from sklearn.neighbors import NearestNeighbors
    
    model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
    model_knn.fit(df_mini)
    

    然后,我使用以下代码列出与随机选择的品牌最接近的5个品牌:

    query_index = np.random.choice(df_mini.shape[0])
    distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors = 6)
    
    for i in range(0, len(distances.flatten())):
        if i == 0:
            print ('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
        else:
            print ('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
    

    Recommendations for BRAND_X:
    
    1: BRAND_a, with distance of 1.0:
    2: BRAND_b, with distance of 1.0:
    3: BRAND_c, with distance of 1.0:
    4: BRAND_d, with distance of 1.0:
    5: BRAND_e, with distance of 1.0:
    

    import pandas as pd
    df = pd.read_csv('sample.csv')
    print(df.head())
    
    df_mini = df[:5000]
    df_mini = df_mini.transpose()
    df_mini = df_mini.drop('UserID',axis=0)
    
    from sklearn.neighbors import NearestNeighbors
    
    model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
    model_knn.fit(df_mini)
    
    query_index = np.random.choice(df_mini.shape[0])
    distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors = 6)
    
    for i in range(0, len(distances.flatten())):
        if i == 0:
            print ('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
        else:
            print ('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
    

    示例数据文件: https://drive.google.com/open?id=19KRJDGrsLNpDD0WNAz4be76O66fGmQtJ

    0 回复  |  直到 6 年前