代码之家  ›  专栏  ›  技术社区  ›  Bushra Jabeen

计算列中的互信息

  •  0
  • Bushra Jabeen  · 技术社区  · 2 年前
    I have been set a sample exercise by my teacher.  It is to reduce dimensionality by writing a function that uses sklearn(mutual information).I am not that good in it but I tried many ways. Its not giving me any reliable answer even. I am unable to find out the mistake.
    
    The data consists of 19 columns that i got with one hot encoding. And i named it as dummy. whenever i run the code it does not give me any output. neither error nor result. 
    

    首先,我不确定该设置什么阈值。 第二,如何从sklearn调用互信息源,并迭代一对中的每一列,从高度相关的列对中删除一列。

        Address_A   Address_B   Address_C   Address_D   Address_E   Address_F   Address_G   Address_H   DoW_0   DoW_1   DoW_2   DoW_3   DoW_4   DoW_5   DoW_6   Month_1 Month_11    Month_12    Month_2
            0   0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
            1   0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
            2   0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
            3   0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
            4   0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
            ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
            252199  0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
            252200  0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
            252201  0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
            252202  0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
            252203  0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
            
            from sklearn.metrics import mutual_info_score
            
            def reduce_dimentionality(dummy, threshold):
                df_cols = dummy[['Address_A','Address_B','Address_C','Address_D','Address_E','Address_F','Address_G','Address_H',
                                 'DoW_0','DoW_1','DoW_2','DoW_3','DoW_4','DoW_5','DoW_6','Month_1','Month_11','Month_12','Month_2']]
                to_remove = [] 
                for col_ix, Address_A in enumerate(df_cols):
                    for address_B in df_cols:
                        calc_MI=sklearn.metrics.mutual_info_score
                        mu_info = calc_MI(dummy['Address_A'],dummy['Address_B'], bins=20)
                        if mu_info <1:
                            d=to_remove.append(Address_A)
                new_data_frame = pd.DataFrame.drop(d)
    return new_data_frame
    
    0 回复  |  直到 2 年前