My teacher has set me a sample exercise: reduce dimensionality by writing a function that uses mutual information from sklearn. I am not very experienced with this, and although I have tried several approaches, none of them gives me a reliable answer. I cannot find the mistake.
The data consists of 19 columns that I obtained with one-hot encoding, stored in a DataFrame that I named dummy. Whenever I run the code it produces no output: neither an error nor a result.
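First, I am not sure what threshold to set.
Second, how do I call mutual information from sklearn, iterate over each column in a pair, and drop one column from each highly correlated pair?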
Address_A Address_B Address_C Address_D Address_E Address_F Address_G Address_H DoW_0 DoW_1 DoW_2 DoW_3 DoW_4 DoW_5 DoW_6 Month_1 Month_11 Month_12 Month_2
0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
1 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
2 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
3 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
4 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
252199 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252200 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252201 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252202 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252203 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
import pandas as pd
from sklearn.metrics import mutual_info_score

def reduce_dimensionality(dummy, threshold):
    cols = ['Address_A','Address_B','Address_C','Address_D','Address_E','Address_F','Address_G','Address_H',
            'DoW_0','DoW_1','DoW_2','DoW_3','DoW_4','DoW_5','DoW_6','Month_1','Month_11','Month_12','Month_2']
    to_remove = set()
    # Visit each unordered pair of columns exactly once.
    for col_ix, col_a in enumerate(cols):
        if col_a in to_remove:
            continue
        for col_b in cols[col_ix + 1:]:
            if col_b in to_remove:
                continue
            # mutual_info_score takes two label arrays and has no `bins` argument.
            mu_info = mutual_info_score(dummy[col_a], dummy[col_b])
            # High mutual information marks a redundant pair, so drop one of the two.
            if mu_info > threshold:
                to_remove.add(col_b)
    # drop() returns a new DataFrame; the result has to be returned and printed by the caller.
    return dummy.drop(columns=list(to_remove))
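
On the threshold question above, here is a minimal sketch of how the pairwise scores could be inspected before picking a cut-off. It relies on the fact that mutual_info_score is measured in nats, so for two binary columns it can never exceed ln 2 (about 0.693), while normalized_mutual_info_score rescales the score to the range 0 to 1. The toy frame and the helper name pairwise_nmi are made up for illustration and are not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

# Toy stand-in for the one-hot encoded frame (hypothetical values, not the real data).
toy = pd.DataFrame({
    "Address_A": [1, 0, 1, 0, 1, 0, 1, 0],
    "Address_B": [0, 1, 0, 1, 0, 1, 0, 1],  # exact complement of Address_A
    "DoW_0":     [1, 1, 0, 0, 1, 1, 0, 0],  # statistically independent of Address_A
}).astype(int)

def pairwise_nmi(df):
    """Return a symmetric table of normalized mutual information scores."""
    cols = list(df.columns)
    # A column shares all of its information with itself, hence ones on the diagonal.
    scores = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            s = normalized_mutual_info_score(df[a], df[b])
            scores.loc[a, b] = scores.loc[b, a] = s
    return scores

nmi = pairwise_nmi(toy)
# Raw (unnormalized) score for the fully redundant pair, in nats.
raw_mi = mutual_info_score(toy["Address_A"], toy["Address_B"])
print(nmi.round(3))
print(round(raw_mi, 3))
```

Looking at the table this produces for the real columns should make it easier to choose a threshold somewhere between 0 and 1 on the normalized scale, instead of a raw value such as 1 that binary columns can never reach.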