Calculating mutual information between columns

scikit-learn

Bushra Jabeen · Technical Community · 2 years ago

My teacher set me a sample exercise: reduce dimensionality by writing a function that uses sklearn (mutual information). I am not very good at this, but I have tried many approaches, and none of them gives me a reliable answer. I cannot find the mistake.
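
For reference, this is the quantity `sklearn.metrics.mutual_info_score` estimates for two discrete columns X and Y: the probabilities come from the joint counts over all rows, and the logarithm is natural, so the result is in nats. For 0/1 (one-hot) columns the value can never exceed ln 2, which matters when picking a threshold:

$$
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\ln\frac{p(x,y)}{p(x)\,p(y)},
\qquad
H(X) = -\sum_{x} p(x)\ln p(x)
$$

$$
0 \le I(X;Y) \le \min\{H(X),\,H(Y)\} \le \ln 2 \approx 0.693 \qquad \text{(0/1 columns)}
$$

$$
\mathrm{NMI}(X,Y) = \frac{I(X;Y)}{\tfrac{1}{2}\bigl(H(X) + H(Y)\bigr)} \in [0, 1]
$$

The last line is what `sklearn.metrics.normalized_mutual_info_score` returns with its default `average_method='arithmetic'`. Because it is bounded by 1, a cut-off on it does not depend on the columns' own entropies the way a cut-off on raw MI does.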

The data consists of 19 columns that I obtained with one-hot encoding, and I named the result dummy. Whenever I run the code, it produces no output at all: neither an error nor a result.
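
A minimal sketch of how one-hot columns named like the ones in the printout typically arise. It assumes the raw data had source columns called `Address`, `DoW` and `Month`; these names are hypothetical, inferred from the column prefixes, and the question does not say which encoder was actually used. `pd.get_dummies` with `dtype=float` produces 0.0/1.0 indicator columns, and the sketch also shows a redundancy relevant to dimensionality reduction: within each group the indicators sum to 1 on every row.

```python
import pandas as pd

# Hypothetical raw data: the source column names "Address", "DoW" and "Month"
# are guesses inferred from the column prefixes, not taken from the question.
raw = pd.DataFrame({
    "Address": ["G", "F", "E", "H"],
    "DoW": [2, 2, 2, 2],
    "Month": [2, 2, 11, 11],
})

# One 0.0/1.0 column per distinct value, named <source>_<value>.
dummy = pd.get_dummies(raw, columns=["Address", "DoW", "Month"], dtype=float)
print(list(dummy.columns))

# Within each original feature exactly one indicator is 1 per row, so any one
# column of a group is fully determined by the other columns of that group.
address_cols = [c for c in dummy.columns if c.startswith("Address_")]
print(dummy[address_cols].sum(axis=1).tolist())
```

Because of that per-group redundancy, one column per original feature can usually be dropped without losing information, independently of any MI threshold.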

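First, I am not sure what threshold to set. Second, how do I call sklearn's mutual information function, iterate over every pair of columns, and drop one column from each highly correlated pair?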
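
For the second part, a minimal sketch, assuming `dummy` holds only 0/1 columns. `mutual_info_score` takes two vectors of discrete labels (it has no `bins` parameter), and `itertools.combinations` visits every unordered pair of columns exactly once. The helper `pairwise_mi` and the `toy` frame are hypothetical, made up for illustration; printing the full matrix for the real data is one way to see which scores actually occur before choosing a threshold.

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import mutual_info_score


def pairwise_mi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a symmetric matrix of mutual information (in nats) for every column pair."""
    cols = list(df.columns)
    mi = pd.DataFrame(0.0, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        # Each column is treated as a vector of discrete labels; no binning is involved.
        score = mutual_info_score(df[a], df[b])
        mi.loc[a, b] = score
        mi.loc[b, a] = score
    for c in cols:
        # The diagonal I(X;X) equals the column's own entropy H(X).
        mi.loc[c, c] = mutual_info_score(df[c], df[c])
    return mi


# Hypothetical toy stand-in for `dummy`: Address_A and Address_B are exact
# complements of each other, while Month_1 is independent of both.
toy = pd.DataFrame({
    "Address_A": [1, 0, 1, 0, 1, 0, 1, 0],
    "Address_B": [0, 1, 0, 1, 0, 1, 0, 1],
    "Month_1":   [1, 1, 0, 0, 1, 1, 0, 0],
})
print(pairwise_mi(toy).round(3))
```

On the real 19-column frame this evaluates 171 pairs; every off-diagonal value is at most ln 2 ≈ 0.693, so a raw-MI threshold has to be chosen on that scale.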

```
        Address_A  Address_B  Address_C  Address_D  Address_E  Address_F  Address_G  Address_H  DoW_0  DoW_1  DoW_2  DoW_3  DoW_4  DoW_5  DoW_6  Month_1  Month_11  Month_12  Month_2
     0        0.0        0.0        0.0        0.0        0.0        0.0        1.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       0.0       0.0      1.0
     1        0.0        0.0        0.0        0.0        0.0        1.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       0.0       0.0      1.0
     2        0.0        0.0        0.0        0.0        0.0        1.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       0.0       0.0      1.0
     3        0.0        0.0        0.0        0.0        0.0        1.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       0.0       0.0      1.0
     4        0.0        0.0        0.0        0.0        0.0        1.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       0.0       0.0      1.0
   ...        ...        ...        ...        ...        ...        ...        ...        ...    ...    ...    ...    ...    ...    ...    ...      ...       ...       ...      ...
252199        0.0        0.0        0.0        0.0        1.0        0.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       1.0       0.0      0.0
252200        0.0        0.0        0.0        0.0        1.0        0.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       1.0       0.0      0.0
252201        0.0        0.0        0.0        0.0        0.0        0.0        0.0        1.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       1.0       0.0      0.0
252202        0.0        0.0        0.0        0.0        0.0        0.0        0.0        1.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       1.0       0.0      0.0
252203        0.0        0.0        0.0        0.0        1.0        0.0        0.0        0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0      0.0       1.0       0.0      0.0
```

```python
from sklearn.metrics import mutual_info_score

def reduce_dimentionality(dummy, threshold):
    df_cols = dummy[['Address_A','Address_B','Address_C','Address_D','Address_E','Address_F','Address_G','Address_H',
                     'DoW_0','DoW_1','DoW_2','DoW_3','DoW_4','DoW_5','DoW_6','Month_1','Month_11','Month_12','Month_2']]
    to_remove = []
    for col_ix, Address_A in enumerate(df_cols):
        for address_B in df_cols:
            calc_MI=sklearn.metrics.mutual_info_score
            mu_info = calc_MI(dummy['Address_A'],dummy['Address_B'], bins=20)
            if mu_info <1:
                d=to_remove.append(Address_A)
    new_data_frame = pd.DataFrame.drop(d)
    return new_data_frame
```
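
A corrected sketch, assuming `dummy` contains only the 0/1 columns shown. In the snippet above, `sklearn` and `pd` are never imported (only the name `mutual_info_score` is), `mutual_info_score` accepts no `bins` argument, the inner call always compares the literal columns `'Address_A'` and `'Address_B'` instead of the loop variables, `list.append` returns `None` so `d` is `None`, `drop` is called on the `pd.DataFrame` class rather than on the frame, and `threshold` is never used (while `mu_info < 1` is always true for 0/1 columns, whose MI is at most ln 2). Getting no output and no error at all is consistent with the function being defined but never called. The version below swaps in `normalized_mutual_info_score` so the threshold lives on a [0, 1] scale; the default of 0.9 is an arbitrary placeholder, not a recommended value.

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import normalized_mutual_info_score


def reduce_dimensionality(dummy: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose normalized mutual information exceeds `threshold`.

    normalized_mutual_info_score lies in [0, 1], so `threshold` is on that scale
    (0.9 is an arbitrary placeholder, not a recommended value).
    """
    to_remove = set()
    for col_a, col_b in combinations(dummy.columns, 2):
        # Once a column is scheduled for removal, stop comparing it with others.
        if col_a in to_remove or col_b in to_remove:
            continue
        score = normalized_mutual_info_score(dummy[col_a], dummy[col_b])
        if score > threshold:
            # Greedy choice: keep the column that comes first, drop the second.
            to_remove.add(col_b)
    # DataFrame.drop returns a new frame and leaves `dummy` unchanged.
    return dummy.drop(columns=sorted(to_remove))


# Usage on the asker's data (assumes `dummy` is already loaded):
# reduced = reduce_dimensionality(dummy, threshold=0.9)
```

Which member of a pair to keep is a design choice; this sketch greedily keeps whichever column appears first in `dummy.columns`, so the result can depend on column order.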
