
How can AUC differ from GridSearchCV AUC?

  •  Arnold  ·  7 years ago

    I am building an MLPClassifier model in scikit-learn. I use GridSearchCV with roc_auc to score the model. The mean train and test scores are around 0.76, not bad. The output of cv_results_ is:

    Train set AUC:  0.553465272412
    Grid best score (AUC):  0.757236688092
    Grid best parameter (max. AUC):  {'hidden_layer_sizes': 10}
    
    {   'mean_fit_time': array([63.54, 136.37, 136.32, 119.23, 121.38, 124.03]),
        'mean_score_time': array([ 0.04,  0.04,  0.04,  0.05,  0.05,  0.06]),
        'mean_test_score': array([ 0.76,  0.74,  0.75,  0.76,  0.76,  0.76]),
        'mean_train_score': array([ 0.76,  0.76,  0.76,  0.77,  0.77,  0.77]),
        'param_hidden_layer_sizes': masked_array(data = [5 (5, 5) (5, 10) 10 (10, 5) (10, 10)],
                 mask = [False False False False False False],
           fill_value = ?)
    ,
        'params': [   {'hidden_layer_sizes': 5},
                      {'hidden_layer_sizes': (5, 5)},
                      {'hidden_layer_sizes': (5, 10)},
                      {'hidden_layer_sizes': 10},
                      {'hidden_layer_sizes': (10, 5)},
                      {'hidden_layer_sizes': (10, 10)}],
        'rank_test_score': array([   2,    6,    5,    1,    4,    3]),
        'split0_test_score': array([ 0.76,  0.75,  0.75,  0.76,  0.76,  0.76]),
        'split0_train_score': array([ 0.76,  0.75,  0.75,  0.76,  0.76,  0.76]),
        'split1_test_score': array([ 0.77,  0.76,  0.76,  0.77,  0.76,  0.76]),
        'split1_train_score': array([ 0.76,  0.75,  0.75,  0.76,  0.76,  0.76]),
        'split2_test_score': array([ 0.74,  0.72,  0.73,  0.74,  0.74,  0.75]),
        'split2_train_score': array([ 0.77,  0.77,  0.77,  0.77,  0.77,  0.77]),
        'std_fit_time': array([47.59,  1.29,  1.86,  3.43,  2.49,  9.22]),
        'std_score_time': array([ 0.01,  0.01,  0.01,  0.00,  0.00,  0.01]),
        'std_test_score': array([ 0.01,  0.01,  0.01,  0.01,  0.01,  0.01]),
        'std_train_score': array([ 0.01,  0.01,  0.01,  0.01,  0.01,  0.00])}
    

    As you can see, I use a KFold of 3. Interestingly, the manually computed train set roc_auc_score is reported as 0.55, while the mean train score is reported as about 0.76. The code that generates this output is:

    import pprint
    import time

    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import roc_auc_score

    def model_mlp (X_train, y_train, verbose=True, random_state = 42):
        t = time.time ()   # start time, used by the timing print at the end
        grid_values = {'hidden_layer_sizes': [(5), (5,5), (5, 10),
                                              (10), (10, 5), (10, 10)]}
    
        # MLP requires scaling of all predictors
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
    
        mlp = MLPClassifier(solver='adam', learning_rate_init=1e-4,
                            max_iter=200,
                            verbose=False,
                            random_state=random_state)
        # perform the grid search
        grid_auc = GridSearchCV(mlp, 
                                param_grid=grid_values,
                                scoring='roc_auc', 
                                verbose=2, n_jobs=-1)
        grid_auc.fit(X_train, y_train)
        y_hat = grid_auc.predict(X_train)
    
        # print out the results
        if verbose:
            print('Train set AUC: ', roc_auc_score(y_train, y_hat))
            print('Grid best score (AUC): ', grid_auc.best_score_)
            print('Grid best parameter (max. AUC): ', grid_auc.best_params_)
            print('')
            pp = pprint.PrettyPrinter(indent=4)
            pp.pprint (grid_auc.cv_results_)
            print ('MLPClassifier fitted, {:.2f} seconds used'.format (time.time () - t))
    
        return grid_auc.best_estimator_
    

    Because of this discrepancy I decided to 'emulate' GridSearchCV and obtained the following results:

    Shape X_train: (107119, 15)
    Shape y_train: (107119,)
    Shape X_val: (52761, 15)
    Shape y_val: (52761,)
           layers    roc-auc
      Seq  l1  l2  train   test iters runtime
        1   5   0 0.5522 0.5488    85   20.54
        2   5   5 0.5542 0.5513    80   27.10
        3   5  10 0.5544 0.5521    83   28.56
        4  10   0 0.5532 0.5516    61   15.24
        5  10   5 0.5540 0.5518    54   19.86
        6  10  10 0.5507 0.5474    56   21.09
    

    The scores are all around 0.55, consistent with the manual computation in the code above. What surprises me is that there is hardly any variation in the results. It seems I am making some mistake, but I cannot find it. See the code:

    from sklearn.model_selection import train_test_split   # remaining imports as above

    def simple_mlp (X, y, verbose=True, random_state = 42):
        def do_mlp (X_t, X_v, y_t, y_v, n, l1, l2=None):
            if l2 is None:
                layers = (l1)
                l2 = 0
            else:
                layers = (l1, l2)
    
            t = time.time ()
            mlp = MLPClassifier(solver='adam', learning_rate_init=1e-4,
                                hidden_layer_sizes=layers,
                                max_iter=200,
                                verbose=False,
                                random_state=random_state)
            mlp.fit(X_t, y_t)
            y_hat_train = mlp.predict(X_t)
            y_hat_val = mlp.predict(X_v)
            if verbose:
                av = 'samples'
                acc_trn = roc_auc_score(y_train, y_hat_train, average=av)
                acc_tst = roc_auc_score(y_val, y_hat_val, average=av)
                print ("{:5d}{:4d}{:4d}{:7.4f}{:7.4f}{:9d}{:8.2f}"
                       .format(n, l1, l2, acc_trn, acc_tst,  mlp.n_iter_, time.time() - t))
            return mlp, n + 1
    
        X_train, X_val, y_train, y_val = train_test_split (X, y, test_size=0.33, random_state=random_state)
        if verbose:
            print('Shape X_train:', X_train.shape)
            print('Shape y_train:', y_train.shape)
            print('Shape X_val:', X_val.shape)
            print('Shape y_val:', y_val.shape)
    
        # MLP requires scaling of all predictors
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_val = scaler.transform(X_val)
    
        n = 1
        layers1 = [5, 10]
        layers2 = [5, 10]
        if verbose:
            print ("       layers    roc-auc")
            print ("  Seq  l1  l2  train validation iters runtime")
        for l1 in layers1:
            mlp, n = do_mlp (X_train, X_val, y_train, y_val, n, l1)
            for l2 in layers2:
                mlp, n = do_mlp (X_train, X_val, y_train, y_val, n, l1, l2)
    
        return mlp
    

    cv=3 (the default) is used for GridSearchCV. While searching for possible answers I found this post on SO, which describes the same problem. It was never answered. Maybe somebody knows what exactly is happening?

    Thanks for your time.

    EDIT

    To make the two approaches more comparable I shuffled the data with the same fixed seed in both. For the GridSearchCV approach I added:

    np.random.seed (random_state)
    index = np.random.permutation (len(X_train))
    X_train = X_train.iloc[index]
    

    and in simple_mlp, as a replacement for train_test_split:

    np.random.seed (random_state)
    index = np.random.permutation (len(X))
    X = X.iloc[index]
    y = y.iloc[index]
    train_size = int (2 * len(X) / 3.0) # sample of 2 third
    X_train = X[:train_size]
    X_val = X[train_size:]
    y_train = y[:train_size]
    y_val = y[train_size:]
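
    (For reference, the same contiguous two-thirds / one-third split could also be written with train_test_split by turning shuffling off. A minimal sketch, assuming a scikit-learn version where train_test_split accepts a shuffle argument and assuming X and y have already been permuted with the fixed seed as above:)

    from sklearn.model_selection import train_test_split

    # contiguous split: first 2/3 of the (already permuted) rows for training,
    # the remaining 1/3 for validation
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, train_size=2.0 / 3.0, shuffle=False)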
    

    This resulted in the following output:

    Train set AUC:  0.5
    Grid best score (AUC):  0.501410198106
    Grid best parameter (max. AUC):  {'hidden_layer_sizes': (5, 10)}
    
    {   'mean_fit_time': array([28.62, 46.00, 54.44, 46.74, 55.25, 53.33]),
        'mean_score_time': array([ 0.04,  0.05,  0.05,  0.05,  0.05,  0.06]),
        'mean_test_score': array([ 0.50,  0.50,  0.50,  0.50,  0.50,  0.50]),
        'mean_train_score': array([ 0.50,  0.51,  0.51,  0.51,  0.50,  0.51]),
        'param_hidden_layer_sizes': masked_array(data = [5 (5, 5) (5, 10) 10 (10, 5) (10, 10)],
                 mask = [False False False False False False],
           fill_value = ?)
    ,
        'params': [   {'hidden_layer_sizes': 5},
                      {'hidden_layer_sizes': (5, 5)},
                      {'hidden_layer_sizes': (5, 10)},
                      {'hidden_layer_sizes': 10},
                      {'hidden_layer_sizes': (10, 5)},
                      {'hidden_layer_sizes': (10, 10)}],
        'rank_test_score': array([   6,    2,    1,    4,    5,    3]),
        'split0_test_score': array([ 0.50,  0.50,  0.51,  0.50,  0.50,  0.50]),
        'split0_train_score': array([ 0.50,  0.51,  0.50,  0.51,  0.50,  0.51]),
        'split1_test_score': array([ 0.50,  0.50,  0.50,  0.50,  0.49,  0.50]),
        'split1_train_score': array([ 0.50,  0.50,  0.51,  0.50,  0.51,  0.51]),
        'split2_test_score': array([ 0.49,  0.50,  0.49,  0.50,  0.50,  0.50]),
        'split2_train_score': array([ 0.51,  0.51,  0.51,  0.51,  0.50,  0.51]),
        'std_fit_time': array([19.74, 19.33,  0.55,  0.64,  2.36,  0.65]),
        'std_score_time': array([ 0.01,  0.01,  0.00,  0.01,  0.00,  0.01]),
        'std_test_score': array([ 0.01,  0.00,  0.01,  0.00,  0.00,  0.00]),
        'std_train_score': array([ 0.00,  0.00,  0.00,  0.00,  0.00,  0.00])}
    

    However, I am left with one doubt. In the original setup GridSearchCV was consistently too high, by about 0.20, while now it is consistently too low, by about 0.05. This is an improvement, since the deviation between the two approaches has decreased by a factor of 4. Is there an explanation for the earlier finding, or is the deviation of about 0.05 between the two approaches simply noise? I have decided to mark this as the correct answer, but I hope somebody can shed some light on this remaining doubt.

    1 Answer

  •   Gambit1614  ·  7 years ago

    The difference in scores is mainly due to the different ways in which GridSearchCV and your function that emulates it split the dataset. Think of it this way. Suppose you have 9 data points in your dataset. Now with 3 folds in GridSearchCV, suppose the distribution is like this:

    train_cv_fold1_indices : 1 2 3 4 5 6 
    test_cv_fold1_indices  : 7 8 9
    
    
    train_cv_fold2_indices : 1 2 3 7 8 9 
    test_cv_fold2_indices  : 4 5 6
    
    
    train_cv_fold3_indices : 4 5 6 7 8 9 
    test_cv_fold3_indices  : 1 2 3
    

    But in your function that emulates GridSearchCV, the data is split in a different way, for example:

    train_indices : 1 3 5 7 8 9
    test_indices  : 2 4 6
    

    Now as you can see, this is a different split of the dataset, so a classifier trained on it may behave quite differently. (It may even behave the same; it all depends on the data points and various other factors, such as how relevant they are and whether they help capture the variation among the data points, etc.)
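
    To make the difference concrete, here is a minimal sketch (the 9-point toy data is purely illustrative, and a plain non-shuffled KFold stands in for whatever CV object is actually used) that prints the index sets handed out by a 3-fold CV versus a single random split on the same data:

    import numpy as np
    from sklearn.model_selection import KFold, train_test_split

    X_toy = np.arange(1, 10).reshape(-1, 1)          # the 9 data points 1..9

    # the three folds a (non-shuffled) 3-fold CV would use
    for train_idx, test_idx in KFold(n_splits=3).split(X_toy):
        print('cv train:', X_toy[train_idx].ravel(), ' cv test:', X_toy[test_idx].ravel())

    # a single random split, as in the emulating function
    X_tr, X_te = train_test_split(X_toy, test_size=3, random_state=42)
    print('split train:', np.sort(X_tr.ravel()), ' split test:', np.sort(X_te.ravel()))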

    So, in order to emulate GridSearchCV perfectly, you need to perform the splits in the same way.

    Check the GridSearchCV Source and you will find that at line 592, in order to perform the CV, it calls another function, check_cv, at this link. It actually uses either a KFold CV or a stratified CV.
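
    check_cv is importable directly, so you can inspect what the integer cv=3 resolves to (a small sketch; y_train is the label vector from the question, and the exact default may vary between scikit-learn versions):

    from sklearn.model_selection import check_cv

    # for a classifier with an integer cv, check_cv returns a StratifiedKFold,
    # otherwise a plain KFold -- both without shuffling
    cv = check_cv(cv=3, y=y_train, classifier=True)
    print(cv)   # e.g. StratifiedKFold(n_splits=3, random_state=None, shuffle=False)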

    So my suggestion would be to make the CV explicit by constructing the CV object yourself (either a KFold CV or a stratified CV), and then use that same CV object in the emulating function as well, to get a more comparable analysis. You may then get values that relate better to each other.
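
    A minimal sketch of that suggestion, reusing the question's mlp, grid_values, X_train and y_train and one shared StratifiedKFold object for both the grid search and the hand-rolled loop (illustrative only, not a drop-in replacement):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, GridSearchCV
    from sklearn.metrics import roc_auc_score

    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    X_arr, y_arr = np.asarray(X_train), np.asarray(y_train)

    # 1) hand GridSearchCV the explicit CV object instead of the default cv=3
    grid_auc = GridSearchCV(mlp, param_grid=grid_values,
                            scoring='roc_auc', cv=skf, n_jobs=-1)
    grid_auc.fit(X_arr, y_arr)

    # 2) emulate it with exactly the same folds
    for train_idx, test_idx in skf.split(X_arr, y_arr):
        mlp.fit(X_arr[train_idx], y_arr[train_idx])
        proba = mlp.predict_proba(X_arr[test_idx])[:, 1]   # roc_auc scoring works on probabilities
        print(roc_auc_score(y_arr[test_idx], proba))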