
How do I design a suitable model in Keras for an imbalanced multi-class classification problem?

  • ScalaBoy  · Tech community  · 6 years ago

    I am working on an imbalanced multi-class classification problem (4 classes) on sequence data. I prepared training and test sets in which each class contains the same number of records:

    • 100, 100, 100, 100 -> training set
    • 20, 20, 20, 20 -> test set

    I got the following validation results for my LSTM Keras model. They are very poor. As the confusion matrix shows, every record is classified into the fourth class.

    ****************************
    | MODEL PERFORMANCE REPORT |
    ****************************
    Average F1 score = 0.10.
    Balanced accuracy score = 0.25.
    Confusion matrix
    [[ 0  0  0 20]
     [ 0  0  0 20]
     [ 0  0  0 20]
     [ 0  0  0 20]]
    Other metrics
                  precision    recall  f1-score   support
    
               0       0.00      0.00      0.00        20
               1       0.00      0.00      0.00        20
               2       0.00      0.00      0.00        20
               3       0.25      1.00      0.40        20
    
       micro avg       0.25      0.25      0.25        80
       macro avg       0.06      0.25      0.10        80
    weighted avg       0.06      0.25      0.10        80
    
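    These numbers can be reproduced directly from the confusion matrix, confirming that the classifier has collapsed onto a single class. A quick scikit-learn check (the labels below simply mirror the report above, they are not the real data):

    ```python
    import numpy as np
    from sklearn.metrics import confusion_matrix, f1_score, balanced_accuracy_score

    # Reconstruct the degenerate outcome: 20 true samples per class,
    # every prediction equal to class 3 (the fourth class).
    y_true = np.repeat([0, 1, 2, 3], 20)
    y_pred = np.full(80, 3)

    print(confusion_matrix(y_true, y_pred))
    print(round(f1_score(y_true, y_pred, average='macro'), 2))   # macro F1 = 0.1
    print(round(balanced_accuracy_score(y_true, y_pred), 2))     # balanced accuracy = 0.25
    ```

    Balanced accuracy is the mean per-class recall, (0 + 0 + 0 + 1)/4 = 0.25, and only class 3 has a non-zero F1 (0.40), so the macro average is 0.10, exactly as reported.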

    I don't want to proceed with hyperparameter optimization yet, because my model seems to have something fundamentally wrong with it.

    I would be grateful if someone more experienced with LSTMs and deep learning could point out my mistake.

    Here is my data (I am using a very small sample to experiment with a basic model; I will train on the full dataset later):

    400 train sequences
    80 test sequences
    X_train shape: (400, 20, 17)
    X_test shape: (80, 20, 17)
    y_train shape: (400, 4)
    y_test shape: (80, 4)
    
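    To make the code below runnable without the real dataset, random placeholder arrays with exactly these shapes can be generated (purely synthetic stand-ins, not the actual data):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Random stand-ins with the reported shapes: 20 timesteps, 17 features.
    X_train = rng.normal(size=(400, 20, 17))
    X_test = rng.normal(size=(80, 20, 17))

    # One-hot labels for 4 classes: 100 / 20 records per class.
    y_train = np.eye(4)[np.repeat(np.arange(4), 100)]
    y_test = np.eye(4)[np.repeat(np.arange(4), 20)]

    print(X_train.shape, y_train.shape)  # (400, 20, 17) (400, 4)
    ```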

    Here are my model and the fitting code:

    import numpy as np
    from sklearn import metrics
    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Dropout, Flatten, TimeDistributed
    from keras.callbacks import Callback

    hidden_neurons = 50
    timestamps = 20
    nb_features = 17   # must match X_train.shape[2]; 18 would make fit() fail
    nb_classes = 4

    # Callback computing weighted F1 / precision / recall on the validation
    # set after every epoch (self.validation_data is a Keras 2.x attribute).
    class Metrics(Callback):

        def on_train_begin(self, logs={}):
            self.val_f1s = []
            self.val_recalls = []
            self.val_precisions = []

        def on_epoch_end(self, epoch, logs={}):
            val_predict = np.argmax(np.asarray(self.model.predict(self.validation_data[0])), axis=1)
            val_targ = np.argmax(self.validation_data[1], axis=1)
            _val_f1 = metrics.f1_score(val_targ, val_predict, average='weighted')
            _val_recall = metrics.recall_score(val_targ, val_predict, average='weighted')
            _val_precision = metrics.precision_score(val_targ, val_predict, average='weighted')
            self.val_f1s.append(_val_f1)
            self.val_recalls.append(_val_recall)
            self.val_precisions.append(_val_precision)
            print(" — val_f1: {:f} — val_precision: {:f} — val_recall {:f}".format(_val_f1, _val_precision, _val_recall))

    model_metrics = Metrics()  # must exist before it is passed to fit()

    model = Sequential()
    model.add(LSTM(units=hidden_neurons,
                   return_sequences=True,
                   input_shape=(timestamps, nb_features),
                   dropout=0.2,
                   recurrent_dropout=0.2))
    model.add(TimeDistributed(Dense(1)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(units=nb_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'],
                  optimizer='adadelta')

    history = model.fit(np.array(X_train), y_train,
                        validation_data=(np.array(X_test), y_test),
                        epochs=50,
                        batch_size=2,
                        callbacks=[model_metrics],
                        shuffle=False,
                        verbose=1)
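    Since the question concerns class imbalance, one standard lever (independent of any architecture fix) is passing per-class weights to fit() via its class_weight argument. A sketch using scikit-learn's compute_class_weight; the integer labels here are illustrative, derived from the balanced sample described above:

    ```python
    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # Integer labels recovered from one-hot y_train (100 records per class here).
    y_train_int = np.repeat(np.arange(4), 100)

    weights = compute_class_weight(class_weight='balanced',
                                   classes=np.arange(4),
                                   y=y_train_int)
    class_weight = {i: float(w) for i, w in enumerate(weights)}
    print(class_weight)  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0} on this balanced sample
    ```

    On the balanced sample all weights come out to 1.0; on the full imbalanced dataset they diverge, and the dict would be passed as `model.fit(..., class_weight=class_weight)`.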
    


    0 replies  |  6 years ago