
Why does my loss suddenly increase when training a CNN for image segmentation?

  •  1
  • Borbag  · 7 years ago

    I am using Keras 1.2.2 with the TensorFlow 1.4.0 backend.

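    I am using a U-Net architecture. I have 708 images of 650x650 pixels with 6 channels. I augment the dataset with mirroring and rotation, which gives 4248 images in total.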

    I have two classes, and my loss function is:

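    from keras import backend as K

    def jaccard_coef_loss(y_true, y_pred):
        # Tiny constant that avoids division by zero when both masks are empty.
        smooth = 1e-12
        # Sum over the batch axis and the last two axes.
        intersection = K.sum(y_true * y_pred, axis=[0, -1, -2])
        sum_ = K.sum(y_true + y_pred, axis=[0, -1, -2])
        # Jaccard index: intersection / union, where union = sum_ - intersection.
        jac = (intersection + smooth) / (sum_ - intersection + smooth)
        # The loss is 1 minus the mean Jaccard index over the remaining axis.
        return 1 - K.mean(jac)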
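
To see what values this formula produces at the extremes, here is a minimal pure-Python sketch of the same arithmetic on a single flat mask. The helper names `jaccard_coef_flat` and `jaccard_loss_flat` and the 1000-pixel toy mask are assumptions made for illustration; they are not part of the original model, and the sketch ignores the per-axis reduction that `K.sum` performs.

```python
def jaccard_coef_flat(y_true, y_pred, smooth=1e-12):
    """Jaccard index on two flat lists, using the same formula as the loss above."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    sum_ = sum(t + p for t, p in zip(y_true, y_pred))
    return (intersection + smooth) / (sum_ - intersection + smooth)


def jaccard_loss_flat(y_true, y_pred, smooth=1e-12):
    """Loss is 1 minus the Jaccard index."""
    return 1 - jaccard_coef_flat(y_true, y_pred, smooth)


# Hypothetical toy mask: 1000 pixels, half of them foreground.
y_true = [1.0, 1.0, 0.0, 0.0] * 250
all_zero = [0.0] * len(y_true)

perfect_loss = jaccard_loss_flat(y_true, y_true)        # prediction equals the mask
collapsed_coef = jaccard_coef_flat(y_true, all_zero)    # prediction is all zeros
collapsed_loss = jaccard_loss_flat(y_true, all_zero)

print(perfect_loss, collapsed_coef, collapsed_loss)
```

With an all-zero prediction the intersection is 0, so the coefficient reduces to `smooth / (sum(y_true) + smooth)`: a tiny positive number, with a loss that is indistinguishable from 1. A log showing `loss: 1.0000` alongside a coefficient many orders of magnitude below 1 is therefore consistent with predictions that no longer overlap the ground truth, although this sketch alone cannot confirm that this is what happened.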
    

    My optimizer:

    from keras.optimizers import SGD
    optimizer = SGD(lr=0.01, momentum=0.9, nesterov=True)
    

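    My validation set is about 30% of the total number of images, batch_size is 4, and shuffle is set to True. The model goes through every training image in each epoch. 200 epochs are scheduled, but training stops if the validation set shows no improvement for 10 epochs.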
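
The stopping rule described above can be sketched as a simple patience counter. This is a pure-Python illustration, not Keras's own early-stopping implementation; the function name `early_stopping_index` is hypothetical, and the example list assumes that the first value it contains is the best validation loss seen so far, since earlier epochs are not shown in the post.

```python
def early_stopping_index(val_losses, patience):
    """Return the 0-based index at which training would stop, or None.

    Training stops once `patience` consecutive epochs pass without the
    validation loss dropping below the best value seen so far.
    """
    best = float("inf")
    wait = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return i
    return None


# val_loss values for epochs 10 to 19, rounded as printed in the training log.
logged_val_loss = [0.2957, 0.2968] + [1.0] * 8

stop_within_log = early_stopping_index(logged_val_loss, patience=10)
stop_one_more = early_stopping_index(logged_val_loss + [1.0], patience=10)
print(stop_within_log, stop_one_more)
```

Under these assumptions the counter reaches 9 by epoch 19, so a patience of 10 would not yet have triggered within the epochs shown, which is why the run keeps going for several epochs after the loss has jumped to 1.0.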

    Here is the training log for the last epochs:

    Epoch 10/200
    4248/4248 [==============================] - 3192s - loss: 0.1388 - acc: 0.0868 - jaccard_coef: 0.8612 - jaccard_coef_int: 0.8613 - val_loss: 0.2957 - val_acc: 0.0536 - val_jaccard_coef: 0.7043 - val_jaccard_coef_int: 0.7043
    Epoch 11/200
    4248/4248 [==============================] - 3167s - loss: 0.1375 - acc: 0.0901 - jaccard_coef: 0.8625 - jaccard_coef_int: 0.8626 - val_loss: 0.2968 - val_acc: 0.0632 - val_jaccard_coef: 0.7032 - val_jaccard_coef_int: 0.7033
    Epoch 12/200
    4248/4248 [==============================] - 3272s - loss: 0.1964 - acc: 0.1084 - jaccard_coef: 0.8036 - jaccard_coef_int: 0.8037 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2793e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 13/200
    4248/4248 [==============================] - 3112s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 4.6290e-15 - jaccard_coef_int: 5.5532e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 14/200
    4248/4248 [==============================] - 2032s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.5857e-15 - jaccard_coef_int: 5.1207e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 15/200
    4248/4248 [==============================] - 2260s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.6600e-15 - jaccard_coef_int: 5.0932e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 16/200
    4248/4248 [==============================] - 2914s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.3220e-15 - jaccard_coef_int: 4.8916e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 17/200
    4248/4248 [==============================] - 2928s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.6034e-15 - jaccard_coef_int: 6.3645e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 18/200
    4248/4248 [==============================] - 2738s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 2.3913e-15 - jaccard_coef_int: 4.7182e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    Epoch 19/200
    4248/4248 [==============================] - 2922s - loss: 1.0000 - acc: 0.5089 - jaccard_coef: 6.2745e-15 - jaccard_coef_int: 5.0041e-18 - val_loss: 1.0000 - val_acc: 0.5066 - val_jaccard_coef: 1.2659e-15 - val_jaccard_coef_int: 4.7833e-18
    

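    I don't understand what happened between epochs 12 and 13. Is this my mistake, or is there a known bug that could be fixed by upgrading to a newer version of keras/tf?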

    1 answer
  •  1
  •   Shai    7 years ago

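    It looks like your optimization has diverged: you probably hit very large gradients, which pushed the model into predicting garbage. Try lowering the learning rate to 0.001 and resuming training from epoch 12.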
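
Why a smaller learning rate can help is easy to see on a toy problem. The sketch below runs SGD with classical momentum on the one-dimensional quadratic f(w) = 0.5 * curvature * w^2. Everything in it is an assumption chosen for illustration: the function `run_sgd`, the curvature value of 500, and the use of classical rather than Nesterov momentum. It says nothing about the actual loss surface of the U-Net in the question.

```python
def run_sgd(lr, momentum=0.9, curvature=500.0, steps=100, w0=1.0):
    """Minimise f(w) = 0.5 * curvature * w**2 with classical momentum SGD.

    Returns |w| after `steps` updates; the minimum is at w = 0.
    """
    w, v = w0, 0.0
    for _ in range(steps):
        grad = curvature * w          # gradient of the quadratic at w
        v = momentum * v - lr * grad  # momentum (velocity) update
        w = w + v                     # parameter update
    return abs(w)


diverged = run_sgd(lr=0.01)    # step too large for this curvature
converged = run_sgd(lr=0.001)  # ten times smaller step

print(diverged, converged)
```

On this toy problem the iteration is stable only while `lr * curvature` stays below `2 * (1 + momentum)`, that is 3.8 here. With `lr=0.01` the product is 5 and the iterate grows without bound; with `lr=0.001` it is 0.5 and the iterate shrinks toward zero. A real network has no single curvature, but the same mechanism can make a run that was training well blow up once it enters a steep region.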