
Increasing --train_batch_size 2 to --train_batch_size 3 causes Mozilla DeepSpeech to stop training. Why?

  • Franck Dernoncourt  ·  6 years ago

    Increasing --train_batch_size 2 to --train_batch_size 3 causes Mozilla DeepSpeech to stop training. What could explain this?


    Specifically, if I run:

    ./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv --dev_files \
     data/common-voice-v1/cv-valid-dev.csv \
    --test_files data/common-voice-v1/cv-valid-test.csv  \
     --log_level 0 --limit_train 10000 --train_batch_size 2 --train True
    

    I get the following output, with set_name: train:

    D Starting queue runners...
    D Queue runners started.
    I STARTING Optimization
    D step: 77263
    D epoch: 61
    D target epoch: 75
    D steps per epoch: 1250
    D number of batches in train set: 5000
    D batches per job: 4
    D batches per step: 4
    D number of jobs in train set: 1250
    D number of jobs already trained in first epoch: 1013
    D Computing Job (ID: 2, worker: 0, epoch: 61, set_name: train)...
    D Starting batch...
    D Finished batch step 77264.
    D Sending Job (ID: 2, worker: 0, epoch: 61, set_name: train)...
    D Computing Job (ID: 3, worker: 0, epoch: 61, set_name: train)...
    D Starting batch...
    D Finished batch step 77265.
    D Sending Job (ID: 3, worker: 0, epoch: 61, set_name: train)...
    D Computing Job (ID: 4, worker: 0, epoch: 61, set_name: train)...
    D Starting batch...
    D Finished batch step 77266.
    D Sending Job (ID: 4, worker: 0, epoch: 61, set_name: train)...
    [...]
    

    However, if I run:

    ./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv --dev_files \
     data/common-voice-v1/cv-valid-dev.csv \
    --test_files data/common-voice-v1/cv-valid-test.csv  \
     --log_level 0 --limit_train 10000 --train_batch_size 3 --train True
    

    I get the following output, with set_name: test:

    D Starting queue runners...
    D Queue runners started.
    D step: 77263
    D epoch: 92
    D target epoch: 75
    D steps per epoch: 833
    D number of batches in train set: 3334
    D batches per job: 4
    D batches per step: 4
    D number of jobs in train set: 833
    D number of jobs already trained in first epoch: 627
    D Computing Job (ID: 2, worker: 0, epoch: 92, set_name: test)...
    D Starting batch...
    D Finished batch step 77263.
    D Sending Job (ID: 2, worker: 0, epoch: 92, set_name: test)...
    D Computing Job (ID: 3, worker: 0, epoch: 92, set_name: test)...
    D Starting batch...
    D Finished batch step 77263.
    D Sending Job (ID: 3, worker: 0, epoch: 92, set_name: test)...
    D Computing Job (ID: 4, worker: 0, epoch: 92, set_name: test)...
    D Starting batch...
    D Finished batch step 77263.
    D Sending Job (ID: 4, worker: 0, epoch: 92, set_name: test)...
    D Computing Job (ID: 5, worker: 0, epoch: 92, set_name: test)...
    D Starting batch...
    D Finished batch step 77263.
    D Sending Job (ID: 5, worker: 0, epoch: 92, set_name: test)...
    [...]
    

    I am training Mozilla DeepSpeech on 4 NVIDIA GeForce GTX 1080 GPUs.

    1 answer  |  6 years ago

  • Franck Dernoncourt  ·  6 years ago

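    As lissyx pointed out, the checkpoint directory had not been cleaned. This is visible in the logs in the question, e.g. D Finished batch step 77263., whereas the batch step would start at 0 if the checkpoint directory had been cleared. The restored step count of 77263 corresponds to epoch 61 at 1250 steps per epoch (batch size 2), which is below the target epoch of 75, but to epoch 92 at 833 steps per epoch (batch size 3), which is already past it. Consequently, when run with a --train_batch_size greater than 2, it jumped straight to the test phase.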
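
    The arithmetic behind those log lines can be sketched as follows. This is only an illustration that reproduces the numbers printed in the question's logs; the function name and the rounding rules (round up for batches, round down for steps) are assumptions inferred from those numbers, not DeepSpeech's actual code. The rule that training is skipped once the epoch reaches the target epoch is likewise an assumption consistent with the logs.

```python
import math


def epoch_bookkeeping(num_samples, batch_size, batches_per_step, global_step):
    """Reproduce the epoch arithmetic visible in the debug logs.

    Hypothetical helper: it mirrors the logged numbers, not DeepSpeech's code.
    """
    # A partially filled last batch still counts as a batch (assumed: round up).
    num_batches = math.ceil(num_samples / batch_size)
    # One step consumes `batches_per_step` batches (assumed: leftovers dropped).
    steps_per_epoch = num_batches // batches_per_step
    # The current epoch follows from the global step restored from the checkpoint.
    epoch, steps_into_epoch = divmod(global_step, steps_per_epoch)
    return num_batches, steps_per_epoch, epoch, steps_into_epoch


TARGET_EPOCH = 75      # "target epoch" in the logs
RESTORED_STEP = 77263  # "step" in the logs, restored from the stale checkpoint

for batch_size in (2, 3):
    batches, steps, epoch, done = epoch_bookkeeping(
        10000, batch_size, 4, RESTORED_STEP
    )
    # Assumption: training only runs while the epoch is below the target epoch.
    still_trains = epoch < TARGET_EPOCH
    print(
        f"batch_size={batch_size}: batches={batches}, steps/epoch={steps}, "
        f"epoch={epoch}, already trained={done}, training runs: {still_trains}"
    )
```

    With the values from the question (--limit_train 10000, 4 batches per step), batch size 2 gives epoch 61 and batch size 3 gives epoch 92, matching the two logs.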

    On Ubuntu, the default location of the checkpoint directory is /home/[username]/.local/share/deepspeech/checkpoints. A different location can be specified with the --checkpoint_dir argument.
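
    Before starting a fresh run, it can help to check whether that directory still holds files from an earlier run. The snippet below is a minimal sketch that only lists what is there; the helper name list_checkpoint_files is hypothetical, and the path is the Ubuntu default quoted above written relative to the home directory. It does not delete anything.

```python
from pathlib import Path

# Default location on Ubuntu as stated above, with "~" standing for /home/[username].
DEFAULT_CHECKPOINT_DIR = "~/.local/share/deepspeech/checkpoints"


def list_checkpoint_files(checkpoint_dir):
    """Return the sorted file names in checkpoint_dir, or [] if it does not exist.

    Hypothetical helper for inspection only; it never modifies the directory.
    """
    directory = Path(checkpoint_dir).expanduser()
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir() if entry.is_file())


if __name__ == "__main__":
    files = list_checkpoint_files(DEFAULT_CHECKPOINT_DIR)
    if files:
        print(f"{len(files)} file(s) found in {DEFAULT_CHECKPOINT_DIR}:")
        for name in files:
            print("  " + name)
    else:
        print(f"No files found in {DEFAULT_CHECKPOINT_DIR}.")
```

    If files are listed, either clear the directory or point --checkpoint_dir at an empty one so that the step counter starts from 0.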
