
What does "Test of Epoch [number]" mean in Mozilla DeepSpeech?

  • Franck Dernoncourt · 6 years ago

    What does "Test of Epoch [number]" mean in Mozilla DeepSpeech?

    In the example below, it says Test of Epoch 77263 even though, as I understand it, there should only be one epoch, since --display_step 1 --limit_train 1 --limit_dev 1 --limit_test 1 --early_stop False --epoch 1 were passed as arguments:

    dernoncourt@ilcomp:~/asr/DeepSpeech$ ./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv,data/common-voice-v1/cv-other-train.csv --dev_files data/common-voice-v1/cv-valid-dev.csv --test_files data/common-voice-v1/cv-valid-test.csv --decoder_library_path /asr/DeepSpeech/libctc_decoder_with_kenlm.so --fulltrace True --display_step 1  --limit_train 1  --limit_dev 1  --limit_test 1 --early_stop False --epoch 1
    W Parameter --validation_step needs to be >0 for early stopping to work
    I Test of Epoch 77263 - WER: 1.000000, loss: 60.50202560424805, mean edit distance: 0.894737
    I --------------------------------------------------------------------------------
    I WER: 1.000000, loss: 58.900837, mean edit distance: 0.894737
    I  - src: "how do you like her"
    I  - res: "i "
    I --------------------------------------------------------------------------------
    I WER: 1.000000, loss: 60.517113, mean edit distance: 0.894737
    I  - src: "how do you like her"
    I  - res: "i "
    I --------------------------------------------------------------------------------
    I WER: 1.000000, loss: 60.668221, mean edit distance: 0.894737
    I  - src: "how do you like her"
    I  - res: "i "
    I --------------------------------------------------------------------------------
    I WER: 1.000000, loss: 61.921925, mean edit distance: 0.894737
    I  - src: "how do you like her"
    I  - res: "i "
    I --------------------------------------------------------------------------------
    
    1 Answer  |  6 years ago
  •   Franck Dernoncourt    6 years ago

    Explanation by Tilman Kamp:

    This is actually not a bug: the current epoch is computed from the current parameters and the step count stored in the snapshot. Take a closer look at this excerpt:

    # Number of GPUs per worker - fixed for now by local reality or cluster setup
    gpus_per_worker = len(available_devices)
    
    # Number of batches processed per job per worker
    batches_per_job  = gpus_per_worker * max(1, FLAGS.iters_per_worker)
    
    # Number of batches per global step
    batches_per_step = gpus_per_worker * max(1, FLAGS.replicas_to_agg)
    
    # Number of global steps per epoch - to be at least 1
    steps_per_epoch = max(1, model_feeder.train.total_batches // batches_per_step)
    
    # The start epoch of our training
    self._epoch = step // steps_per_epoch
    

    So the set size during your earlier training differed from your current set size. Hence the strange epoch number.

    Simplified example (leaving batching out of it): if you once trained 5 epochs on a set of 1000 samples, you ended up with 5000 "global steps" (persisted as a number in the snapshot). After that training you changed the command-line parameters to a set of size 1 (your --limit_* parameters). "Suddenly" you see epoch 5000, because 5000 global steps mean a data set of size 1 was applied 5000 times.
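
    A minimal sketch of that arithmetic (compute_start_epoch is a hypothetical helper, not a DeepSpeech function; batching is left out, so total_batches is just the set size):

    def compute_start_epoch(global_step, total_batches, batches_per_step=1):
        # Number of global steps per epoch - at least 1, as in the excerpt above
        steps_per_epoch = max(1, total_batches // batches_per_step)
        return global_step // steps_per_epoch

    # 5 epochs over a set of 1000 samples left 5000 global steps in the snapshot
    print(compute_start_epoch(5000, total_batches=1000))  # -> 5, the expected epoch
    # Resuming with --limit_* set to 1 shrinks the current set to a single sample
    print(compute_start_epoch(5000, total_batches=1))     # -> 5000, the "strange" epoch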

    Take-away: use the --checkpoint_dir argument to avoid problems like this.
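
    For instance, giving this run its own checkpoint directory (checkpoints/limit-1-run below is just a hypothetical path) keeps the persisted step count consistent with the set size actually used:

    ./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv \
        --dev_files data/common-voice-v1/cv-valid-dev.csv \
        --test_files data/common-voice-v1/cv-valid-test.csv \
        --limit_train 1 --limit_dev 1 --limit_test 1 \
        --early_stop False --epoch 1 \
        --checkpoint_dir checkpoints/limit-1-run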