
Why does tensorflow/keras choke when I try to fit multiple models in parallel?

  •  6
  •  generic_user  · 6 years ago

    When I try to fit several models, each with a different learning rate, inside a parallel foreach loop, the whole thing chokes.

    What is going on? I suspect it has something to do with scoping: the workers may not be running separate instances of tensorflow. But I really don't know. How can I make this work, and what do I need to understand in order to see why it doesn't?

    Changing the foreach loop to %do% makes everything work fine. Set it to %dopar% and it chokes at the fit stage.

    library(foreach)
    library(doParallel)
    registerDoParallel(2)
    library(keras)
    library(tensorflow)
    mnist <- dataset_mnist()
    x_train <- mnist$train$x
    y_train <- mnist$train$y
    x_test <- mnist$test$x
    y_test <- mnist$test$y
    
    x_train <- array_reshape(x_train, c(nrow(x_train), 784))
    x_test <- array_reshape(x_test, c(nrow(x_test), 784))
    # rescale
    x_train <- x_train / 255
    x_test <- x_test / 255
    
    y_train <- to_categorical(y_train, 10)
    y_test <- to_categorical(y_test, 10)
    
    # make tensorflow run single-threaded
    session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                                   inter_op_parallelism_threads = 1L)
    # Create the session using the custom configuration
    sess <- tf$Session(config = session_conf)
    K <- backend()
    K$set_session(sess)
    
    
    models <- foreach(i = 1:2) %dopar%{
      model <- keras_model_sequential() 
      model %>% 
        layer_dense(units = 256/i, activation = 'relu', input_shape = c(784)) %>% 
        layer_dropout(rate = 0.4) %>% 
        layer_dense(units = 128/i, activation = 'relu') %>%
        layer_dropout(rate = 0.3) %>%
        layer_dense(units = 10, activation = 'softmax')
    
      print("A")
      model %>% compile(
        loss = 'categorical_crossentropy',
        optimizer = optimizer_rmsprop(),
        metrics = c('accuracy')
      )
      print("B")
      history <- model %>% fit(
        x_train, y_train, 
        epochs = 3, batch_size = 128, 
        validation_split = 0.2, verbose = 0
      )
      print("done")  
    }
    

    Here is sessionInfo():

    R version 3.5.1 (2018-07-02)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 18.04.1 LTS
    
    Matrix products: default
    BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
     [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
     [1] panelNNET_1.0       matrixStats_0.54.0  MASS_7.3-50         lfe_2.8-2           tensorflow_1.9      keras_2.1.6.9005   
     [7] mgcv_1.8-24         nlme_3.1-137        scales_1.0.0        forcats_0.3.0       stringr_1.3.1       purrr_0.2.5        
    [13] readr_1.1.1         tidyr_0.8.1         tibble_1.4.2        tidyverse_1.2.1     maptools_0.9-3      rgeos_0.3-28       
    [19] rgdal_1.3-4         sp_1.3-1            broom_0.5.0         ggplot2_3.0.0       randomForest_4.6-14 dplyr_0.7.6        
    [25] glmnet_2.0-16       Matrix_1.2-14       doBy_4.6-2          doParallel_1.0.11   iterators_1.0.10    foreach_1.4.4      
    
    loaded via a namespace (and not attached):
     [1] httr_1.3.1          jsonlite_1.5        modelr_0.1.2        Formula_1.2-3       assertthat_0.2.0    cellranger_1.1.0   
     [7] yaml_2.2.0          pillar_1.3.0        backports_1.1.2     lattice_0.20-35     glue_1.3.0          reticulate_1.10    
    [13] digest_0.6.15       RcppEigen_0.3.3.4.0 rvest_0.3.2         colorspace_1.3-2    sandwich_2.5-0      plyr_1.8.4         
    [19] pkgconfig_2.0.1     haven_1.1.2         xtable_1.8-2        whisker_0.3-2       withr_2.1.2         lazyeval_0.2.1     
    [25] cli_1.0.0           magrittr_1.5        crayon_1.3.4        readxl_1.1.0        xml2_1.2.0          foreign_0.8-70     
    [31] tools_3.5.1         hms_0.4.2           munsell_0.5.0       bindrcpp_0.2.2      compiler_3.5.1      rlang_0.2.2        
    [37] grid_3.5.1          rstudioapi_0.7      base64enc_0.1-3     labeling_0.3        gtable_0.2.0        codetools_0.2-15   
    [43] R6_2.2.2            tfruns_1.3          zoo_1.8-3           lubridate_1.7.4     zeallot_0.1.0       bindr_0.1.1        
    [49] stringi_1.2.4       Rcpp_0.12.18        tidyselect_0.2.4
    
    2 Answers  |  6 years ago
    1
  •  5
  •  Daniel GL  · 6 years ago

    Keras requires that only one training run happens in a given session. I would try creating a separate session for each model.

    I would put this piece of code inside the %dopar% block so that a separate session is created for each model:

    sess <- tf$Session(config = session_conf)
    K <- backend()
    K$set_session(sess)
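
    For example, a minimal sketch of the loop from the question with this per-worker session setup inside the %dopar% body. It assumes the data preparation from the question has already been run, and it returns only the training histories, since the keras model objects themselves are pointers to Python objects and do not transfer cleanly between processes:

    histories <- foreach(i = 1:2, .packages = c("keras", "tensorflow")) %dopar% {
      # fresh, single-threaded tensorflow session for this worker
      worker_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                                    inter_op_parallelism_threads = 1L)
      sess <- tf$Session(config = worker_conf)
      backend()$set_session(sess)

      model <- keras_model_sequential() %>%
        layer_dense(units = 256/i, activation = 'relu', input_shape = c(784)) %>%
        layer_dropout(rate = 0.4) %>%
        layer_dense(units = 128/i, activation = 'relu') %>%
        layer_dropout(rate = 0.3) %>%
        layer_dense(units = 10, activation = 'softmax')

      model %>% compile(
        loss = 'categorical_crossentropy',
        optimizer = optimizer_rmsprop(),
        metrics = c('accuracy')
      )

      history <- model %>% fit(
        x_train, y_train,
        epochs = 3, batch_size = 128,
        validation_split = 0.2, verbose = 0
      )

      # return the history (plain R data) rather than the model object
      history
    }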
    
    2
  •  2
  •  generic_user  · 6 years ago

    The accepted answer is correct in that a new session is needed to get this to work.

    However, I could not get this to work inside the foreach loop itself.

    My workaround was to write a script, say fit_model.R, that creates the tensorflow session, loads any stored weights, fits the model, and so on, plus a second script, say meta_fit.R, that contains the foreach loop; each worker process simply calls system("Rscript fit_model.R"). That way, after each keras/tensorflow run, once the script exits, the operating system cleans up anything left behind.
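
    A minimal sketch of how the two scripts might fit together (the command-line index, the exact data preparation, and saving the fitted model to an .h5 file are illustrative assumptions, not part of the original workaround):

    ## fit_model.R -- essentially the session setup and model code from the
    ## question, run for one model per process; the model index is read from
    ## the command line
    library(keras)
    library(tensorflow)

    i <- as.integer(commandArgs(trailingOnly = TRUE)[1])

    # fresh session in this process; everything is released when the process exits
    session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                                   inter_op_parallelism_threads = 1L)
    sess <- tf$Session(config = session_conf)
    backend()$set_session(sess)

    mnist <- dataset_mnist()
    x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 784)) / 255
    y_train <- to_categorical(mnist$train$y, 10)

    model <- keras_model_sequential() %>%
      layer_dense(units = 256/i, activation = 'relu', input_shape = c(784)) %>%
      layer_dropout(rate = 0.4) %>%
      layer_dense(units = 128/i, activation = 'relu') %>%
      layer_dropout(rate = 0.3) %>%
      layer_dense(units = 10, activation = 'softmax')

    model %>% compile(
      loss = 'categorical_crossentropy',
      optimizer = optimizer_rmsprop(),
      metrics = c('accuracy')
    )

    model %>% fit(x_train, y_train, epochs = 3, batch_size = 128,
                  validation_split = 0.2, verbose = 0)

    # persist the result so the parent process can pick it up after exit
    save_model_hdf5(model, sprintf("model_%d.h5", i))

    ## meta_fit.R -- contains the foreach loop; each worker only shells out
    library(foreach)
    library(doParallel)
    registerDoParallel(2)

    foreach(i = 1:2) %dopar% {
      # launch a fresh R/tensorflow process per model; the OS reclaims all of
      # its memory and tensorflow state when the process exits
      system(sprintf("Rscript fit_model.R %d", i))
    }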