代码之家  ›  专栏  ›  技术社区  ›  moodymudskipper

取消使用带有'dbplyr'和'rpostgres的json列`

  •  0
  • moodymudskipper  · 技术社区  · 5 年前

    我有一个专栏 杰森 表中的字符串 《PostgreSQL》 数据库。

    我想在服务器端取消它的必要性,并且我能够计算出SQL代码来实现这一点。我面临的挑战是,我希望能够将此操作插入 dbplyr 管链,即更新 tbl_lazy 对象,但尚未运行查询,我不了解 二羟苯丙酸 足够了。

    见下例:

    安装程序

    library("RPostgres")
    library("dplyr")
    
    drv <- RPostgres::dbDriver("Postgres")
    #### NOT REPRODUCIBLE!!! ####
    con <- RPostgres::dbConnect(drv, dbname = mydbname, sslmode = 'require',
                                host = myhost, port = 5432,
                                user = user, password = mypassword)
    #############################
    
    my_tbl <- tribble(~a, ~bcd,
            1, '{"b": "foo1", "c": "bar1", "d": "baz1"}',
            2, '{"b": "foo2", "c": "bar2", "d": "baz2"}')
    
    copy_to(con, my_tbl, "my_tbl",
            temporary = TRUE)
    

    部分解决方案(无延迟评估)

    unnest_json <-function(data, json_col, ...){
      # build character vector whose names are cols to be created and values columns
      # to be extracted 
      dots <- sapply(as.list(substitute(list(...)))[-1], as.character)
      json_col <- as.character(substitute(json_col))
      # json extraction string
      query0  <- sprintf("%s::json->'%s' as %s",json_col, dots, names(dots))
      # complete query
      query <- sprintf("SELECT *, %s FROM (%s) AS PREV", 
                       paste(query0, collapse = ", "), 
                       dbplyr::sql_render(data))
      # fetch (when I'd rather update the tbl_lazy object instead)
      dbGetQuery(data$src$con, query)
    }
    
    con %>%
      tbl("my_tbl") %>%
      unnest_json(bcd, unnested_b = "b", unnested_c = "c")
    #   a                                     bcd unnested_b unnested_c
    # 1 1 {"b": "foo1", "c": "bar1", "d": "baz1"}     "foo1"     "bar1"
    # 2 2 {"b": "foo2", "c": "bar2", "d": "baz2"}     "foo2"     "bar2"
    

    所需功能

    我希望能够做到,例如:

    con %>%
      tbl("my_tbl") %>%
      unnest_json(bcd, unnested_b = "b", unnested_c = "c") %>% # not evaluated at this point 
      select(-bcd) %>%
      head(1) %>%
      collect() 
    
    #   a unnested_b unnested_c
    # 1 1     "foo1"     "bar1"
    
    0 回复  |  直到 5 年前
        1
  •  1
  •   moodymudskipper    5 年前

    诀窍是使用函数 sql 打电话给 tbl

    SQL 应为包含查询的字符串提供。

    所以函数变成:

    unnest_json <-function(.data,.json_col, ...){
      # build character vector whose names are cols to be created and values columns
      # to be extracted 
      dots <- sapply(as.list(substitute(list(...)))[-1], as.character)
      .json_col <- as.character(substitute(.json_col))
      query0  <- sprintf("%s::json ->>'%s' as %s", .json_col, dots, names(dots))
      query <- sprintf("SELECT *, %s FROM (%s) AS PREV", 
                       paste(query0, collapse = ", "), 
                       dbplyr::sql_render(.data))
      tbl(.data$src$con, sql(query))
    }
    

    我也变了 -> ->> 在查询中获取正确的输出。

    输出 unnest_json :

    con %>%
      tbl("my_tbl") %>%
      unnest_json(bcd, unnested_b = "b", unnested_c = "c")
    # # Source:   SQL [?? x 4]
    # # Database: postgres [standtasic@adbsg@adbsg.postgres.database.azure.com:5432/standtasicdb]
    #       a bcd                                                   unnested_b unnested_c
    #   <dbl> <chr>                                                 <chr>      <chr>     
    # 1     1 "{\"b\": \"foo1\", \"c\": \"bar1\", \"d\": \"baz1\"}" foo1       bar1      
    # 2     2 "{\"b\": \"foo1\", \"c\": \"bar1\", \"d\": \"baz1\"}" foo1       bar1 
    

    用于 dbplyr 链和收集:

    con %>%
      tbl("my_tbl") %>%
      unnest_json(bcd, unnested_b = "b", unnested_c = "c") %>%
      select(-bcd) %>%
      head(1) %>%
      collect()
    # # A tibble: 1 x 3
    #         a unnested_b unnested_c
    #     <dbl> <chr>      <chr>
    #   1     1 foo1       bar1