
How to convert a list (with multiple elements) to a string in R without getting "c("xxx", "xxx", "xxx")"

  •  3
  • rane  · 6 years ago
    library(data.table)
    library(stringr)  # provides str_sub(), used in the attempts below
    
    # Target string to convert
    
    DATE_DATA <- c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
    "2017-08-02;2000-01-22;2003-03-07;2017-10-09",
    "2013-08-02;2022-06-02;2012-03-15")
    
    # Dataset
    DT <- data.table(NAME = c("JOE","MARY","PAUL"),DATE = c(DATE_DATA))
    

    Expected result: convert the DATE column into a new column called "period" as follows: split + sort with decreasing = FALSE + unique years

    #  period
    1: 2015,2019,2021
    2: 2000,2003,2017
    3: 2012,2013,2022
    

    # 1st approach -- RESULT : created column with class -- "list"
    
    DT[,period:= lapply(strsplit(DT$DATE,";"),
                                     function(x) sort(unique(str_sub(x,1,4)),
                                                      decreasing = FALSE))]
    
    # 2nd approach -- RESULT : created column with class -- "character", but the
    #                          value turns into "c("xxx", "xxx", "xxx")" instead of
    #                          the expected "xxx,xxx,xxx"
    
    DT[,period:= as.character(paste(lapply(strsplit(DT$DATE,";"),
                                 function(x) sort(unique(str_sub(x,1,4)),
                                                  decreasing = FALSE)),collapse = ","))]
    

    3 answers  |  as of 6 years ago
        1
  •  4
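  •   Ronak Shah    6 years ago

    For each DATE we can split the string on ";", convert the pieces to dates, extract the year with format, then sort the unique values and combine them into one string with toString.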

    DT$Period <- sapply(DT$DATE, function(x) 
             toString(sort(unique(format(as.Date(strsplit(x, ";")[[1]]), "%Y")))))
    DT
    
    #   NAME                                        DATE           Period
    #1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
    #2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
    #3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012, 2013, 2022
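
To make the chain of calls easier to follow, here is the same pipeline unrolled for a single row. This is only a sketch in base R; the variable names (x, parts, years, period) are made up for illustration and do not appear in the answer.

```r
# One DATE string taken from the question's data
x <- "2015-01-02;2015-01-07;2021-05-02;2019-02-05"

# strsplit() returns a list, so [[1]] pulls out the character vector
parts <- strsplit(x, ";")[[1]]

# Convert to Date and keep only the year: "2015" "2015" "2021" "2019"
years <- format(as.Date(parts), "%Y")

# Drop duplicates, sort ascending, and join with ", "
period <- toString(sort(unique(years)))

print(period)  # "2015, 2019, 2021"
```

Note that toString separates the values with ", " (comma plus space). If the output must be exactly "2015,2019,2021", paste(..., collapse = ",") is the alternative.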
    

    We can drop the as.Date call by using year from the lubridate package, which gives the same output.

    library(lubridate)
    DT$Period <- sapply(DT$DATE, function(x) 
                       toString(sort(unique(year(strsplit(x, ";")[[1]])))))
    

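    I am not a data.table expert, but I think what your attempt was missing is grouping: you need the unique years within each row of the DATE column, and that has to be specified with the by argument.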

    DT[,period:= paste(sapply(strsplit(DATE,";"),
      function(x) sort(unique(substr(x,1,4)))),collapse = ","), by = 1:nrow(DT)]
    
    DT
    
    #   NAME                                        DATE         period
    #1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015,2019,2021
    #2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000,2003,2017
    #3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012,2013,2022
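
As for why the second attempt in the question ended up with "c(...)" text: paste was applied to the list as a whole, and collapse then merged every row into one string. A minimal base R sketch of the difference follows; the years list is a made-up stand-in for what lapply(strsplit(...)) returns, and the variable names are illustrative.

```r
# Stand-in for the list produced by lapply(strsplit(DATE, ";"), ...)
years <- list(c("2015", "2019", "2021"), c("2000", "2003", "2017"))

# paste() on a list coerces each element with as.character(), which turns a
# multi-element vector into the text of a c(...) call; collapse then glues
# all rows together into a single string
whole <- paste(years, collapse = ",")
print(whole)

# Collapsing inside the per-element call gives one clean string per row
per_row <- vapply(years, paste, character(1), collapse = ",")
print(per_row)  # "2015,2019,2021" "2000,2003,2017"
```

Grouping by row, as in the data.table call above, reaches the same result another way: each group holds only one row, so collapse has nothing else to merge.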
    
        2
  •  2
  •   akrun    6 years ago

    We can use gsub with scan

    DT[,  Period := toString(sort(unique(scan(text=gsub("-\\d+", 
                   "", DATE), what = numeric(), sep=";")))), NAME]
    DT
    #   NAME                                        DATE           Period
    #1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
    #2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
    #3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012, 2013, 2022
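
For readers unfamiliar with this combination, the two steps can be looked at separately on one string. This is a sketch in base R only; the variable names are illustrative, and quiet = TRUE is added here just to suppress scan's "Read N items" message.

```r
x <- "2015-01-02;2015-01-07;2021-05-02;2019-02-05"

# Remove every "-digits" chunk, leaving only the years: "2015;2015;2021;2019"
stripped <- gsub("-\\d+", "", x)

# Parse the ";"-separated text directly into a numeric vector
years <- scan(text = stripped, what = numeric(), sep = ";", quiet = TRUE)

period <- toString(sort(unique(years)))
print(period)  # "2015, 2019, 2021"
```

Because scan returns numbers here, the sort is numeric rather than alphabetical, which makes no difference for four-digit years.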
    

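    Or with tidyverse: here we split DATE into rows at ";" with separate_rows, group by NAME, summarise "Period" as the sorted unique year of the values converted to Date class (ymd), join back to the original data, and use select to put the columns in the appropriate order (if needed)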

    library(tidyverse)
    library(lubridate)  # year() and ymd(); not attached by older tidyverse versions
    DT %>% 
       separate_rows(DATE, sep = ";") %>% 
       group_by(NAME) %>% 
       summarise(Period = toString(sort(unique(year(ymd(DATE)))))) %>% 
       right_join(DT, by = "NAME") %>%
       select(names(DT), everything())
    # A tibble: 3 x 3
    #  NAME  DATE                                        Period                
    #  <chr> <chr>                                       <chr>                 
    #1 JOE   2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
    #2 MARY  2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
    #3 PAUL  2013-08-02;2022-06-02;2012-03-15            2012, 2013, 2022    
    
        3
  •  1
  •   ira    6 years ago

    I am not sure what the fastest approach is, but one that is relatively easy to read and understand is:

    DT[, period:=sapply(strsplit(DATE, ";"), 
         function(x) paste(sort(unique(year(as.Date(x)))), collapse = ","))]
    

    The resulting output is:

       NAME                                        DATE         period
    1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015,2019,2021
    2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000,2003,2017
    3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012,2013,2022
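
If neither data.table nor lubridate is available, the same idea can be sketched on a plain data.frame. This variant is an assumption-laden illustration, not part of the answer: DF is a hypothetical copy of the question's data, the year is taken with substr instead of year(as.Date(x)), and vapply is used so the result is guaranteed to be a character vector.

```r
DATE <- c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
          "2017-08-02;2000-01-22;2003-03-07;2017-10-09",
          "2013-08-02;2022-06-02;2012-03-15")

# Hypothetical data.frame copy of the question's DT
DF <- data.frame(NAME = c("JOE", "MARY", "PAUL"), DATE = DATE,
                 stringsAsFactors = FALSE)

# One comma-separated string of sorted unique years per row
DF$period <- vapply(
  strsplit(DF$DATE, ";"),
  function(x) paste(sort(unique(substr(x, 1, 4))), collapse = ","),
  character(1)
)

print(DF$period)  # "2015,2019,2021" "2000,2003,2017" "2012,2013,2022"
```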
    

    Here strsplit(DATE, ";") splits each DATE string on ";" and returns a list with one character vector per row.