
Word cloud with unique IDs [duplicate]

  •  1
  • jols  · Tech Community  · 7 years ago

    I have a dataset with two columns: a unique ID and a comment.

    For example:

    ID  | Text
    a1   This is a test comment.
    a2   Another test comment.
    a3   This is very good
    a4   I like this.
    

    I want one row per word, with the ID repeated for each word:

    ID  |  Words
    --    
    a1   This
    a1   is
    a1   a
    a1   test
    a1   comment
    a2   Another
    a2   test
    a2   comment
    a3   This
    a3   is
    a3   very
    a3   good
    

    I hope my example makes clear what I am after.

    J

    2 replies  |  latest 7 years ago
        1
  •  2
  •   Prasanna Nandakumar    7 years ago
    > df <- read.table(text='ID  Text
    + a1   "This is a test comment"
    + a2   "Another test comment"
    + a3   "This is very good"
    + a4   "I like this"', header=TRUE, as.is=TRUE)
    > 
    > 
    > library(data.table)
    > dt = data.table(df)
    > dt[, .(Words = unlist(strsplit(Text, " ", fixed = TRUE))), by = ID]
        ID   Words
     1: a1    This
     2: a1      is
     3: a1       a
     4: a1    test
     5: a1 comment
     6: a2 Another
     7: a2    test
     8: a2 comment
     9: a3    This
    10: a3      is
    11: a3    very
    12: a3    good
    13: a4       I
    14: a4    like
    15: a4    this
    
        2
  •  1
  •   maller    7 years ago

    You can do it like this:

    library(tidyverse)
    df <- tribble(
      ~ID, ~Text,
      "a1",   "This is a test comment.",
      "a2",   "Another test comment.",
      "a3",   "This is very good",
      "a4",   "I like this."
    )
    
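    # Split each comment on spaces: a list with one character vector per row
    split_data <- strsplit(df$Text, " ")
    
    # Pair every word with its row's ID, then stack the per-row matrices.
    # The result is a character matrix; punctuation such as "comment." is kept.
    do.call(rbind,
       lapply(seq_len(nrow(df)), function(x) {
            cbind(ID = rep(df$ID[x], length(split_data[[x]])), Words = split_data[[x]])
       })
    )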
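
The split in this answer keeps trailing punctuation ("comment."), while the desired output in the question shows bare words. As a rough base R sketch of the same idea without extra packages: strip punctuation, split on whitespace, then repeat each ID once per word. The names `clean`, `words`, and `long` are only illustrative, and dropping all punctuation is an assumption about what the asker wants (it would also remove apostrophes and hyphens).

```r
# Sample data copied from the question
df <- data.frame(
  ID = c("a1", "a2", "a3", "a4"),
  Text = c("This is a test comment.", "Another test comment.",
           "This is very good", "I like this."),
  stringsAsFactors = FALSE
)

# Assumption: all punctuation should be dropped before splitting
clean <- gsub("[[:punct:]]", "", df$Text)

# Split on runs of whitespace: a list with one character vector per row
words <- strsplit(clean, "[[:space:]]+")

# lengths(words) is the word count per row, so each ID is repeated
# exactly as many times as its comment has words
long <- data.frame(
  ID = rep(df$ID, lengths(words)),
  Words = unlist(words),
  stringsAsFactors = FALSE
)

print(long)
```

Because the IDs are repeated by row position rather than looked up by value, this should still line up if the same ID appears on more than one row.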