Word cloud with unique IDs [duplicate]

tableau-api r

jols · Tech Community · 7 years ago

I have a dataset with two columns: a unique ID and a comment.

For example:

ID  | Text
a1   This is a test comment.
a2   Another test comment.
a3   This is very good
a4   I like this.

Expected output, one row per word:

ID  |  Words
--    
a1   This
a1   is
a1   a
a1   test
a1   comment
a2   Another
a2   test
a2   comment
a3   This
a3   is
a3   very
a3   good.

I hope my example makes sense.

2 answers | latest 7 years ago

Prasanna Nandakumar 7 years ago

> df <- read.table(text='ID  Text
+ a1   "This is a test comment"
+ a2   "Another test comment"
+ a3   "This is very good"
+ a4   "I like this"', header=TRUE, as.is=TRUE)
> 
> 
> library(data.table)
> dt = data.table(df)
> dt[,c(Words=strsplit(Text, " ", fixed = TRUE)), by = ID]
    ID   Words
 1: a1    This
 2: a1      is
 3: a1       a
 4: a1    test
 5: a1 comment
 6: a2 Another
 7: a2    test
 8: a2 comment
 9: a3    This
10: a3      is
11: a3    very
12: a3    good
13: a4       I
14: a4    like
15: a4    this

maller 7 years ago

You can do it like this:

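library(tidyverse)

# Sample data from the question
df <- tribble(
  ~ID, ~Text,
  "a1",   "This is a test comment.",
  "a2",   "Another test comment.",
  "a3",   "This is very good",
  "a4",   "I like this."
)

# One character vector of words per row of df
split_data <- strsplit(df$Text, " ")

# Pair each row's ID with its words, then stack the pieces into one matrix.
# Iterating over split_data (not unique(df$ID)) stays correct if an ID repeats.
do.call(rbind,
   lapply(seq_along(split_data), function(x) {
        cbind(ID = rep(df$ID[x], length(split_data[[x]])), Words = split_data[[x]])
   })
)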
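
Note that splitting on a single space keeps trailing periods ("comment." rather than "comment"), while the expected output in the question mostly shows words without them. A minimal base R sketch that strips punctuation before splitting and returns a data frame with named columns could look like this. The names `clean`, `words` and `long` are only illustrative, and removing all punctuation is an assumption about what the asker wants:

```r
# Same sample data as in the question (trailing periods included).
df <- data.frame(
  ID = c("a1", "a2", "a3", "a4"),
  Text = c("This is a test comment.", "Another test comment.",
           "This is very good", "I like this."),
  stringsAsFactors = FALSE
)

# Assumption: every punctuation character is dropped; case is left untouched.
clean <- gsub("[[:punct:]]", "", df$Text)

# Split on runs of whitespace; this gives one character vector per row.
words <- strsplit(trimws(clean), "[[:space:]]+")

# Repeat each ID once per word, then flatten the list of words.
long <- data.frame(
  ID = rep(df$ID, lengths(words)),
  Words = unlist(words),
  stringsAsFactors = FALSE
)

print(long)
```

Be aware that `[[:punct:]]` also removes apostrophes and hyphens, so "don't" would become "dont". If that matters, a narrower pattern such as `[.,!?]` may be a better fit.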
