How do I count the matches between a list column and another data frame in R?

match tidyverse list r

geoscience123 · 6 days ago

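I have two data sets:

The first data frame ( small ) is smaller than the second ( large ). Each data frame has an id column of unique identifiers. The smaller data frame has a list column named links, which holds the ids of the rows it links to in the larger data frame. The larger data frame has an attribute column, which we'll call att: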

library(tidyverse)

a <- c(3, 3, NA, 5)
b <- c(NA, 3, 4, 5)

small <- tibble(id = c(1, 2),
                links = list(a, b))

large <- tibble(id = c(3, 4, 5),
                att = c("yes", "no", "maybe"))

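My goal is to count, for each observation in the small data frame, how many of its links point to an observation in the large data frame whose att is "yes".

I think something like this is on the right track, but it isn't quite right: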

counted <- small %>%
  mutate(count_yes = map_int(links, ~ sum(large$att[large$id %in% .x] == "yes")))

print(counted)
#> # A tibble: 2 × 3
#>      id links     count_yes
#>   <dbl> <list>        <int>
#> 1     1 <dbl [4]>         1
#> 2     2 <dbl [4]>         1

Here, count_yes shows 1 for both rows when it should read 2 and 1.

2 replies | up to 6 days ago

Ronak Shah · 6 days ago

You're on the right track, but it needs a few tweaks.

small %>%
  mutate(count_yes = map_int(links, ~sum(.x %in% large$id[large$att %in% "yes"])))

#     id links     count_yes
#  <dbl> <list>        <int>
#1     1 <dbl [4]>         2
#2     2 <dbl [4]>         1
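
The tweak is the direction of the %in% test. As a minimal base R sketch (the names links_1, yes_ids, per_large_row and per_link are illustrative and not part of the answer), the first links vector from the question shows why the two directions give different counts:

```r
# Data from the question, rebuilt with base R only
links_1 <- c(3, 3, NA, 5)                       # links for small$id == 1
large <- data.frame(id = c(3, 4, 5),
                    att = c("yes", "no", "maybe"))

# Original direction: one result per row of `large`, so the
# repeated link to id 3 is counted only once
per_large_row <- sum(large$att[large$id %in% links_1] == "yes")
per_large_row
#> [1] 1

# Adjusted direction: one result per element of `links_1`, so
# every link to a "yes" id is counted
yes_ids <- large$id[large$att %in% "yes"]
per_link <- sum(links_1 %in% yes_ids)
per_link
#> [1] 2
```

Testing each link against the "yes" ids keeps duplicates in links, which is what the expected result of 2 for the first row requires.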

Or in base R:

sapply(small$links, \(x) sum(x %in% large$id[large$att %in% "yes"]))

Note that %in%, unlike ==, returns FALSE for NA values.
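
A small base R illustration of that difference, using a made-up vector x rather than the data from the question:

```r
x <- c("yes", NA, "no")   # hypothetical attribute vector with a missing value

x == "yes"
#> [1]  TRUE    NA FALSE
x %in% "yes"
#> [1]  TRUE FALSE FALSE

sum(x == "yes")                 # the NA propagates into the sum
#> [1] NA
sum(x == "yes", na.rm = TRUE)   # the NA is dropped explicitly
#> [1] 1
sum(x %in% "yes")               # no NA is produced in the first place
#> [1] 1
```

With ==, the count has to be protected with na.rm = TRUE; with %in%, missing values simply count as non-matches.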

SamR · 6 days ago

Since you're looking for a tidyverse solution, I think one expressive way to do this is to tidyr::unnest() the list column, left_join() to large, and then summarise():

small |>
    tidyr::unnest(links) |>
    left_join(large, by = c("links" = "id")) |>
    summarise(
        links = list(links),
        count_yes = sum(att == "yes", na.rm = TRUE), .by = id
    )

# # A tibble: 2 × 3
#      id links     count_yes
#   <dbl> <list>        <int>
# 1     1 <dbl [4]>         2
# 2     2 <dbl [4]>         1

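That said, unless there is a good reason to use a list column, I would rather keep the data in long format and skip the last step, since that avoids the need for map*() or *apply() functions.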
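
A sketch of what staying in long format might look like, assuming dplyr 1.1.0 or later for the .by argument (the names long and yes_counts are illustrative):

```r
library(dplyr)
library(tidyr)

# Data from the question
small <- tibble(id = c(1, 2),
                links = list(c(3, 3, NA, 5), c(NA, 3, 4, 5)))
large <- tibble(id = c(3, 4, 5),
                att = c("yes", "no", "maybe"))

# One row per (id, link) pair, with the attribute joined on;
# NA links match no row of `large`, so their att is NA
long <- small |>
    unnest(links) |>
    left_join(large, by = c("links" = "id"))

# Count "yes" links per id without rebuilding the list column
yes_counts <- long |>
    summarise(count_yes = sum(att == "yes", na.rm = TRUE), .by = id)
```

Here long has eight rows (four links for each id), and yes_counts holds the counts 2 and 1. Any further per-link work can then use ordinary grouped verbs on long.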
