
Adaptive function that applies a calculation to subsets of columns grouped by a common naming feature

dplyr r

llewmills · Tech Community · 4 months ago

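I'm either over-caffeinated or under-caffeinated, because I can't work out how to do this. I need to write a function that evaluates an equation over several sets of variables, where each set is identified by a common string in its column names: for each set it adds the intercept (int_) and the effect (eff_), exponentiates that sum, and then adds all of the exponentials together to give a single value. This has to be done across the columns within each row, so dplyr seems like the obvious choice. The tricky part is that the function must cope with a varying number of variable pairs (three in the first dataset below, four in the second). It's easier to show than to describe.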

Here are two datasets:

set.seed(1)

names_df1 <- c("ball", "bell", "bat")
df1 <- data.frame(int_ball = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_ball = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_bell = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_bell = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_bat = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_bat = sample(seq(-.99,-.01, .01),5,replace=T))


names_df2 <- c("dog", "cat", "bird", "fish")
df2 <- data.frame(int_dog = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_dog = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_cat = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_cat = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_bird = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_bird = sample(seq(-.99,-.01, .01),5,replace=T),
                  int_fish = sample(seq(-.99,-.01, .01),5,replace=T),
                  eff_fish = sample(seq(-.99,-.01, .01),5,replace=T))

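Each dataset has as many variable pairs as there are elements in the character vector defined just before it (names_df1 and names_df2). For each pair I need to add the int_ and eff_ variables, exponentiate the result, and then add all of these exponentials together. For the first dataset, which has three pairs, the result looks like this: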

library(dplyr)

df1 %>%
  mutate(eq_df1 = exp(int_ball + eff_ball) + exp(int_bell + eff_bell) + exp(int_bat + eff_bat))

#   int_ball eff_ball int_bell eff_bell int_bat eff_bat   eq_df1
# 1    -0.32    -0.57    -0.03    -0.93   -0.11   -0.21 1.519698
# 2    -0.61    -0.86    -0.15    -0.27   -0.63   -0.67 1.159504
# 3    -0.99    -0.18    -0.79    -0.21   -0.66   -0.16 1.118678
# 4    -0.66    -0.41    -0.46    -0.15   -0.11   -0.65 1.354026
# 5    -0.13    -0.49    -0.26    -0.63   -0.56   -0.30 1.371762

For the dataset with four pairs, it looks like this:

df2 %>%
  mutate(eq_df2 = exp(int_dog + eff_dog) + exp(int_cat + eff_cat) +
                  exp(int_bird + eff_bird) + exp(int_fish + eff_fish))

#   int_dog eff_dog int_cat eff_cat int_bird eff_bird int_fish eff_fish   eq_df2
# 1   -0.26   -0.80   -0.56   -0.58    -0.98    -0.35    -0.19    -0.11 1.671570
# 2   -0.58   -0.56   -0.75   -0.94    -0.55    -0.30    -0.87    -0.77 1.125734
# 3   -0.62   -0.13   -0.30   -0.76    -0.82    -0.13    -0.60    -0.16 1.673230
# 4   -0.80   -0.30   -0.61   -0.68    -0.78    -0.30    -0.11    -0.71 1.388169
# 5   -0.72   -0.60   -0.49   -0.86    -0.22    -0.25    -0.52    -0.87 1.400453
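Written generally, both examples compute the same row-wise quantity. For row i and K name pairs (K = 3 for df1, taken from names_df1; K = 4 for df2, taken from names_df2):

$$
\mathrm{eq}_i = \sum_{k=1}^{K} \exp\left(\mathrm{int}_{k,i} + \mathrm{eff}_{k,i}\right)
$$

For example, row 1 of df1 gives exp(-0.89) + exp(-0.96) + exp(-0.32) ≈ 0.4107 + 0.3829 + 0.7261 ≈ 1.5197, matching eq_df1 above.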

Many thanks for your help. The solution doesn't have to be in dplyr.

1 reply | 4 months ago

lotus · 4 months ago

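You can define a function that pivots the selected columns to long format, performs the calculation, and binds the result back onto the original data (this uses pick() and .by, which need dplyr 1.1.0 or later, and the native pipe placeholder _, which needs R 4.2 or later):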

library(dplyr)
library(tidyr)
library(tibble)

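f <- function(.data, vars = starts_with(c("eff_", "int_"))) {
  .data |> 
    # keep only the int_/eff_ columns
    select( {{ vars }} ) |> 
    # add a row identifier so each result can be matched back to its row
    rowid_to_column() |>
    # split e.g. "int_ball" into "int" (becomes a column via .value) and "ball" (stored in `name`)
    pivot_longer(-rowid, names_sep = "_", names_to = c(".value", "name")) |> 
    # per row: add the two value columns, exponentiate, and sum over all pairs;
    # pick(2) and pick(3) select those columns by position, so this relies on
    # the column order produced by pivot_longer()
    summarise(eq = sum(exp(pick(2) + pick(3))), .by = rowid) |> 
    select(-rowid) |> 
    # append the eq column to the original data (`_` is the native-pipe placeholder)
    bind_cols(.data, results = _)
}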
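To see what the pivot step produces, here is a minimal sketch on a small hypothetical data frame `toy` (not part of the question's data). It runs the same rowid_to_column() and pivot_longer() steps, then references the int and eff value columns by name instead of pick(2)/pick(3); that is an editorial variation which avoids depending on column order, not the answer's original code. It assumes dplyr 1.1.0+ (for .by) and R 4.1+ (for |>).

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data with two pairs, "a" and "b"
toy <- data.frame(int_a = c(0, 1), eff_a = c(0, 0),
                  int_b = c(0, 0), eff_b = c(0.5, 0))

# Long format: one row per (rowid, name) pair, with value columns int and eff
long <- toy |>
  rowid_to_column() |>
  pivot_longer(-rowid, names_sep = "_", names_to = c(".value", "name"))
print(long)

# Same per-row calculation, but selecting the value columns by name
res <- long |>
  summarise(eq = sum(exp(int + eff)), .by = rowid)
print(res)
```

Because each pair becomes its own row after pivoting, the number of pairs no longer matters: summarise() simply sums over however many rows each rowid has.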

f(df1)
#   int_ball eff_ball int_bell eff_bell int_bat eff_bat       eq
# 1    -0.32    -0.57    -0.03    -0.93   -0.11   -0.21 1.519698
# 2    -0.61    -0.86    -0.15    -0.27   -0.63   -0.67 1.159504
# 3    -0.99    -0.18    -0.79    -0.21   -0.66   -0.16 1.118678
# 4    -0.66    -0.41    -0.46    -0.15   -0.11   -0.65 1.354026
# 5    -0.13    -0.49    -0.26    -0.63   -0.56   -0.30 1.371762

f(df2)
#   int_dog eff_dog int_cat eff_cat int_bird eff_bird int_fish eff_fish       eq
# 1   -0.26   -0.80   -0.56   -0.58    -0.98    -0.35    -0.19    -0.11 1.671570
# 2   -0.58   -0.56   -0.75   -0.94    -0.55    -0.30    -0.87    -0.77 1.125734
# 3   -0.62   -0.13   -0.30   -0.76    -0.82    -0.13    -0.60    -0.16 1.673230
# 4   -0.80   -0.30   -0.61   -0.68    -0.78    -0.30    -0.11    -0.71 1.388169
# 5   -0.72   -0.60   -0.49   -0.86    -0.22    -0.25    -0.52    -0.87 1.400453
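Since the question says the solution doesn't have to be in dplyr, here is a minimal base R sketch of the same calculation that needs no packages and no pivoting. The helper name `sum_exp_pairs()` and the toy data are hypothetical; by default it derives the pair names from the int_ columns, or you can pass a vector such as names_df1 explicitly. It assumes every int_<name> column has a matching eff_<name> column.

```r
# Hypothetical helper (not from the original answer): base R only.
sum_exp_pairs <- function(df, stems = NULL) {
  # Default: derive the stems ("ball", "bell", ...) from the int_ column names
  if (is.null(stems)) {
    stems <- sub("^int_", "", grep("^int_", names(df), value = TRUE))
  }
  # Pair the columns by name, so their order in df does not matter
  int_cols <- paste0("int_", stems)
  eff_cols <- paste0("eff_", stems)
  # Add paired columns element-wise, exponentiate, then sum across each row
  unname(rowSums(exp(df[int_cols] + df[eff_cols])))
}

# Hypothetical toy data with two pairs, "a" and "b"
toy <- data.frame(int_a = c(0, 1), eff_a = c(0, 0),
                  int_b = c(0, 0), eff_b = c(0.5, 0))
res <- sum_exp_pairs(toy)
print(res)
```

With the question's data, `df1$eq_df1 <- sum_exp_pairs(df1, names_df1)` and `df2$eq_df2 <- sum_exp_pairs(df2, names_df2)` should reproduce the hand-written mutate() results. A missing eff_ partner makes `df[eff_cols]` fail with "undefined columns selected" rather than silently producing a wrong sum.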