Concatenating pairs of variables that share the same suffix

for-loop r

Abdel · 6 years ago

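I have a data frame with many variables that I want to concatenate into new variables in the same data frame. A simplified version of my data frame df looks like this: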

first.1 second.1 first.2 second.2 
1222 3223 3333 1221 
1111 2212 2232 2113

Here is how I currently do this, inefficiently, without a for loop:

df$concatenated.1 <- paste0(df$first.1,"-",df$second.1)
df$concatenated.2 <- paste0(df$first.2,"-",df$second.2)

This produces the following data frame df:

first.1 second.1 first.2 second.2 concatenated.1 concatenated.2 
1222 3223 3333 1221 1222-3223 3333-1221 
1111 2212 2232 2113 1111-2212 2232-2113

I have many more than 2 pairs of variables to concatenate, so I would like to do this in a for loop:

for (i in 1:2){
??
}

Any ideas on how to achieve this?

5 answers | last activity 6 years ago

IceCreamToucan 6 years ago

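If the names in your real data follow the same clear pattern as in this example, Ronak's split / lapply answer is probably the best option. If they do not, you can build vectors of the column names and use Map with paste.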

new.names <- paste0('concatenated.', 1:2)
names.1 <- paste0('first.', 1:2)
names.2 <- paste0('second.', 1:2)

df[new.names] <- Map(paste, df[names.1], df[names.2], sep = '-')

df

#   first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
# 1    1222     3223    3333     1221      1222-3223      3333-1221
# 2    1111     2212    2232     2113      1111-2212      2232-2113
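
Since the question asked specifically about a for loop, the same naming pattern can also drive an explicit loop. This is only a minimal sketch, assuming the columns are named first.i and second.i exactly as in the example data; df is rebuilt here just to keep the snippet self-contained.

```r
# Example data from the question
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

# Loop over the pair suffixes; [[ ]] looks up each column by its constructed name
for (i in 1:2) {
  df[[paste0("concatenated.", i)]] <- paste(df[[paste0("first.", i)]],
                                            df[[paste0("second.", i)]],
                                            sep = "-")
}

df
```

For more pairs, only the range in the for statement needs to change. The Map version above does the same work without the explicit loop.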

Ronak Shah 6 years ago

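This becomes much easier if you can find a way to split your columns into groups. For example, based on the example provided, we can split the columns by the last character of their names (1, 1, 2, 2).

In base R, we use split.default to split the columns by name (as described above), then paste each group together row by row and add the results as new columns.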

group_names <- substring(names(df), nchar(names(df)))
df[paste0("concatenated.", unique(group_names))] <- 
     lapply(split.default(df,group_names),  function(x)  do.call(paste, c(x, sep = "-")))

df
#  first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
#1    1222     3223    3333     1221      1222-3223      3333-1221
#2    1111     2212    2232     2113      1111-2212      2232-2113
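
One caveat with taking only the last character: it assumes single-digit suffixes, and split.default returns its groups in sorted order, which can differ from the order of unique(group_names) when the columns are not already sorted by suffix. A hedged variant, assuming every column name ends in a dot followed by the suffix (the first.10 / second.10 columns below are made up for illustration):

```r
# Illustrative data with a two-digit suffix appearing before a one-digit one
df <- data.frame(first.10 = c(1, 2), second.10 = c(3, 4),
                 first.2 = c(5, 6), second.2 = c(7, 8))

# Everything after the last dot is treated as the suffix
suffix <- sub(".*\\.", "", names(df))
groups <- split.default(df, suffix)

# Take the new column names from names(groups) so labels follow split's ordering
df[paste0("concatenated.", names(groups))] <-
  lapply(groups, function(x) do.call(paste, c(x, sep = "-")))

df
```

Naming the new columns from names(groups) rather than unique(suffix) keeps each label attached to the group it was computed from.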

jdobres 6 years ago

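Here is a tidyverse solution that gets you most of the way there. The only difference is that the columns come out in alphabetical order, i.e. "concatenated", then "first", then "second".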

txt <- 'first.1 second.1 first.2 second.2 
1222 3223 3333 1221 
1111 2212 2232 2113'

df <- read.table(text = txt, header = T)

library(tidyverse)

df2 <- df %>% 
  mutate(row.num = row_number()) %>% 
  gather(variable, value, -row.num) %>% 
  separate(variable, into = c('order', 'pair')) %>% 
  spread(order, value) %>% 
  mutate(concatenated = paste0(first, '-', second)) %>% 
  gather(variable, value, -row.num, -pair) %>% 
  unite(name, variable, pair) %>% 
  spread(name, value)

  row.num concatenated_1 concatenated_2 first_1 first_2 second_1 second_2
1       1      1222-3223      3333-1221    1222    3333     3223     1221
2       2      1111-2212      2232-2113    1111    2232     2212     2113

Marian Minar rcs 6 years ago

library(tidyverse)

[Edit: the original solution incorrectly used starts_with]

This solution uses ends_with() to select the appropriate columns, then unite to join them with a - separator:

df <- tribble(
        ~first.1, ~second.1, ~first.2, ~second.2,
        1222,3223,3333,1221,
        1111,2212,2232,2113)

df1 <- df %>%
  select(ends_with("1")) %>%
  unite(concatenated.1, sep = "-")

df2 <- df %>%
  select(ends_with("2")) %>%
  unite(concatenated.2, sep = "-")

cbind(df, df1, df2)

B. Christian Kamgang 6 years ago

You can use the stri_join function from the stringi package, which is very fast.

library(data.table)
library(stringi)

df <- fread("first.1 second.1 first.2 second.2 
             1222 3223 3333 1221 
             1111 2212 2232 2113")

cols <- paste0("concatenated_", 1:2)
df[, (cols) := Map(stri_join, .(first.1, first.2), .(second.1, second.2), sep = "-")]
setDF(df)

first.1 second.1 first.2 second.2 concatenated_1 concatenated_2
1    1222     3223    3333     1221      1222-3223      3333-1221
2    1111     2212    2232     2113      1111-2212      2232-2113