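I have a list of data frames. In each data frame, every column has a name in its first row and x-s at some positions further down the column. If a cell contains an x, the name in the first row of that column counts as selected.
In the real problem, I read an xlsx file with many worksheets, each containing one large matrix: every column has a name in its first row, followed by a sparse pattern of x-s. Each worksheet becomes one data frame in the list. The row names contain an identifier that is relevant for lookups but not for the problem described here.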
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)
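Every data frame in the list has the following structure (shown as a matrix for convenience). The names are the same in every data frame of the list; only the x-s differ: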
> as.matrix(data1)
     Col1   Col2   Col3   Col4   Col5
[1,] "Mark" "Paul" "Jane" "Mary" "Peter"
[2,] "x"    ""     ""     "x"    "x"
[3,] ""     ""     ""     "x"    "x"
[4,] "x"    ""     ""     "x"    "x"
[5,] ""     "x"    ""     ""     ""
[6,] ""     ""     ""     ""     ""
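I would like to add a column ("Approvers") to every data frame in the list. For each row, it should contain the concatenation of the row-1 names of all columns that have an "x" in that row, like this: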
     Col1   Col2   Col3   Col4   Col5    Approvers
[1,] "Mark" "Paul" "Jane" "Mary" "Peter" ""
[2,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[3,] ""     ""     ""     "x"    "x"     "Mary; Peter"
[4,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[5,] ""     "x"    ""     ""     ""      "Paul"
[6,] ""     ""     ""     ""     ""      ""
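Currently, I solve this in two steps:
- I build a position object that holds, for every row of every data frame, the column indices of the x-s, collapsed into a string.
- In a nested for loop, I look up the corresponding names in the first row and concatenate them.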
# For every row of every data frame: column indices of the cells that hold "x"
position <- lapply(data, function(x) apply(x, 1, function(y) which(y %in% "x")))
# Replace integer(0) (rows without any x) with 0
position <- lapply(position, function(x) lapply(x, function(y) if (length(y) == 0L) 0 else y))
# Flatten each index vector into a comma-separated string
position <- lapply(position, function(x) lapply(x, function(y) paste(y, collapse = ",")))
for (i in seq_along(data)) {
  for (j in seq_len(nrow(data[[i]]))) {
    if (as.numeric(unlist(strsplit(position[[i]][[j]], ",")))[[1]] == 0) {
      data[[i]][j, "Approvers"] <- ""
    } else {
      data[[i]][j, "Approvers"] <- paste(data[[i]][1, as.numeric(unlist(strsplit(position[[i]][[j]], ",")))], collapse = "; ")
    }
  }
}
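This feels clumsy to me. I would like to do it with lapply and mapply, iterating over both lists at the same time, but I cannot figure out how. Creating the position object, collapsing the column indices of the x-s into strings, and then splitting them apart again inside the loop also seems overly complicated.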
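For reference, here is a minimal sketch of what a single-pass version might look like. It is only one possible direction, not a settled solution: the helper name add_approvers is hypothetical, and the sketch assumes that the names sit in row 1, that a selected cell is exactly "x", and that empty cells are either "" or NA. The example data is the same as above, repeated so the block runs on its own.

```r
# Example data as in the question, repeated so this block is self-contained
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)

# Hypothetical helper (not part of the original code): adds an "Approvers"
# column to one data frame whose first row holds the names.
add_approvers <- function(df) {
  # Ignore an "Approvers" column left over from an earlier run
  name_cols <- setdiff(names(df), "Approvers")
  # Names from row 1, as a plain character vector
  approver_names <- unlist(df[1, name_cols], use.names = FALSE)
  # Logical matrix: TRUE where a cell is exactly "x" (NA cells count as FALSE)
  cells <- as.matrix(df[, name_cols, drop = FALSE])
  is_x <- !is.na(cells) & cells == "x"
  # Per row: paste the names of the marked columns; rows without x give ""
  df$Approvers <- apply(is_x, 1, function(marked) {
    paste(approver_names[marked], collapse = "; ")
  })
  df
}

# One lapply over the list; no separate position object is needed
data <- lapply(data, add_approvers)
```

Because the positions are computed inside the helper as a logical mask, there is no second list to walk in parallel, so mapply (or Map) would only come into play if the positions had to stay in a separate object. The header row itself gets "" because none of the names equals "x".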