代码之家  ›  专栏  ›  技术社区  ›  Varun

r循环数据帧列中的唯一值,以根据条件创建另一个值。

  •  2
  • Varun  · 技术社区  · 6 年前

    我的数据集包括在多个财政年度(2013财年、2014财年和2015财年)以及不同地区调查中提出的问题的得分和总受访者。

    我的目标是通过 FY 列并标识每个区域的每个问题的提出时间。并将这些信息存储在新列中。

    这就是可复制样品的样子。-

    testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
                  Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
                  QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
                  Very.Satisfied=runif(16,min = 0, max=1),
                  Total.Very.Satisfied=floor(runif(16,min=10,max=120)),
                  Satisfied=runif(16,min = 0, max=1),
                  Total.Satisfied=floor(runif(16,min=10,max=120)),
                  Dissatisfied=runif(16,min = 0, max=1),
                  Total.Dissatisfied=floor(runif(16,min=10,max=120)),
                  Very.Dissatisfied=runif(16,min = 0, max=1),
                  Total.Very.Dissatisfied=floor(runif(16,min=10,max=120)))
    

    我从创建一个ID列开始,通过连接 Region 和; QST

    library(tidyr)
    testdf = testdf %>%
    unite(ID,c('Region','QST'),sep = "",remove = F)
    

    我的目标

    1)每种独特的 ID ,确定是否询问了给定的问题-

    a)仅一年(2013财年、2014财年或2015财年)

    b)过去两年(仅2015财年和2014财年)

    c)过去三年(2015财年、2014财年和2013财年)

    d)仅2013和2015财年

    我的尝试

    对于这个问题,我试图创建一个 for loop 以及每种独特的 身份证件 ,我首先存储每个Fy的唯一出现,问题是以向量形式提出的。 v . 然后使用一个if条件语句,我将一个注释分配给一个新创建的列,该列名为 Tally 基于这些情况。

    for (i in unique(testdf$ID))
    {
    v=unique(testdf$FY)
    
      if(('FY15' %in% v) & ('FY14' %in% v)) {
          testdf$Tally=='Asked Over The Past Two Years'
      } 
      else if(('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
           testdf$Tally=='Asked Over The Past Three Years'
      }
      else if(('FY13' %in% v) & ('FY15' %in% v)) {
            testdf$Tally=='Question Asked in FY13 & FY15 Only'
      }
      else { testdf$Tally=='Question Asked Once Only' 
      }
    
    }  
    

    循环运行时似乎没有引发错误消息,但似乎没有创建新的 计数 列。

    如有任何帮助,我们将不胜感激。

    4 回复  |  直到 6 年前
        1
  •  0
  •   Parfait    6 年前

    考虑 ave 分组计算依据 区域 QST 嵌套内 ifelse 对于条件逻辑:

    testdf <- within(testdf, {
                       FY13 <- ifelse(FY=='FY13', 1, 0)
                       FY14 <- ifelse(FY=='FY14', 1, 0)
                       FY15 <- ifelse(FY=='FY15', 1, 0)
    
                       Tally <- ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 1,
                                       'Asked Only on One Year',
                                       ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 3,
                                              'Asked Over the Past Three Years',
                                              ifelse(ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
                                                     'Asked Over the Past Two Years',
                                                     ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
                                                            'Asked On FY13 & FY15 Only',
                                                            NA
                                                            )
                                                     )
                                              )
                                       )
    
                       FY13 <- NULL; FY14 <- NULL; FY15 <- NULL
                 })
    
    testdf[c("ID", "FY", "Tally")]
    
    #     Region QST   FY                           Tally
    # 1   AFRICA  Q2 FY13 Asked Over the Past Three Years
    # 2   AFRICA  Q2 FY14 Asked Over the Past Three Years
    # 3   AFRICA  Q2 FY15 Asked Over the Past Three Years
    # 4   AFRICA  Q5 FY14   Asked Over the Past Two Years
    # 5   AFRICA  Q5 FY15   Asked Over the Past Two Years
    # 6     ASIA  Q2 FY13 Asked Over the Past Three Years
    # 7     ASIA  Q2 FY14 Asked Over the Past Three Years
    # 8     ASIA  Q2 FY15 Asked Over the Past Three Years
    # 9     ASIA  Q5 FY13       Asked On FY13 & FY15 Only
    # 10    ASIA  Q5 FY15       Asked On FY13 & FY15 Only
    # 11 AMERICA  Q2 FY13 Asked Over the Past Three Years
    # 12 AMERICA  Q2 FY14 Asked Over the Past Three Years
    # 13 AMERICA  Q2 FY15 Asked Over the Past Three Years
    # 14 AMERICA  Q5 FY13 Asked Over the Past Three Years
    # 15 AMERICA  Q5 FY14 Asked Over the Past Three Years
    # 16 AMERICA  Q5 FY15 Asked Over the Past Three Years
    
        2
  •  2
  •   FlorianBrezina    6 年前

    在代码中,主要的问题是,在if else子句中,您没有执行赋值(使用“<-”),而是使用“==”进行比较。这里有一个解决方案,我觉得它更优雅,因为它不使用循环:

    require(tidyverse)
    
    testdf %>%
      select(ID, FY) %>%
      unique() %>%
      mutate(is_true = 1) %>%
      spread(key = FY, value = is_true, fill = 0) %>%
      mutate(tally = case_when(
        FY13 == 1 & FY14 == 1 & FY15 == 1 ~ 'Asked Over The Past Three Years',
                    FY14 == 1 & FY15 == 1 ~ 'Asked Over the Past Two Years',
        FY13 == 1 &             FY15 == 1 ~ 'Asked in FY12 & FY15 Only',
        TRUE ~ 'Question Asked Once Only'
      ))
    

    输出:

    +------------------------------------------------------------+
    |          ID FY13 FY14 FY15                           tally |
    +------------------------------------------------------------+
    | 1  AFRICAQ2    1    1    1 Asked Over The Past Three Years |
    | 2  AFRICAQ5    0    1    1   Asked Over the Past Two Years |
    | 3 AMERICAQ2    1    1    1 Asked Over The Past Three Years |
    | 4 AMERICAQ5    1    1    1 Asked Over The Past Three Years |
    | 5    ASIAQ2    1    1    1 Asked Over The Past Three Years |
    | 6    ASIAQ5    1    0    1       Asked in FY12 & FY15 Only |
    +------------------------------------------------------------+
    
        3
  •  1
  •   Roman    6 年前

    不需要循环:

    library(tidyverse)
    
    result <- testdf %>%
        select(3, 2, 1) %>%
        mutate(Asked = 1) %>%
        spread(FY, Asked)
    
    > result
      QST  Region FY13 FY14 FY15
    1  Q2  AFRICA    1    1    1
    2  Q2 AMERICA    1    1    1
    3  Q2    ASIA    1    1    1
    4  Q5  AFRICA   NA    1    1
    5  Q5 AMERICA    1    1    1
    6  Q5    ASIA    1   NA    1
    

    一次性回答所有四个问题。

    如果你 真的? 要创建计数列,请按如下方式展开:

    result %>%
        mutate(Tally = case_when(FY13 + FY14 + FY15 == 1 ~ "Only one year",
                                 FY13 + FY14 + FY15 == 3 ~ "Past three years",
                                 FY14 + FY15 == 2 ~ "Past two years",
                                 FY13 + FY15 == 2 ~ "FY13 and FY15 only",
                                 NA ~ NA_character_))
    
      QST  Region FY13 FY14 FY15              Tally
    1  Q2  AFRICA    1    1    1   Past three years
    2  Q2 AMERICA    1    1    1   Past three years
    3  Q2    ASIA    1    1    1   Past three years
    4  Q5  AFRICA   NA    1    1     Past two years
    5  Q5 AMERICA    1    1    1   Past three years
    6  Q5    ASIA    1   NA    1 FY13 and FY15 only
    
        4
  •  0
  •   jay.sf    6 年前

    有一个使用您的ID列的解决方案。(使用) paste0 不过,我们可以做得更好 testdf$ID <- paste0(testdf$Region, "_", testdf$QST) )

    我们 dcast 你的 testdf 使用 reshape2 包裹。

    library(reshape2)
    tmp <- dcast(testdf, ID ~ FY, 
                   value.var="QST", fun.aggregate=length)
    

    现在我们已经知道这个问题是否在不同的年份被问到了。为了回答进一步的问题,我们将做一些数学。

    tmp <- cbind(tmp, 
                 past2=as.numeric(t2[3] + t2[4] == 2 & t2[2] == 0), 
                 past3=as.numeric(t2[2] + t2[3] + t2[4] == 3),
                 y13_15=as.numeric(t2[2] + t2[4] == 2 & t2[3] == 0))
    

    5:7列中的序列包含所需的 Tally 我们可以喝牛奶的信息

    tmp$Tally <- apply(tmp, 1, function(x) paste0(x[5:7], collapse=""))
    

    按要素水平翻译成人类语言,

    tmp$Tally <- factor(tmp$Tally, labels=c('Question Asked Once Only',
                                            'Question Asked in FY13 & FY15 Only',
                                            'Asked Over The Past Three Years',
                                            'Asked Over The Past Two Years'))
    

    并与原始数据帧进行合并,得到所需的结果。

    结果

    > merge(testdf, t3[c(1, 8)])
                 ID   FY    Region QST                              Tally
    1     AFRICA_Q2 FY13    AFRICA  Q2    Asked Over The Past Three Years
    2     AFRICA_Q2 FY14    AFRICA  Q2    Asked Over The Past Three Years
    3     AFRICA_Q2 FY15    AFRICA  Q2    Asked Over The Past Three Years
    4     AFRICA_Q5 FY14    AFRICA  Q5      Asked Over The Past Two Years
    5     AFRICA_Q5 FY15    AFRICA  Q5      Asked Over The Past Two Years
    6    AMERICA_Q2 FY13   AMERICA  Q2    Asked Over The Past Three Years
    7    AMERICA_Q2 FY14   AMERICA  Q2    Asked Over The Past Three Years
    8    AMERICA_Q2 FY15   AMERICA  Q2    Asked Over The Past Three Years
    9    AMERICA_Q5 FY13   AMERICA  Q5    Asked Over The Past Three Years
    10   AMERICA_Q5 FY14   AMERICA  Q5    Asked Over The Past Three Years
    11   AMERICA_Q5 FY15   AMERICA  Q5    Asked Over The Past Three Years
    12 ANTH.CTRY_Q2 FY15 ANTH.CTRY  Q2           Question Asked Once Only
    13      ASIA_Q2 FY13      ASIA  Q2    Asked Over The Past Three Years
    14      ASIA_Q2 FY14      ASIA  Q2    Asked Over The Past Three Years
    15      ASIA_Q2 FY15      ASIA  Q2    Asked Over The Past Three Years
    16      ASIA_Q5 FY13      ASIA  Q5 Question Asked in FY13 & FY15 Only
    17      ASIA_Q5 FY15      ASIA  Q5 Question Asked in FY13 & FY15 Only
    

    数据

    testdf <- structure(list(FY = c("FY13", "FY14", "FY15", "FY14", "FY15", 
    "FY13", "FY14", "FY15", "FY13", "FY15", "FY13", "FY14", "FY15", 
    "FY13", "FY14", "FY15", "FY15"), Region = c("AFRICA", "AFRICA", 
    "AFRICA", "AFRICA", "AFRICA", "ASIA", "ASIA", "ASIA", "ASIA", 
    "ASIA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", 
    "AMERICA", "ANTH.CTRY"), QST = c("Q2", "Q2", "Q2", "Q5", "Q5", 
    "Q2", "Q2", "Q2", "Q5", "Q5", "Q2", "Q2", "Q2", "Q5", "Q5", "Q5", 
    "Q2")), row.names = c(NA, 17L), class = "data.frame")