
Avoid guessing 'varying' in reshape() [duplicate]

  • CaptainProg  ·  8 years ago

    I'm trying to reshape() some time-varying data in R. I'm working with the following dataset:

    dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")
    

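    These are time-varying data from a longitudinal study, a subset of a larger dataset that I imported from a source file. I want to extract the values of pop, coe and rcb for the baseline and final study visits (my full dataset has several visits in between, which I have left out for the purposes of this question).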

    I can do the following:

    reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = 2:length(dframe),direction='long')
    

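    However, this ends up with the values that should be in pop being labelled as coe. The documentation for reshape2 tells me that I should specify varying explicitly to avoid 'guessing'. So I tried this: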

    reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = c('baseline_pop','baseline_coe','baseline_rcb','final_pop','final_coe','final_rcb'),direction='long')
    

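    This produces exactly the same output, despite naming the varying arguments explicitly. What am I doing wrong? Presumably pop is ending up with coe's values, but I can't understand why this happens, given that I have now declared the varying arguments explicitly...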

    EDIT: The expected output is as follows:

    participant_id  time    pop coe         rcb
    FDVCZX          1       6   11.19       16.74
    ADSCXZ          1       6   13.6        25
    AESFDZC         1       7   3.96        25
    ZXCV            1       6   7.64        18.37
    AGS             1       6   6.12        25
    AGSFV           1       6   6.92        25
    FDVCZX          2       NA  NA          NA
    ADSCXZ          2       NA  NA          NA
    AESFDZC         2       7.1 5.926362    25
    ZXCV            2       8   4.89        NA
    AGS             2       6   11.98       NA
    AGSFV           2       NA  NA          NA
    

    However, as you will see if you run the code, the pop values end up in the coe column and vice versa.

    1 answer

  • akrun  ·  8 years ago

    We can use melt from data.table, which can take multiple measure columns.

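    library(data.table)
    # setDT() converts dframe to a data.table by reference (dframe itself is modified).
    # Each pattern selects the wide columns that feed one long-format column.
    melt(setDT(dframe), measure.vars = patterns('pop', 'coe', 'rcb'),
         value.name = c('pop', 'coe', 'rcb'), variable.name = 'time')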
    #    participant_id time pop       coe   rcb
    # 1:         FDVCZX    1 6.0 11.190000 16.74
    # 2:         ADSCXZ    1 6.0 13.600000 25.00
    # 3:        AESFDZC    1 7.0  3.960000 25.00
    # 4:           ZXCV    1 6.0  7.640000 18.37
    # 5:            AGS    1 6.0  6.120000 25.00
    # 6:          AGSFV    1 6.0  6.920000 25.00
    # 7:         FDVCZX    2  NA        NA    NA
    # 8:         ADSCXZ    2  NA        NA    NA
    # 9:        AESFDZC    2 7.1  5.926362 25.00
    #10:           ZXCV    2 8.0  4.890000    NA
    #11:            AGS    2 6.0 11.980000    NA
    #12:          AGSFV    2  NA        NA    NA
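
    For comparison, here is a minimal base R sketch of one way to get the same result with reshape(). It rests on an assumption about reshape()'s internals: when varying is a plain vector and v.names is supplied, the columns appear to be grouped with split(), which orders the groups alphabetically (coe, pop, rcb), while the labels are still taken from v.names in the order given (pop, coe, rcb). That would account for pop and coe swapping while rcb stays put. Passing varying as a list, with one vector of column names per long-format variable, sidesteps the grouping step. The data frame wide and its values are hypothetical stand-ins for the question's dframe.

```r
# Hypothetical stand-in for the question's data (values are illustrative only).
wide <- data.frame(
  participant_id = c("A", "B", "C"),
  baseline_pop   = c(6, 6, 7),
  baseline_coe   = c(11.19, 13.6, 3.96),
  baseline_rcb   = c(16.74, 25, 25),
  final_pop      = c(NA, 7.1, 8),
  final_coe      = c(NA, 5.93, 4.89),
  final_rcb      = c(NA, 25, NA),
  stringsAsFactors = FALSE
)

v_names <- c("pop", "coe", "rcb")
varying <- names(wide)[-1]

# Illustration of the suspected cause: split() orders its groups by sorted
# factor level, so the group order no longer matches the order of v_names.
groups <- split(varying, rep(v_names, 2))
print(names(groups))  # "coe" "pop" "rcb"

# Workaround: give 'varying' as a list, one element per long-format variable,
# listed in the same order as 'v.names'.
long <- reshape(
  wide,
  idvar     = "participant_id",
  varying   = list(
    c("baseline_pop", "final_pop"),
    c("baseline_coe", "final_coe"),
    c("baseline_rcb", "final_rcb")
  ),
  v.names   = v_names,
  timevar   = "time",
  times     = 1:2,
  direction = "long"
)
print(long)
```

    Another option under the same assumption is to keep varying as a vector but list v.names in alphabetical order, so that the two orderings agree. The list form is more explicit and does not depend on how the names happen to sort.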