代码之家  ›  专栏  ›  技术社区  ›  Omar Gonzales

分组后所有RAW的值是否相同?

r
  •  0
  • Omar Gonzales  · 技术社区  · 2 年前

    在做了一些转换之后,我得到了重复的数据。

    enter image description here

    分组:

    
    paid <- df%>%
                     group_by(date) %>%
                     summarise(sessions = sum(sessions),
                               revenue_usd = sum(transaction_revenue)/3.8,
                               bounceRate = bounces/sessions,
                               transactions = sum(transactions))
    

    df <- structure(list(date = structure(c(19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 19186, 
    19186, 19186, 19186, 19186, 19186, 19186, 19186), class = "Date"), 
        source = c("(direct)", "(not set)", "127.0.0.1:20489", "13d612549ae98efa4eebca49fbda5aa9.safeframe.googlesyndication.com", 
        "214791a1533c2a50e91a0172b59da402.safeframe.googlesyndication.com", 
        "408b8fdfaa1d06bb140607437bd90f10.safeframe.googlesyndication.com", 
        "5e1452d421694493662f87729e4e130d.safeframe.googlesyndication.com", 
        "8042dc59e10779732adc12d6359c3ab9.safeframe.googlesyndication.com", 
        "84c4aa5f9449946d1cb5812ca3ea633a.safeframe.googlesyndication.com", 
        "89a3e733827acf8133fab396ec550646.safeframe.googlesyndication.com", 
        "9ad91f3bc7d10dce0ba6fb1c18d32c0e.safeframe.googlesyndication.com", 
        "9d81febecab8a704ce297781effecc07.safeframe.googlesyndication.com", 
        "accounts.google.com", "actitudfem.com", "admin.pagoefectivo.pe", 
        "ads.us.criteo.com", "adsintegrity.net", "anmosugoi.com", 
        "beneficios.bbva.pe", "biggestchef.com", "bing", "content.jwplatform.com", 
        "criteo", "criteo", "Criteo", "Criteo,criteo", "cse.google.com", 
        "cuotealo.viabcp.com", "cyberwow.pe", "diariocorreo.pe", 
        "dinersclub.pe", "docs.google.com", "duckduckgo", "ecosia.org", 
        "edmmkt", "elcomercio", "elcomercio", "estudiantes.samsung.com.pe", 
        "facebook", "facebook", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network", "facebook-network", 
        "facebook-network", "facebook-network"), medium = c("(none)", 
        "(not set)", "referral", "referral", "referral", "referral", 
        "referral", "referral", "referral", "referral", "referral", 
        "referral", "referral", "referral", "referral", "referral", 
        "referral", "referral", "referral", "referral", "organic", 
        "referral", "consideration", "retargeting", "(not set)", 
        "consideration", "referral", "referral", "referral", "referral", 
        "referral", "referral", "organic", "organic", "image", "cpm", 
        "cpm", "referral", "cpm", "social", "(not set)", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", "cpa", 
        "cpa", "cpa", "cpa", "cpa"), sessions = c(1746, 4, 1, 1, 
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 40, 1, 1, 13, 1, 40, 1, 
        3636, 2921, 4, 1, 4, 54, 20, 1, 66, 1, 1, 3, 1, 1, 29, 9, 
        1, 86, 75, 1, 1, 3, 1, 1, 7, 1, 3, 1, 1, 3, 1, 2, 3, 1, 1, 
        2, 1, 4, 6, 1, 2, 6, 12, 5, 73, 2, 7, 1, 131, 22, 5, 14, 
        69, 8, 1, 4, 72, 2, 3, 1, 6, 21, 20, 72, 17, 7, 1, 15, 218, 
        148, 278, 159, 30, 2, 373, 1, 1, 3, 5, 1, 3, 1, 1, 1, 1, 
        1, 5, 1, 1, 2, 2, 2, 2, 1, 2, 12, 1, 1, 2, 1, 1, 1, 2, 1, 
        144, 14, 1, 4, 54, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 5, 1, 
        1, 5, 1, 10, 156, 2, 9, 2, 19, 34, 179, 1, 2, 4, 447, 1, 
        4, 2, 321, 1, 1, 5, 163, 3, 4, 318, 6, 2, 9, 162, 2, 59, 
        1, 5, 414, 1, 4, 3, 312, 7, 701, 2, 2, 2, 147, 1, 4, 30, 
        137, 28, 1, 14, 1, 1, 1, 451), transactions = c(17, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 
        0, 2, 0, 0, 0, 29, 0, 0, 3, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 3), transaction_revenue = c(44183, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1115.52, 0, 0, 1375.67, 
        0, 0, 0, 0, 4198, 0, 0, 0, 59269, 0, 0, 6147, 0, 0, 0, 0, 
        0, 0, 1999, 0, 599, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 2599, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 4199, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2599, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7397), bounces = c(1278, 
        4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 1, 1, 6, 1, 
        23, 1, 3294, 2333, 2, 1, 3, 4, 12, 0, 32, 1, 1, 3, 1, 1, 
        29, 7, 1, 66, 49, 0, 1, 3, 1, 1, 5, 1, 0, 0, 1, 2, 1, 2, 
        3, 0, 1, 2, 1, 4, 6, 0, 1, 4, 9, 5, 61, 2, 6, 1, 115, 20, 
        4, 12, 63, 7, 1, 4, 62, 2, 2, 1, 5, 17, 16, 59, 17, 6, 1, 
        10, 194, 132, 235, 136, 27, 2, 274, 1, 1, 3, 4, 1, 3, 1, 
        1, 0, 1, 1, 3, 1, 1, 1, 2, 1, 2, 0, 2, 7, 0, 1, 1, 1, 1, 
        1, 1, 1, 129, 12, 1, 4, 49, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 
        0, 5, 1, 1, 2, 1, 3, 133, 2, 5, 2, 16, 27, 150, 0, 1, 4, 
        377, 1, 4, 1, 254, 1, 1, 3, 120, 3, 3, 248, 6, 1, 9, 123, 
        2, 47, 1, 4, 340, 1, 4, 3, 244, 7, 557, 2, 2, 2, 117, 1, 
        4, 23, 100, 22, 1, 12, 0, 1, 0, 325)), totals = list(list(
        sessions = "89600", transactions = "247", transactionRevenue = "547126.09", 
        bounces = "73354")), minimums = list(list(sessions = "1", 
        transactions = "0", transactionRevenue = "0.0", bounces = "0")), maximums = list(
        list(sessions = "8355", transactions = "65", transactionRevenue = "145010.41", 
            bounces = "7614")), isDataGolden = TRUE, rowCount = 737L, row.names = c(NA, 
    200L), class = "data.frame")
    

    会话信息:

    > sessionInfo()
    R version 4.2.1 (2022-06-23 ucrt)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 22000)
    
    Matrix products: default
    
    locale:
    [1] LC_COLLATE=Spanish_Peru.utf8  LC_CTYPE=Spanish_Peru.utf8    LC_MONETARY=Spanish_Peru.utf8
    [4] LC_NUMERIC=C                  LC_TIME=Spanish_Peru.utf8    
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
     [1] openxlsx_4.2.5         janitor_2.1.0          forcats_0.5.1          stringr_1.4.0          dplyr_1.0.9           
     [6] purrr_0.3.4            readr_2.1.2            tidyr_1.2.0            tibble_3.1.7           ggplot2_3.3.6         
    [11] tidyverse_1.3.1        googleAnalyticsR_1.0.1
    
    loaded via a namespace (and not attached):
     [1] Rcpp_1.0.8.3              lubridate_1.8.0           assertthat_0.2.1          digest_0.6.29            
     [5] utf8_1.2.2                R6_2.5.1                  cellranger_1.1.0          backports_1.4.1          
     [9] reprex_2.0.1              httr_1.4.3                pillar_1.7.0              rlang_1.0.3              
    [13] curl_4.3.2                readxl_1.4.0              rstudioapi_0.13           whisker_0.4              
    [17] googleAuthR_2.0.0         munsell_0.5.0             broom_1.0.0               compiler_4.2.1           
    [21] modelr_0.1.8              askpass_1.1               pkgconfig_2.0.3           openssl_2.0.2            
    [25] tidyselect_1.1.2          fansi_1.0.3               crayon_1.5.1              tzdb_0.3.0               
    [29] dbplyr_2.2.1              withr_2.5.0               rappdirs_0.3.3            grid_4.2.1               
    [33] jsonlite_1.8.0            gtable_0.3.0              lifecycle_1.0.1           DBI_1.1.3                
    [37] magrittr_2.0.3            scales_1.2.0              zip_2.2.0                 cli_3.3.0                
    [41] stringi_1.7.6             cachem_1.0.6              fs_1.5.2                  snakecase_0.11.0         
    [45] xml2_1.3.3                ellipsis_0.3.2            generics_0.1.3            vctrs_0.4.1              
    [49] tools_4.2.1               measurementProtocol_0.1.0 glue_1.6.2                hms_1.1.1                
    [53] fastmap_1.1.0             colorspace_2.0-3          gargle_1.2.0              rvest_1.0.2              
    [57] memoise_2.0.1             haven_2.5.0               usethis_2.1.6      
    
    2 回复  |  直到 2 年前
        1
  •  1
  •   duncanw    2 年前

    正如里奇的评论所表明的那样,这是因为 bounces/sessions 返回向量。

    bounceRate 总结之后,我想你会得到你想要的。

    paid <- df%>%
      group_by(date) %>%
      summarise(sessions = sum(sessions),
                revenue_usd = sum(transaction_revenue)/3.8,
                bounces = sum(bounces),
                transactions = sum(transactions)) %>% 
      mutate(bounceRate = bounces/sessions) 
    
        2
  •  1
  •   Soccerama    2 年前

    与duncanw几乎相同,但有一个更简单的解决方案,它提供了与原始列相同的输出列:

        paid <- df%>%
          group_by(transaction_date) %>% 
          summarise(sessions=sum(sessions),
              revenue_usd= sum(transaction_revenue)/3.8,
              bounceRate=sum(bounces)/sum(sessions),
              transactions = sum(transactions))
    
        3
  •  0
  •   Anup Tirpude    2 年前

    我从我这边测试了一下,结果和你的一样。我很惊讶,并开始研究它。这就是我发现的,

    没有任何聚合功能,它只是一个简单的划分,应该在行级别发生/执行,并且由于没有提供用于汇总的函数,因此结果保留在行级别。

    paid <- df%>%
    group_by(date) %>%
    summarise(sessions = sum(sessions),
            revenue_usd = sum(transaction_revenue)/3.8,
            bounceRate = bounces/sessions,
            transactions = sum(transactions))
    

    此代码的结果是所有行保持不变,如下所示:

    paid <- df%>%
    group_by(date) %>%
    summarise(sessions = sum(sessions),
            revenue_usd = sum(transaction_revenue)/3.8,
            # bounceRate = bounces/sessions,
            transactions = sum(transactions))
    

    所以在我看来,你可以在这个比率上使用任何一个和,如果你觉得它不正确,你可以先总结,然后取这个比率。我的建议是在总结后使用第二种方法。我把这两个的代码都放在下面,

       paid <- df%>%
       group_by(date) %>%
       summarise(sessions = sum(sessions),
            revenue_usd = sum(transaction_revenue)/3.8,
            bounceRate = sum(bounces/sessions),
            transactions = sum(transactions))
    

    这会给你结果

    enter image description here

    方法02:

    paid <- df%>%
    group_by(date) %>%
    summarise(sessions = sum(sessions),
            revenue_usd = sum(transaction_revenue)/3.8,
            transactions = sum(transactions),
            total_bounces = sum(bounces),
            total_sessions = sum(sessions)) %>%
    mutate(bounceRate = total_bounces/total_sessions)
    

    这将导致以下结果,我认为这更合适。

    enter image description here

    如果这能解决你的问题。快乐的编码!