
Using dplyr to select data from a specific hour and use it in a calculation

  • Lmm  · 6 years ago

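    I have a time series in the Canada/Mountain time zone. The data is hourly. For each day I want to take the value of f at 0:29:05 (fn, i.e. the f value just after local midnight), and then compute z = (fn - f)/fn.

    However, fn needs to be selected at 0:29:05 in Canada/Mountain local time, and z for every hour of a given day needs to be calculated with that day's fn.

    Dummy data: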

     datetime <- seq(
     from=as.POSIXct("2012-1-1 0:29:05", tz="Canada/Mountain"),
     to=as.POSIXct("2012-2-1 0:29:05", tz="Canada/Mountain"),
     by="hour")
    
     #variable F
     F <- runif(745, min = 0, max =2)
    
    df <- as.data.frame(cbind(datetime,F))
    library(lubridate)
    #make sure it's in "POSIXct" "POSIXt" format
    df$datetime <- as_datetime(df$datetime)
    

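    I have already had some help with dplyr on a minute-resolution dataset, but my understanding is evidently still poor, because I can't adapt it to this hourly example. My attempt is below... I assume mutate is the right choice here?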

    df2 <- df %>%
    group_by(Date = as.Date(datetime)) %>%
    mutate(Fn = F[hour(datetime) == 0]), 
    z = (Fn - F)/Fn) %>%
    ungroup() %>%
    select(-Date)
    

    Thanks.

    1 answer  |  6 years ago

  •   parkerchad81    6 years ago

    library(lubridate)
    library(tidyverse)
    
    datetime <- seq(
       from = as.POSIXct("2012-1-1 0:29:05", tz = "Canada/Mountain"),
       to = as.POSIXct("2012-2-1 0:29:05", tz = "Canada/Mountain"),
       by = "hour"
       )
    
    f <- runif(745, min = 0, max =2) #variable f 
    df <- data.frame(datetime, f)
    
    # method using fill function from tidyr package
    df2 <- df %>%
       mutate(Date = as.Date(datetime, tz = "Canada/Mountain")) %>% 
       left_join( #this will grab the f value at 0:29:05 of each day
         df %>% filter(hour(datetime) == 0) %>% select(datetime, Fn = f),
         by = 'datetime'
       ) %>% 
       group_by(Date) %>% 
       fill(Fn, .direction = 'down') %>% #carries each day's 0:29:05 value down to the remaining hours of that day
       mutate(
         Z = ( Fn - f ) / Fn
       ) %>% 
       ungroup() %>% 
       select(-Date)
    
    # method not using fill
    df3 <- df %>%
       mutate(Date = as.Date(datetime, tz = "Canada/Mountain")) %>% 
       left_join( #this will grab the f value at 0:29:05 of each day
         df %>% filter(hour(datetime) == 0) %>% select(datetime, Fn = f),
         by = 'datetime'
       ) %>% 
       group_by(Date) %>% 
       mutate(
         Fn = na.omit(Fn), #one non-NA value per day here, recycled across that day's rows
         Z = ( Fn - f ) / Fn
       ) %>% 
       ungroup() %>% 
       select(-Date)
    
    # both methods result in the same result, as shown below
    # A tibble: 745 x 4
       datetime                f    Fn       Z
       <dttm>              <dbl> <dbl>   <dbl>
     1 2012-01-01 00:29:05 0.590 0.590  0     
     2 2012-01-01 01:29:05 1.57  0.590 -1.66  
     3 2012-01-01 02:29:05 0.537 0.590  0.0900
     4 2012-01-01 03:29:05 0.691 0.590 -0.171 
     5 2012-01-01 04:29:05 0.719 0.590 -0.218 
     6 2012-01-01 05:29:05 0.811 0.590 -0.374 
     7 2012-01-01 06:29:05 0.248 0.590  0.581 
     8 2012-01-01 07:29:05 1.98  0.590 -2.35  
     9 2012-01-01 08:29:05 0.839 0.590 -0.422 
    10 2012-01-01 09:29:05 0.733 0.590 -0.242 
    # ... with 735 more rows
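
Both methods above pass `tz = "Canada/Mountain"` to `as.Date()`, whereas the attempt in the question groups by `as.Date(datetime)` without it. The following base R sketch illustrates why that argument probably matters; the single timestamp is made up for illustration, and UTC is passed explicitly because it is, as far as I know, the documented default of `as.Date()` for `POSIXct` input.

```r
# An illustrative evening timestamp in Canada/Mountain (UTC-7 in January)
x <- as.POSIXct("2012-01-01 23:29:05", tz = "Canada/Mountain")

# Calendar date of the same instant, read in UTC
utc_date <- as.Date(x, tz = "UTC")

# Calendar date of the same instant, read in local time
local_date <- as.Date(x, tz = "Canada/Mountain")

print(utc_date)
print(local_date)
```

With a UTC-based date, the evening hours of each local day fall into the next day's group, so a per-day `Fn` would be paired with the wrong rows.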
    

    Just a thought, but you shouldn't name an R object F, because by default it is bound to the value FALSE.
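
To illustrate the naming remark, here is a minimal base R sketch. It assumes a fresh session in which `F` has not already been reassigned.

```r
# F (like T) is an ordinary variable that starts out bound to FALSE
starts_as_false <- identical(F, FALSE)

# Assigning to F masks that binding in the current environment
F <- runif(3)
masked_is_logical <- is.logical(F)

# Removing the user-defined F makes the built-in binding visible again
rm(F)
restored <- identical(F, FALSE)
```

Code that uses `F` as shorthand for `FALSE` can behave unexpectedly while the mask is in place, which is why a name such as `f` is the safer choice.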