代码之家  ›  专栏  ›  技术社区  ›  Alex T

Google BigQuery随时间窗口聚合OHLC数据

  •  0
  • Alex T  · 技术社区  · 7 年前

    google的BigQuery存储了时间序列交易历史记录。

    # Transaction history scheme
    
    exchange_id INTEGER REQUIRED  
    from_id INTEGER REQUIRED    
    to_id   INTEGER REQUIRED    
    price   FLOAT   REQUIRED    
    size    FLOAT   REQUIRED    
    ts  TIMESTAMP   REQUIRED    
    is_sell BOOLEAN NULLABLE    
    _PARTITIONTIME  TIMESTAMP   NULLABLE    
    

    exchange\u id -发生交易的平台 from\u id -基本符号
    to\U id -报价符号
    价格 -交易价格,交易价格
    大小 -交易数量

    我需要聚合 OHLC 超过30秒时间间隔的数据按分组
    exchange_id, from_id, to_id . 如何在BigQuery中执行此操作?

    # Required OHLC aggregated data scheme
    
    ts  TIMESTAMP   REQUIRED 
    exchange_id INTEGER REQUIRED  
    from_id INTEGER REQUIRED    
    to_id   INTEGER REQUIRED    
    open   FLOAT   REQUIRED    
    high   FLOAT   REQUIRED    
    low   FLOAT   REQUIRED    
    close   FLOAT   REQUIRED    
    volume    FLOAT   REQUIRED 
    _PARTITIONTIME  TIMESTAMP   NULLABLE       
    

    打开 -区间首价
    高的 -最高价格。。
    低的 -最低价格。。
    -最后价格。。
    体积 -当前时间间隔内所有交易规模的总和


    最有希望的想法是:

    SELECT 
        TIMESTAMP_SECONDS(
          UNIX_SECONDS(ts) -
          60 * 1000000
        ) AS time,
        exchange_id,
        from_id,
        to_id,
        MIN(price) as low,
        MAX(price) as high,
        SUM(size) as volume
    FROM 
        `table`
    GROUP BY
      time, exchange_id, from_id, to_id
    ORDER BY
      time
    

    还有这个:

    SELECT 
      exchange_id,from_id,to_id,
      MAX(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as high,
      MIN(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as low,
      SUM(size) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as volume,
    FROM   [table];
    
    
    # returns:
    1   1   4445    3808    9.0E-8  9.0E-8  300000.0     
    2   1   4445    3808    9.0E-8  9.0E-8  300000.0     
    3   1   4445    3808    9.0E-8  9.0E-8  300000.0     
    ...
    14  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
    15  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
    16  1   4445    3808    9.0E-8  9.0E-8  865939.3721800799    
    

    但这些都不管用。似乎我遗漏了BigQuery中滑动窗口的一些重要内容。

    2 回复  |  直到 7 年前
        1
  •  3
  •   Mikhail Berlyant    7 年前

    以下是BigQuery标准SQL

    #standardsql
    SELECT 
      exchange_id, 
      from_id, 
      to_id,
      TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) time,
      ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] open,
      MAX(price) high,
      MIN(price) low,
      ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] close,
      SUM(size) volume
    FROM `yourproject.yourdataset.yourtable`
    GROUP BY 1, 2, 3, 4
    
        2
  •  0
  •   Alex T    6 年前

    找到了一种在预定义 date_parts ( docs ). 当您需要在周一或几个月内进行聚合时,它非常有用。

    DATETIME\u TRUNC 支持下一个参数:

    MICROSECOND
    MILLISECOND
    SECOND
    MINUTE
    HOUR
    DAY
    WEEK
    WEEK(<WEEKDAY>)
    MONTH
    QUARTER
    YEAR
    

    您可以这样聚合使用它:

    #standardsql
    
    SELECT 
      TIMESTAMP(DATETIME_TRUNC(DATETIME(timestamp), DAY)) as timestamp,
      ARRAY_AGG(open ORDER BY timestamp LIMIT 1)[SAFE_OFFSET(0)] open,
      MAX(high) high,
      MIN(low) low,
      ARRAY_AGG(close ORDER BY timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] close,
      SUM(volume) volume
    FROM `hcmc-project.test_bitfinex.BTC_USD__1h`
    GROUP BY timestamp
    ORDER BY timestamp ASC