代码之家  ›  专栏  ›  技术社区  ›  hnandarusdy

聚合查询-每个月每人平均3个月

  •  1
  • hnandarusdy  · 技术社区  · 7 年前

    假设我在bigquery中有数据视图

    Person | Amount | yearMonth
    ---------------------------
    AA     |   100  |   201701
    AA     |   200  |   201702
    AA     |   300  |   201703
    AA     |   70   |   201704
    AB     |   10   |   201701
    AB     |   50   |   201702
    AB     |   60   |   201703
    AB     |   70   |   201704
    AC     |   70   |   201701
    AC     |   80   |   201702
    AC     |   30   |   201703
    AC     |   10   |   201704
    

    现在,我需要得到每个人每个月在过去3个月的平均值

    预期结果:

    Person | Amount | yearMonth
    ---------------------------
    AA     |   200  |   201703(avg of 201701-201703)
    AA     |   190  |   201704(avg of 201702-201704)
    AB     |   40   |   201703(avg of 201701-201703)
    AB     |   60   |   201704(avg of 201702-201704)
    AC     |   60   |   201703(avg of 201701-201703)
    AC     |   40   |   201704(avg of 201702-201704)
    

    这是如何计算的?

    第一排

    • AA=200,来自100(201701)+200(201702)+300(201703)/3=200
    • AA=100,来自200(201702)+300(201703)+70(201704)/3=190
    • AB=40,来自10(201701)+50(201702)+60(201703)/3=40
    • 很快

    我真的不知道该如何按此分组。我不介意你的回答是否与这个问题有关。

    谢谢你堆

    1 回复  |  直到 7 年前
        1
  •  2
  •   Mikhail Berlyant    7 年前

    下面是BigQuery标准SQL(至少应该让您了解正确分组的逻辑)

    #standardSQL
    SELECT
      person, yearMonth, CAST(amount AS INT64) amount
    FROM (
      SELECT
        person, yearMonth, dt,
        AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
        COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
      FROM (
        SELECT 
          person, amount, yearMonth, 
          UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
        FROM `project.dataset.table`
      )
    )
    WHERE months = 3
    -- ORDER BY person, yearMonth
    

    您可以使用以下虚拟数据测试/玩它

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT 'AA' person, 100 amount, 201701 yearMonth UNION ALL
      SELECT 'AA', 200, 201702 UNION ALL
      SELECT 'AA', 300, 201703 UNION ALL
      SELECT 'AA', 70, 201704 UNION ALL
      SELECT 'AB', 10, 201701 UNION ALL
      SELECT 'AB', 50, 201702 UNION ALL
      SELECT 'AB', 60, 201703 UNION ALL
      SELECT 'AB', 70, 201704 UNION ALL
      SELECT 'AC', 70, 201701 UNION ALL
      SELECT 'AC', 80, 201702 UNION ALL
      SELECT 'AC', 30, 201703 UNION ALL
      SELECT 'AC', 10, 201704 
    )
    SELECT
      person, yearMonth, CAST(amount AS INT64) amount
    FROM (
      SELECT
        person, yearMonth, dt,
        AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
        COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
      FROM (
        SELECT 
          person, amount, yearMonth, 
          UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
        FROM `project.dataset.table`
      )
    )
    WHERE months = 3
    ORDER BY person, yearMonth
    

    输出如预期

    person  yearMonth   amount   
    AA      201703      200  
    AA      201704      190  
    AB      201703      40   
    AB      201704      60   
    AC      201703      60   
    AC      201704      40    
    

    添加了BigQuery旧版SQL的版本

    #legacySQL
    SELECT
      person, yearMonth, INTEGER(amount) amount
    FROM (
      SELECT
        person, yearMonth, dt,
        AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
        COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
      FROM (
        SELECT 
          person, amount, yearMonth, 
          TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
        FROM [project:dataset.table]
      )
    )
    WHERE months = 3
    -- ORDER BY person, yearMonth
    

    您可以使用下面的示例对虚拟数据进行测试/播放

    #legacySQL
    SELECT
      person, yearMonth, INTEGER(amount) amount
    FROM (
      SELECT
        person, yearMonth, dt,
        AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
        COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
      FROM (
        SELECT 
          person, amount, yearMonth, 
          TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
        FROM -- [project:dataset.table]
          (SELECT 'AA' person, 100 amount, 201701 yearMonth),
          (SELECT 'AA' person, 200 amount, 201702 yearMonth),
          (SELECT 'AA' person, 300 amount, 201703 yearMonth),
          (SELECT 'AA' person, 70 amount, 201704 yearMonth),
          (SELECT 'AB' person, 10 amount, 201701 yearMonth),
          (SELECT 'AB' person, 50 amount, 201702 yearMonth),
          (SELECT 'AB' person, 60 amount, 201703 yearMonth),
          (SELECT 'AB' person, 70 amount, 201704 yearMonth),
          (SELECT 'AC' person, 70 amount, 201701 yearMonth),
          (SELECT 'AC' person, 80 amount, 201702 yearMonth),
          (SELECT 'AC' person, 30 amount, 201703 yearMonth),
          (SELECT 'AC' person, 10 amount, 201704 yearMonth)
      )
    )
    WHERE months = 3
    ORDER BY person, yearMonth