代码之家  ›  专栏  ›  技术社区  ›  the_darkside

SQL中分组时间序列的最小值和最大值

  •  2
  • the_darkside  · 技术社区  · 6 年前

    test 我想从中提取 序列 no_signal mobile_id

    在真正的表中,记录没有排序,我认为这意味着 PARTITION OVER (time, mobile_id )除了窗口函数外,还必须包含语句。任何关于如何为单个连续序列创建一个组,然后取每个组的最小值和最大值的建议都将不胜感激。

    -- CREATE TABLE test (mobile_id int, state varchar, time timestamp, region varchar)
    
    INSERT INTO test (mobile_id, state, time, region ) VALUES
    (1, 'active', TIMESTAMP '2018-08-09 15:00:00', 'EU'),  
    (1, 'active', TIMESTAMP '2018-08-09 16:00:00', 'EU'),
    (1, 'no_signal', TIMESTAMP '2018-08-09 17:00:00', 'EU'),
    (1, 'no_signal', TIMESTAMP '2018-08-09 18:00:00', 'EU'),
    (1, 'no_signal', TIMESTAMP '2018-08-09 19:00:00', 'EU'),
    (1, 'active', TIMESTAMP '2018-08-09 20:00:00', 'EU'),
    (1, 'inactive', TIMESTAMP '2018-08-09 21:00:00', 'EU'),
    (1, 'active', TIMESTAMP '2018-08-09 22:00:00', 'EU'),
    (1, 'active', TIMESTAMP '2018-08-09 23:00:00', 'EU'),
    (2, 'active', TIMESTAMP '2018-08-10 00:00:00', 'EU'),
    (2, 'no_signal', TIMESTAMP '2018-08-10 01:00:00', 'EU'),
    (2, 'active', TIMESTAMP '2018-08-10 02:00:00', 'EU'),
    (2, 'no_signal', TIMESTAMP '2018-08-10 03:00:00', 'EU'),
    (2, 'no_signal', TIMESTAMP '2018-08-10 04:00:00', 'EU'),
    (2, 'no_signal', TIMESTAMP '2018-08-10 05:00:00', 'EU'),
    (2, 'no_signal', TIMESTAMP '2018-08-10 06:00:00', 'EU'),
    (3, 'active', TIMESTAMP '2018-08-10 07:00:00', 'SA'),
    (3, 'active', TIMESTAMP '2018-08-10 08:00:00', 'SA'),
    (3, 'no_signal', TIMESTAMP '2018-08-10 09:00:00', 'SA'),
    (3, 'no_signal', TIMESTAMP '2018-08-10 10:00:00', 'SA'),
    (3, 'inactive', TIMESTAMP '2018-08-10 11:00:00', 'SA'),
    (3, 'inactive', TIMESTAMP '2018-08-10 12:00:00', 'SA'),
    (3, 'no_signal', TIMESTAMP '2018-08-10 13:00:00', 'SA')
    

    我的目标输出如下:

     mobile_id          start_time            end_time diff_time region
             1 2018-08-09 17:00:00 2018-08-09 19:00:00       120     EU
             2 2018-08-10 01:00:00 2018-08-10 01:00:00         0     EU
             2 2018-08-10 03:00:00 2018-08-10 06:00:00       180     EU
             3 2018-08-10 09:00:00 2018-08-10 10:00:00        60     SA
             3 2018-08-10 13:00:00 2018-08-10 13:00:00         0     SA
    

    由于未正确创建组,以下代码不会产生所需的结果:

    select mobile_id, region,
           least(extract(epoch from max(time) - min(time)), 0) as diff
    from (select t.*,
                 count(*) filter (where state = 'no_signal) over (partition by mobile_id, region order by time) as grp
          from t
         ) t
    group by mobile_id, region, grp;
    
    2 回复  |  直到 6 年前
        1
  •  1
  •   Tim Biegeleisen    6 年前

    这是缺口和孤岛问题的一个变种。在本例中,您尝试检测具有 no_signal 每个手机号码。

    ROW_NUMBER 在你的桌子上有两种方式。第一个生成所有记录的序列,按时间排序,而第二个生成每个记录的序列 mobile_id 无\u信号 . 这个 差异 在这些行中,数字值可用于形成每个岛。然后,我们只需聚合并获取min/max timestamp值即可得到所需的结果。

    WITH cte1 AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY time) rn1
        FROM test
    ),
    cte2 AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY mobile_id ORDER BY time) rn2
        FROM test
        WHERE state = 'no_signal'
    ),
    cte3 AS (
        SELECT t1.*, t2.rn2
        FROM cte1 t1
        LEFT JOIN cte2 t2
            ON t1.mobile_id = t2.mobile_id AND t1.time = t2.time
        WHERE t1.state = 'no_signal'
    )
    
    SELECT
        mobile_id,
        MIN(time) AS start_time,
        MAX(time) AS end_time,
        EXTRACT(epoch FROM MAX(time::timestamp) - MIN(time::timestamp)) / 60 diff_time,
        region
    FROM cte3
    GROUP BY
        mobile_id,
        region,
        (rn1 - rn2)
    ORDER BY
        mobile_id,
        start_time;
    

    enter image description here

    Demo

        2
  •  1
  •   Tim Biegeleisen    6 年前

    demo: db<>fiddle

    SELECT DISTINCT
        mobile_id,
        first_value(time) over (partition by ranked, time) as start_time,        -- B
        first_value(time) over (partition by ranked, time desc) as end_time, 
        region
    FROM
    (
        SELECT *, SUM(is_diff) OVER (ORDER BY time) as ranked                          -- A
        FROM
        (
            SELECT *,
                CASE WHEN state = lag(state) over (order by time) THEN 0 ELSE 1 END as is_diff
            FROM test 
        ) s
    ) s
    WHERE
        state = 'no_signal';
    

    答:问题是您试图对一列进行排序,然后又想对另一列进行分区。这个问题可以通过这个子查询来解决。讨论了这个问题 here

    B:你的窗户造好之后 start_time end_time 通过使用 first_value(time) first_value(time) ... ORDER BY time DESC 功能。 DESC 因为它用最新时间对窗口排序,然后你可以得到它的第一个值( last_value() does not work as expected every time ).


    为了更清楚地了解真正的问题,我省略了 diff 上面的计算:添加 差异

    SELECT 
        *,  
        EXTRACT(epoch from (end_time - start_time)) / 60 as diff
    FROM (
        -- <QUERY ABOVE>
    ) s