代码之家  ›  专栏  ›  技术社区  ›  Nathan Feger

如何在sql中查找运行序列中的峰谷

  •  3
  • Nathan Feger  · 技术社区  · 6 年前

    sql fiddle .

    下面是一个示例:

      create table vals (
      timestamp int,
      type varchar(25),
      val int
      );
    
      insert into vals(timestamp,type, val) 
      values      (10, null, 1),
                  (20, null, 2),
                  (39, null, 1),
                  (40,'p',1),
                  (50,'p',2),
                  (60,'p',1),
                  (70,'v',5),
                  (80,'v',6),
                  (90,'v',6),
                  (100,'v',3),
                  (110,null,3),
                  (120,'v',6),
                  (130,null,3),
                  (140,'p',10),
                  (150,'p',8),
                  (160,null,3),
                  (170,'p',1),
                  (180,'p',2),
                  (190,'p',2),
                  (200,'p',1),
                  (210,null,3),
                  (220,'v',1),
                  (230,'v',1),
                  (240,'v',3),
                  (250,'v',41)               
    

    所以最终我会得到:

       timestamp, type, value, is_peak
        (10, null, 1, null),
        (20, null, 2, null),
        (39, null, 1, null),
        (40,'p',1, null),
        (50,'p',2, 1),
        (60,'p',1, null),
        (70,'v',5, null),
        (80,'v',6, null),
        (90,'v',6, null),
        (100,'v',3, 1),
        (110,null,3, null),
        (120,'v',6, 1),
        (130,null,3, null),
        (140,'p',10, 1),
        (150,'p',8, null),
        (160,null,3, null),
        (170,'p',1, null),
        (180,'p',2, 1),
        (190,'p',2, null), -- either this record or 180 would be fine
        (200,'p',1, null),
        (210,null,3, null),
        (220,'v',1, 1), -- again either this or 230
        (230,'v',1, null),
        (240,'v',3, null),
        (250,'v',41, null) 
    

    祝你好运谢谢你的帮助

    3 回复  |  直到 6 年前
        1
  •  3
  •   LukStorms    6 年前

    有一个小技巧可以用来解决像这样的缺口和孤岛问题。

    出于某些目的,这种方法有一些缺点。
    但它对这个案子有效。

    一旦计算了排名,那么它就可以被外部查询中的其他窗口函数使用。
    我们可以再次使用行号。 但根据需要,可以使用稠密秩或MIN&的窗口函数;而不是马克斯。

    然后我们就把它们包起来 CASE 根据不同的逻辑类型。

    select timestamp, type, val, 
    (case 
     when type = 'v' and row_number() over (partition by (rn1-rn2), type order by val, rn1) = 1 then 1
     when type = 'p' and row_number() over (partition by (rn1-rn2), type order by val desc, rn1) = 1 then 1
     end) is_peak
    -- , rn1, rn2, (rn1-rn2) as rnk
    from
    (
      select timestamp, type, val,
       row_number() over (order by timestamp) as rn1,
       row_number() over (partition by type order by timestamp) as rn2
      from vals
    ) q
    order by timestamp;
    

    您可以测试SQL Fiddle here

    timestamp   type    val     is_peak
    ---------   ----    ----    -------
    10          null    1       null
    20          null    2       null
    39          null    1       null
    40          p       1       null
    50          p       2       1
    60          p       1       null
    70          v       5       null
    80          v       6       null
    90          v       6       null
    100         v       3       1
    110         null    3       null
    120         v       6       1
    130         null    3       null
    140         p       10      1
    150         p       8       null
    160         null    3       null
    170         p       1       null
    180         p       2       1
    190         p       2       null
    200         p       1       null
    210         null    3       null
    220         v       1       1
    230         v       1       null
    240         v       3       null
    250         v       41      null
    
        2
  •  3
  •   Lukasz Szozda    6 年前

    你可以用 LEAD/LAG window functions :

    SELECT *,
      CASE WHEN type = 'p' AND val>LAG(val) OVER(PARTITION BY type ORDER BY timestamp)
            AND val > LEAD(val) OVER(PARTITION BY type ORDER BY timestamp) THEN 1 
           WHEN type = 'v' AND val<LAG(val) OVER(PARTITION BY type ORDER BY timestamp)
           AND val < LEAD(val) OVER(PARTITION BY type ORDER BY timestamp) THEN 1 
      END AS is_peak
    FROM vals
    ORDER BY timestamp;
    

    db<>fiddle demo

    输出:

    ┌───────────┬───────┬──────┬─────────┐
    │ timestamp │ type  │ val  │ is_peak │
    ├───────────┼───────┼──────┼─────────┤
    │       10  │       │   1  │         │
    │       20  │       │   2  │         │
    │       39  │       │   1  │         │
    │       40  │ p     │   1  │         │
    │       50  │ p     │   2  │       1 │
    │       60  │ p     │   1  │         │
    │       70  │ v     │   5  │         │
    │       80  │ v     │   6  │         │
    │       90  │ v     │   6  │         │
    │      100  │ v     │   3  │       1 │
    │      110  │       │   3  │         │
    │      120  │ v     │   6  │         │
    │      130  │       │   3  │         │
    │      140  │ p     │  10  │       1 │
    │      150  │ p     │   8  │         │
    └───────────┴───────┴──────┴─────────┘
    

    带窗口子句的版本:

    SELECT *, CASE WHEN type = 'p' AND val > LAG(val) OVER s
                    AND val > LEAD(val) OVER s THEN 1 
                   WHEN type = 'v' AND val < LAG(val) OVER s
                    AND val < LEAD(val) OVER s THEN 1 
              END AS is_peak
    FROM vals
    WINDOW s AS (PARTITION BY type ORDER BY timestamp)
    ORDER BY timestamp;
    

    db<>fiddle demo2

    编辑

    我想只要一个小小的改动,我们就可以得到时间戳120了,就这样了

    SELECT *,CASE
      WHEN type IN ('p','v') AND val > LAG(val,1,0) OVER(PARTITION BY type ORDER BY timestamp)
      AND val > LEAD(val,1,0) OVER(PARTITION BY type ORDER BY timestamp) THEN 1 
      WHEN type IN ('v') AND val < LAG(val,1,0) OVER(PARTITION BY type ORDER BY timestamp)
      AND val < LEAD(val,1,0) OVER(PARTITION BY type ORDER BY timestamp) THEN 1 
     END AS is_peak
    FROM vals
    ORDER BY timestamp;
    

    db<>fiddle demo3


    最终解决方案 gaps-and-islands 检测(处理平台):

    WITH cte AS (
      SELECT *, LEAD(val,1,0) OVER(PARTITION BY type ORDER BY timestamp) AS l
      FROM vals
    ), cte2 AS (
      SELECT *, SUM(CASE WHEN val = l THEN 1 ELSE 0 END) OVER(PARTITION BY type ORDER BY timestamp) AS dr
      FROM cte
    ), cte3 AS (
      SELECT *, CASE WHEN type IN ('p') AND val > LAG(val,1) OVER(PARTITION BY type ORDER BY timestamp)
                    AND val >= LEAD(val,1) OVER(PARTITION BY type ORDER BY timestamp) THEN 1 
                   WHEN type IN ('v') AND val < LAG(val,1) OVER(PARTITION BY type ORDER BY timestamp)
                    AND val <= LEAD(val,1) OVER(PARTITION BY type ORDER BY timestamp) THEN 1 
              END AS is_peak
      FROM cte2
    )
    SELECT timestamp, type, val,
         CASE WHEN is_peak = 1 THEN 1 
              WHEN EXISTS (SELECT 1 FROM cte3 cx
                           WHERE cx.is_peak = 1
                             AND cx.val = cte3.val
                             AND cx.type = cte3.type
                             AND cx.dr = cte3.dr)
                  THEN 1
         END is_peak
    FROM cte3
    ORDER BY timestamp;
    

    db<>fiddle demo final

    输出:

    ┌────────────┬───────┬──────┬─────────┐
    │ timestamp  │ type  │ val  │ is_peak │
    ├────────────┼───────┼──────┼─────────┤
    │        10  │       │   1  │         │
    │        20  │       │   2  │         │
    │        39  │       │   1  │         │
    │        40  │ p     │   1  │         │
    │        50  │ p     │   2  │       1 │
    │        60  │ p     │   1  │         │
    │        70  │ v     │   5  │         │
    │        80  │ v     │   6  │         │
    │        90  │ v     │   6  │         │
    │       100  │ v     │   3  │       1 │
    │       110  │       │   3  │         │
    │       120  │ v     │   6  │         │
    │       130  │       │   3  │         │
    │       140  │ p     │  10  │       1 │
    │       150  │ p     │   8  │         │
    │       160  │       │   3  │         │
    │       170  │ p     │   1  │         │
    │       180  │ p     │   2  │       1 │
    │       190  │ p     │   2  │       1 │
    │       200  │ p     │   1  │         │
    │       210  │       │   3  │         │
    │       220  │ v     │   1  │       1 │
    │       230  │ v     │   1  │       1 │
    │       240  │ v     │   3  │         │
    │       250  │ v     │  41  │         │
    └────────────┴───────┴──────┴─────────┘
    

    isosql:2016增加了模式匹配 MATCH_RECOGNIZE PATTERN (STRT UP+ FLAT* DOWN+) 但目前只有甲骨文支持。

    Modern SQL - match_recognize Regular Expressions Over Rows

        3
  •  1
  •   3N1GM4    6 年前

    您可以在 case 实现这一目标的声明:

    create table #vals 
    (
        [timestamp] int,
        [type] varchar(25),
        val int
    );
    
    insert into #vals ([timestamp], [type], val) 
    values  (10, null, 1),
            (20, null, 2),
            (30, null, 1),
            (40,'p',1),
            (50,'p',2),
            (60,'p',1),
            (70,'v',5),
            (80,'v',6),
            (90,'v',6),
            (100,'v',3),
            (110,null,3)
    
    select 
        r.*,
        case 
            when r.[type] = 'p' and not exists (select * from #vals c where c.[type] = r.[type] and c.val > r.val) then 1
            when r.[type] = 'v' and not exists (select * from #vals c where c.[type] = r.[type] and c.val < r.val) then 1
            else null
        end as is_peak
    from #vals r
    
    drop table #vals
    

    /----------------------------------\
    | timestamp | type | val | is_peak |
    |-----------|------|-----|---------|
    | 10        | NULL | 1   | NULL    |
    | 20        | NULL | 2   | NULL    |
    | 30        | NULL | 1   | NULL    |
    | 40        | p    | 1   | NULL    |
    | 50        | p    | 2   | 1       |
    | 60        | p    | 1   | NULL    |
    | 70        | v    | 5   | NULL    |
    | 80        | v    | 6   | NULL    |
    | 90        | v    | 6   | NULL    |
    | 100       | v    | 3   | 1       |
    | 110       | NULL | 3   | NULL    |
    \----------------------------------/
    

    注:如果有多个记录具有相同的(峰值) val ,它们将被标记为 1 is_peak 列。

    推荐文章