Using FIRST_VALUE to fill null values with the preceding non-null value

  •  1
  • Simon Breton  · 5 years ago

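    In the first table, I have items that each start at a specific time. In the second table, I have values and timestamps for each minute between the start time and end time of each item.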

    UniqueID  Items start_time
    123       one   10:00 AM
    456       two   11:00 AM
    789       three 11:30 AM
    

    Second table:

    UniqueID Items time_hit  value
    123      one   10:00 AM    x
    123      one   10:05 AM    x
    123      one   10:10 AM    x
    123      one   10:30 AM    x
    456      two   11:00 AM    x
    456      two   11:15 AM    x
    789      three 11:30 AM    x
    

    So when I join the two tables, I get:

    UniqueID Items start_time  time_hit   value 
    123      one   10:00 AM    10:00 AM   x
    123      null  null        10:05 AM   x
    123      null  null        10:10 AM   x
    123      null  null        10:30 AM   x
    456      two   11:00 AM    11:00 AM   x
    456      null  null        11:15 AM   x
    789      three 11:30 AM    11:30 AM   x
    

    I want to replace each null with the value from the preceding non-null row.

    So the expected result is:

    UniqueID Items start_time  time_hit   value 
    123      one   10:00 AM    10:00 AM   x
    123      one   10:00 AM    10:05 AM   x
    123      one   10:00 AM    10:10 AM   x
    123      one   10:00 AM    10:30 AM   x
    456      two   11:00 AM    11:00 AM   x
    456      two   11:00 AM    11:15 AM   x
    789      three 11:30 AM    11:30 AM   x
    

    I tried to build the join using the following function, without success:

      FIRST_VALUE(Items IGNORE NULLS) OVER (
        PARTITION BY time_hit ORDER BY time_hit
        ROWS BETWEEN CURRENT ROW AND
        UNBOUNDED FOLLOWING) AS test
    

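    Edit: my question was a bit off. I found out that the UniqueID values were inconsistent, which is why I had these null values in my output. The accepted answer is still a good option for filling null values when you join two tables and one of them has more rows than the other.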

    3 answers  |  as of 5 years ago
        1
  •  1
  •   bruceskyaus    5 years ago

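    You can use first_value (last_value would work just as well in this case). The important part is to specify rows between unbounded preceding and current row to set the boundaries of the window.

    Updated the answer to reflect the updated question and the choice of first_value.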

    select
    first_value(t1.UniqueId ignore nulls) over (partition by t2.UniqueId
                                               order by t2.time_hit
                                               rows between unbounded preceding and current row) as UniqueId,
    first_value(t1.items ignore nulls) over (partition by t2.UniqueId
                                            order by t2.time_hit
                                            rows between unbounded preceding and current row) as Items,
    first_value(t1.start_time ignore nulls) over (partition by t2.UniqueId
                                            order by t2.time_hit
                                            rows between unbounded preceding and current row) as start_time,
    t2.time_hit,
    t2.item_value
    from table2 t2
    left join table1 t1 on t1.start_time = t2.time_hit
    order by t2.time_hit;
    

    | UNIQUEID | ITEMS | START_TIME | TIME_HIT | ITEM_VALUE |
    |----------|-------|------------|----------|------------|
    |      123 |   one |   10:00:00 | 10:00:00 |          x |
    |      123 |   one |   10:00:00 | 10:05:00 |          x |
    |      123 |   one |   10:00:00 | 10:10:00 |          x |
    |      123 |   one |   10:00:00 | 10:30:00 |          x |
    |      456 |   two |   11:00:00 | 11:00:00 |          x |
    |      456 |   two |   11:00:00 | 11:15:00 |          x |
    |      789 | three |   11:30:00 | 11:30:00 |          x |
    

    SQL Fiddle Example

    Note: I had to use Oracle in SQL Fiddle (so I had to change the data types and column names), but it should work for your database.
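
    The query above relies on IGNORE NULLS, which not every engine supports. As a minimal sketch only, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions), the same forward fill can be emulated with a running COUNT that groups each null row with the non-null row before it. The table layout follows the question; the 24-hour 'HH:MM' text times, the grp column and the forward_fill helper are assumptions of this sketch, not part of the answer.

```python
import sqlite3

# Sample data from the question. Times are stored as 24-hour 'HH:MM' text so
# that they sort correctly as strings (an assumption of this sketch).
TABLE1 = [(123, "one", "10:00"), (456, "two", "11:00"), (789, "three", "11:30")]
TABLE2 = [
    (123, "one", "10:00", "x"), (123, "one", "10:05", "x"),
    (123, "one", "10:10", "x"), (123, "one", "10:30", "x"),
    (456, "two", "11:00", "x"), (456, "two", "11:15", "x"),
    (789, "three", "11:30", "x"),
]

FILL_SQL = """
WITH joined AS (
    SELECT t2.UniqueID   AS UniqueID,
           t1.Items      AS Items,
           t1.start_time AS start_time,
           t2.time_hit   AS time_hit,
           t2.value      AS value,
           -- COUNT skips NULLs, so this running count only increases on rows
           -- that matched table1; every NULL row shares the count of the
           -- matched row before it.
           COUNT(t1.start_time) OVER (PARTITION BY t2.UniqueID
                                      ORDER BY t2.time_hit) AS grp
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.start_time = t2.time_hit
)
SELECT UniqueID,
       -- Each (UniqueID, grp) group holds exactly one non-NULL value,
       -- and MAX ignores NULLs, so MAX returns that value for every row.
       MAX(Items)      OVER (PARTITION BY UniqueID, grp) AS Items,
       MAX(start_time) OVER (PARTITION BY UniqueID, grp) AS start_time,
       time_hit,
       value
FROM joined
ORDER BY time_hit
"""


def forward_fill():
    """Build both tables in memory and return the forward-filled join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (UniqueID INTEGER, Items TEXT, start_time TEXT)")
    con.execute(
        "CREATE TABLE table2 (UniqueID INTEGER, Items TEXT, time_hit TEXT, value TEXT)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", TABLE2)
    rows = con.execute(FILL_SQL).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in forward_fill():
        print(row)
```

    On an engine that does support IGNORE NULLS, the first_value query above is shorter and is probably the better choice; the COUNT/MAX pair is only a portable stand-in.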

        2
  •  1
  •   GMB    5 years ago

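    An alternative solution is to use a NOT EXISTS condition as a JOIN condition, with a correlated subquery that ensures each row is matched to the relevant record only (the most recent start_time that is not later than time_hit).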

    SELECT t1.items, t1.start_time, t2.time_hit, t2.value
    FROM table1 t1
    INNER JOIN table2 t2 
        ON  t1.items = t2.items
        AND t1.start_time <= t2.time_hit  
        AND NOT EXISTS (
            SELECT 1 FROM table1 t10
            WHERE 
                t10.items = t2.items 
                AND t10.start_time <= t2.time_hit
                AND t10.start_time > t1.start_time
        )
    

    Demo on DB Fiddle

    | items | start_time | time_hit | value |
    | ----- | ---------- | -------- | ----- |
    | one   | 10:00:00   | 10:00:00 | x     |
    | one   | 10:00:00   | 10:05:00 | x     |
    | one   | 10:00:00   | 10:10:00 | x     |
    | one   | 10:00:00   | 10:30:00 | x     |
    | two   | 11:00:00   | 11:00:00 | x     |
    | two   | 11:00:00   | 11:15:00 | x     |
    | three | 11:30:00   | 11:30:00 | x     |
    

    An alternative that avoids using EXISTS in a JOIN condition (which is not allowed in BigQuery): just move that condition to the WHERE clause.

    SELECT t1.items, t1.start_time, t2.time_hit, t2.value
    FROM table1 t1
    INNER JOIN table2 t2 
        ON  t1.items = t2.items
        AND t1.start_time <= t2.time_hit  
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 t10
        WHERE 
            t10.items = t2.items 
            AND t10.start_time <= t2.time_hit
            AND t10.start_time > t1.start_time
    )
    

    DB Fiddle
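
    To see what the NOT EXISTS filter contributes, here is a small sketch that runs the WHERE version of the query with Python's built-in sqlite3 module. It assumes 24-hour 'HH:MM' text times so that string comparison matches time order, and it adds an ORDER BY for a stable result. The run helper and the extra 10:20 start for item "one" are hypothetical, added only for illustration; they do not appear in the question.

```python
import sqlite3

# The WHERE NOT EXISTS query from the answer, plus an ORDER BY so the
# result order is deterministic.
LATEST_START_SQL = """
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit
WHERE NOT EXISTS (
    SELECT 1 FROM table1 t10
    WHERE t10.items = t2.items
      AND t10.start_time <= t2.time_hit
      AND t10.start_time > t1.start_time
)
ORDER BY t2.time_hit
"""

# Sample data from the question, with times as 'HH:MM' text (an assumption).
STARTS = [(123, "one", "10:00"), (456, "two", "11:00"), (789, "three", "11:30")]
HITS = [
    (123, "one", "10:00", "x"), (123, "one", "10:05", "x"),
    (123, "one", "10:10", "x"), (123, "one", "10:30", "x"),
    (456, "two", "11:00", "x"), (456, "two", "11:15", "x"),
    (789, "three", "11:30", "x"),
]


def run(starts, hits):
    """Load both tables into an in-memory database and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (UniqueID INTEGER, items TEXT, start_time TEXT)")
    con.execute(
        "CREATE TABLE table2 (UniqueID INTEGER, items TEXT, time_hit TEXT, value TEXT)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", starts)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", hits)
    rows = con.execute(LATEST_START_SQL).fetchall()
    con.close()
    return rows


# With the question's data, every hit has exactly one candidate start.
baseline = run(STARTS, HITS)

# Hypothetical extra row (not in the question): item "one" starts again at
# 10:20. The 10:30 hit now has two candidate starts, and NOT EXISTS keeps
# only the later one.
restarted = run(STARTS + [(123, "one", "10:20")], HITS)

if __name__ == "__main__":
    for row in restarted:
        print(row)
```

    With one start per item, as in the question, the NOT EXISTS filter never removes anything and a plain join on items returns the same rows; the filter only matters once an item can have several start times.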

        3
  •  0
  •   Vamsi Praveen Karanam    5 years ago

    I guess the output you are expecting is that of an inner join, but I am not sure why you are using FIRST_VALUE.

    SELECT I.Items, I.Start_Time, ID.Time_hit,  ID.Value
    FROM Items I
    INNER JOIN ItemDetails ID
     ON I.Items = ID.Items
    

    Please explain if you have a specific reason for looking at the FIRST_VALUE approach.