
Filling null values with the previous non-null value using FIRST_VALUE

google-bigquery sql

Simon Breton · Technical Community · 5 years ago

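In the first table, I have some items that start at a specific time. In the second table, for each item, I have a value and a timestamp for every minute between its start time and its end time.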

UniqueID  Items start_time
123       one   10:00 AM
456       two   11:00 AM
789       three 11:30 AM

Second table:

UniqueID Items time_hit  value
123      one   10:00 AM    x
123      one   10:05 AM    x
123      one   10:10 AM    x
123      one   10:30 AM    x
456      two   11:00 AM    x
456      two   11:15 AM    x
789      three 11:30 AM    x

So when I join the two tables, I get:

UniqueID Items start_time  time_hit   value 
123      one   10:00 AM    10:00 AM   x
123      null  null        10:05 AM   x
123      null  null        10:10 AM   x
123      null  null        10:30 AM   x
456      two   11:00 AM    11:00 AM   x
456      null  null        11:15 AM   x
789      three 11:30 AM    11:30 AM   x

I want each null to take the value from its non-null reference row (the row where the item started)...

So the expected result is:

UniqueID Items start_time  time_hit   value 
123      one   10:00 AM    10:00 AM   x
123      one   10:00 AM    10:05 AM   x
123      one   10:00 AM    10:10 AM   x
123      one   10:00 AM    10:30 AM   x
456      two   11:00 AM    11:00 AM   x
456      two   11:00 AM    11:15 AM   x
789      three 11:30 AM    11:30 AM   x

I tried to build the join with the following function, without success:

  FIRST_VALUE(Items IGNORE NULLS) OVER (
    PARTITION BY time_hit ORDER BY time_hit
    ROWS BETWEEN CURRENT ROW AND
    UNBOUNDED FOLLOWING) AS test
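Two things keep this attempt from filling anything: PARTITION BY time_hit puts every row in its own one-row partition, and a frame from CURRENT ROW to UNBOUNDED FOLLOWING only sees the current row and the ones after it, which are all null for every row after an item's start. A minimal pure-Python sketch of FIRST_VALUE(... IGNORE NULLS) semantics makes this visible; the helper first_value_ignore_nulls and the "HH:MM" string times are assumptions for illustration, not BigQuery APIs.

```python
from itertools import groupby
from operator import itemgetter

# Joined rows from the question: (UniqueID, Items, time_hit).
# Items is None where the LEFT JOIN found no start row for that time_hit.
JOINED = [
    (123, "one", "10:00"),
    (123, None, "10:05"),
    (123, None, "10:10"),
    (123, None, "10:30"),
    (456, "two", "11:00"),
    (456, None, "11:15"),
    (789, "three", "11:30"),
]


def first_value_ignore_nulls(rows, partition_key, order_key, value_key, frame):
    """Emulate FIRST_VALUE(value IGNORE NULLS) OVER (PARTITION BY ... ORDER BY ... ROWS frame).

    frame is "current_to_following" (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    or "preceding_to_current" (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    Results are returned in (partition, order) sort order.
    """
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    results = []
    for _, group in groupby(ordered, key=partition_key):
        part = list(group)
        for i in range(len(part)):
            # Slice out the window frame for row i within its partition.
            window = part[i:] if frame == "current_to_following" else part[: i + 1]
            non_null = [value_key(r) for r in window if value_key(r) is not None]
            results.append(non_null[0] if non_null else None)
    return results


by_id, items, by_time = itemgetter(0), itemgetter(1), itemgetter(2)

# The question's attempt: one-row partitions, so nothing can be borrowed.
print(first_value_ignore_nulls(JOINED, by_time, by_time, items, "current_to_following"))
# ['one', None, None, None, 'two', None, 'three']

# Right partition, wrong frame: later rows only see themselves and later nulls.
print(first_value_ignore_nulls(JOINED, by_id, by_time, items, "current_to_following"))
# ['one', None, None, None, 'two', None, 'three']

# Right partition and frame: the start row's value is carried down.
print(first_value_ignore_nulls(JOINED, by_id, by_time, items, "preceding_to_current"))
# ['one', 'one', 'one', 'one', 'two', 'two', 'three']
```

Only the last call, partitioned by UniqueID with a frame that reaches back to the partition start, produces the expected Items column.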

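Edit: my question was a bit off. I found that UniqueID was inconsistent, which is why I had those nulls in the output. Still, when you join two tables and one of them has more rows than the other, the accepted answer is a good way to fill in the nulls.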

3 answers | last activity 5 years ago

bruceskyaus · 5 years ago

You can use first_value (though last_value works just as well in this case). The important part is specifying rows between unbounded preceding and current row to set the bounds of the window.

Updated the answer to reflect the updated question, and switched to first_value.

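select
-- Each window covers one t2.UniqueId in time_hit order. The frame
-- "unbounded preceding .. current row" lets every row see the row where
-- the left join found a start_time, and ignore nulls carries it down.
first_value(t1.UniqueId ignore nulls) over (partition by t2.UniqueId
                                           order by t2.time_hit
                                           rows between unbounded preceding and current row) as UniqueId,
first_value(t1.items ignore nulls) over (partition by t2.UniqueId
                                        order by t2.time_hit
                                        rows between unbounded preceding and current row) as Items,
first_value(t1.start_time ignore nulls) over (partition by t2.UniqueId
                                        order by t2.time_hit
                                        rows between unbounded preceding and current row) as start_time,
t2.time_hit,
t2.item_value -- called "value" in the question
from table2 t2
-- only the hit at an item's start time finds a match in table1
left join table1 t1 on t1.start_time = t2.time_hit
order by t2.time_hit;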

| UNIQUEID | ITEMS | START_TIME | TIME_HIT | ITEM_VALUE |
|----------|-------|------------|----------|------------|
|      123 |   one |   10:00:00 | 10:00:00 |          x |
|      123 |   one |   10:00:00 | 10:05:00 |          x |
|      123 |   one |   10:00:00 | 10:10:00 |          x |
|      123 |   one |   10:00:00 | 10:30:00 |          x |
|      456 |   two |   11:00:00 | 11:00:00 |          x |
|      456 |   two |   11:00:00 | 11:15:00 |          x |
|      789 | three |   11:30:00 | 11:30:00 |          x |

SQL Fiddle Example

Note: I had to use Oracle in SQL Fiddle (so I had to change the data types and column names), but it should work for your database as well.
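The remark that last_value would work just as well holds because, within each UniqueId, only the start row has non-null t1 columns. A minimal Python sketch computes both FIRST_VALUE and LAST_VALUE (IGNORE NULLS, frame UNBOUNDED PRECEDING to CURRENT ROW) in one pass per partition. The helper fill_down, the "HH:MM" string times, and the RESTART data (item 123 starting a second time) are hypothetical, used only to show where the two functions would diverge.

```python
from itertools import groupby
from operator import itemgetter


def fill_down(rows, partition_key, order_key, value_key):
    """For each row, return (first_value, last_value) of value IGNORE NULLS over
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW within its partition."""
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    results = []
    for _, group in groupby(ordered, key=partition_key):
        first = last = None  # reset at every partition boundary
        for row in group:
            value = value_key(row)
            if value is not None:
                if first is None:
                    first = value  # earliest non-null seen so far
                last = value  # most recent non-null seen so far
            results.append((first, last))
    return results


# (UniqueID, start_time from the left join or None, time_hit), as in the question.
QUESTION = [
    (123, "10:00", "10:00"), (123, None, "10:05"), (123, None, "10:10"),
    (123, None, "10:30"), (456, "11:00", "11:00"), (456, None, "11:15"),
    (789, "11:30", "11:30"),
]
# Hypothetical: item 123 restarts at 10:10, so its partition has two non-null rows.
RESTART = [
    (123, "10:00", "10:00"), (123, None, "10:05"),
    (123, "10:10", "10:10"), (123, None, "10:30"),
]

keys = (itemgetter(0), itemgetter(2), itemgetter(1))
print(all(f == l for f, l in fill_down(QUESTION, *keys)))  # True
print(fill_down(RESTART, *keys))
# [('10:00', '10:00'), ('10:00', '10:00'), ('10:00', '10:10'), ('10:00', '10:10')]
```

With one anchor row per partition the two functions agree; with several, first_value keeps the earliest start and last_value switches to the most recent one.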

GMB · 5 years ago

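Another solution is to use a NOT EXISTS condition in the JOIN, with a correlated subquery that ensures each table2 row is matched to the relevant record: the table1 row of the same item with the latest start_time at or before time_hit.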

SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2 
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit  
    AND NOT EXISTS (
        SELECT 1 FROM table1 t10
        WHERE 
            t10.items = t2.items 
            AND t10.start_time <= t2.time_hit
            AND t10.start_time > t1.start_time
    )

Demo on DB Fiddle

| items | start_time | time_hit | value |
| ----- | ---------- | -------- | ----- |
| one   | 10:00:00   | 10:00:00 | x     |
| one   | 10:00:00   | 10:05:00 | x     |
| one   | 10:00:00   | 10:10:00 | x     |
| one   | 10:00:00   | 10:30:00 | x     |
| two   | 11:00:00   | 11:00:00 | x     |
| two   | 11:00:00   | 11:15:00 | x     |
| three | 11:30:00   | 11:30:00 | x     |
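The NOT EXISTS subquery keeps, for each table2 row, only the table1 row of the same item with the greatest start_time at or before time_hit (an "as-of" match). The same rule can be sketched in Python with bisect over each item's sorted start times. The tables are the question's, reduced to the columns the query reads; times are zero-padded "HH:MM" strings, an assumption so that string order equals time order, and asof_join is a hypothetical helper name.

```python
from bisect import bisect_right
from collections import defaultdict

TABLE1 = [("one", "10:00"), ("two", "11:00"), ("three", "11:30")]  # (items, start_time)
TABLE2 = [  # (items, time_hit, value)
    ("one", "10:00", "x"), ("one", "10:05", "x"), ("one", "10:10", "x"),
    ("one", "10:30", "x"), ("two", "11:00", "x"), ("two", "11:15", "x"),
    ("three", "11:30", "x"),
]


def asof_join(table1, table2):
    """Inner-join each table2 row to the latest table1 start_time <= time_hit for the same item."""
    starts = defaultdict(list)
    for item, start_time in table1:
        starts[item].append(start_time)
    for times in starts.values():
        times.sort()

    joined = []
    for item, time_hit, value in table2:
        times = starts.get(item, [])
        i = bisect_right(times, time_hit)  # count of start_times <= time_hit
        if i:  # like the INNER JOIN: drop hits with no earlier start
            joined.append((item, times[i - 1], time_hit, value))
    return joined


for row in asof_join(TABLE1, TABLE2):
    print(row)
```

If "one" had a second start at 10:10, the 10:10 and 10:30 hits would map to that later start, which is exactly what the t10.start_time > t1.start_time check enforces in SQL.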

An alternative that avoids using EXISTS in a JOIN condition (which is not allowed in BigQuery): simply move the condition to the WHERE clause.

SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2 
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit  
WHERE NOT EXISTS (
    SELECT 1 FROM table1 t10
    WHERE 
        t10.items = t2.items 
        AND t10.start_time <= t2.time_hit
        AND t10.start_time > t1.start_time
)

DB Fiddle
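To try the WHERE NOT EXISTS version locally, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. It runs on SQLite rather than BigQuery, so it only checks the query logic, not BigQuery's rules. Times are stored as zero-padded "HH:MM" text (an assumption that makes string comparison follow time order), and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (UniqueID INTEGER, items TEXT, start_time TEXT);
CREATE TABLE table2 (UniqueID INTEGER, items TEXT, time_hit TEXT, value TEXT);
INSERT INTO table1 VALUES (123, 'one', '10:00'), (456, 'two', '11:00'), (789, 'three', '11:30');
INSERT INTO table2 VALUES
  (123, 'one', '10:00', 'x'), (123, 'one', '10:05', 'x'), (123, 'one', '10:10', 'x'),
  (123, 'one', '10:30', 'x'), (456, 'two', '11:00', 'x'), (456, 'two', '11:15', 'x'),
  (789, 'three', '11:30', 'x');
"""

QUERY = """
SELECT t1.items, t1.start_time, t2.time_hit, t2.value
FROM table1 t1
INNER JOIN table2 t2
    ON  t1.items = t2.items
    AND t1.start_time <= t2.time_hit
WHERE NOT EXISTS (
    -- no other start of the same item falls between t1.start_time and time_hit
    SELECT 1 FROM table1 t10
    WHERE t10.items = t2.items
      AND t10.start_time <= t2.time_hit
      AND t10.start_time > t1.start_time
)
ORDER BY t2.time_hit
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
conn.close()

for row in rows:
    print(row)
```

The printed rows should match the output table in the answer, one tuple per table2 row.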

Vamsi Praveen Karanam · 5 years ago

I guess you are expecting the output of an inner join, but I'm not sure why you are using FIRST_VALUE.

SELECT I.Items, I.Start_Time, ID.Time_hit, ID.Value
FROM Items I
INNER JOIN ItemDetails ID
 ON I.Items = ID.Items

Please explain if there is a specific reason you are looking at that approach.
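A plain inner join on Items gives the expected output only while the first table holds exactly one start row per item; once an item has two starts, every detail row is matched to both, which is the case the NOT EXISTS condition in GMB's answer rules out. A minimal sketch with Python's sqlite3 module and an in-memory database counts the rows in both cases. The table names follow this answer; the second start for "one" at 10:20 is hypothetical data, and the run helper is an illustrative name.

```python
import sqlite3

QUERY = """
SELECT I.Items, I.Start_Time, ID.Time_hit, ID.Value
FROM Items I
INNER JOIN ItemDetails ID
  ON I.Items = ID.Items
ORDER BY ID.Time_hit, I.Start_Time
"""

# (Items, Time_hit, Value) rows from the question's second table.
DETAILS = [
    ("one", "10:00", "x"), ("one", "10:05", "x"), ("one", "10:10", "x"),
    ("one", "10:30", "x"), ("two", "11:00", "x"), ("two", "11:15", "x"),
    ("three", "11:30", "x"),
]


def run(starts):
    """Load (Items, Start_Time) rows plus DETAILS into SQLite and run the inner join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Items (Items TEXT, Start_Time TEXT)")
    conn.execute("CREATE TABLE ItemDetails (Items TEXT, Time_hit TEXT, Value TEXT)")
    conn.executemany("INSERT INTO Items VALUES (?, ?)", starts)
    conn.executemany("INSERT INTO ItemDetails VALUES (?, ?, ?)", DETAILS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


one_start_each = run([("one", "10:00"), ("two", "11:00"), ("three", "11:30")])
print(len(one_start_each))  # 7

# Hypothetical second start for "one": each of its 4 detail rows now joins twice.
restarted = run([("one", "10:00"), ("one", "10:20"), ("two", "11:00"), ("three", "11:30")])
print(len(restarted))  # 11
```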