代码之家  ›  专栏  ›  技术社区  ›  AticusFinch

Updating SQL table with calculations regarding previous rows

  •  1
  • AticusFinch  · 技术社区  · 14 年前

    我正在试图修复一个大型证券交易所数据数据库中的一些错误。一列(数量)表示每一个刻度的交易量,另一列存储累积量(即,前一个刻度的总和)。在某些情况下,第二列是错误的(不是很多,因此我们可以安全地假设不到相邻的刻度是错误的)。因此理论上来说,解决方法很简单:只需搜索一个累计量减少的刻度(这就足够了),然后从最后一个刻度中选择累计量并求和数量。 of the current tick. The thing is that i've been trying to get to work a query that does this in oracle, but i'm struggling due to my lack of expertise in sql. This is what i've gotten so far:

    update
    ( 
        select m.cumulative_volume, q.cum_volume_ant, q.quantity from 
        market_data_intraday_trades m
        join
        (
              select * from
              (select
                product_key,
                sequence_number,
                lead(product_key) over (order by product_key, sequence_number) as product_key_ant,
                to_char(trade_date_time, 'yyyymmdd') as fecha,
                to_char(lag(trade_date_time) over (order by product_key, sequence_number), 'yyyymmdd') as fecha_ant,
                cumulative_volume,
                lead(cumulative_volume) over (order by product_key, sequence_number) as cum_volume_ant,
                cumulative_volume - lead(cumulative_volume) over (order by product_key, sequence_number) as dif 
              from market_data_intraday_trades)
              where product_key = product_key_ant
              and fecha = fecha_ant
              and dif < 0 
              and rownum < 10
        ) q
        on m.sequence_number = q.sequence_number
    )
    set m.cumulative_volume = q.cum_volume_ant + q.quantity
    

    目前的问题是,我似乎无法在外部计算中使用来自内部查询的数量。

    Perhaps all of this would be clearer and/or easier with temporal tables or pl/sql or cursors, but due to corporate policies, i have no priviledges to do that, just selects and updates.

    I would be very grateful if you could point me in some direction to solve this.

    事先谢谢!

    PS. Fecha is date in spanish, just in case :)

    3 回复  |  直到 12 年前
        1
  •  7
  •   APC    14 年前

    Here is some test data. As you can see the CUMULATIVE_VOLUME for the fourth row is wrong.

    SQL> select product_key
      2         , trade_date_time
      3         , quantity
      4         , cumulative_volume
      5         , sum (quantity) over (partition by product_key order by sequence_number) as running_total
      6  from  market_data_intraday_trades
      7  order by sequence_number
      8  /
    
    PROD TRADE_DAT   QUANTITY CUMULATIVE_VOLUME RUNNING_TOTAL
    ---- --------- ---------- ----------------- -------------
    ORCL 23-JUN-10        100               100           100
    ORCL 23-JUN-10         50               150           150
    ORCL 25-JUN-10        100               250           250
    ORCL 26-JUN-10        100               250           350
    ORCL 26-JUN-10         50               400           400
    ORCL 27-JUN-10         75               475           475
    
    6 rows selected.
    
    SQL>
    

    最简单的解决方案就是更新。 全部的 the rows with the calculated running total:

    SQL> update market_data_intraday_trades m
      2  set m.cumulative_volume =
      3          ( select inq.running_total
      4            from (
      5                      select sum (quantity) over (partition by product_key
      6                                                  order by sequence_number) as running_total
      7                             , cumulative_volume
      8                             , rowid as row_id
      9                      from  market_data_intraday_trades
     10                  ) inq
     11             where m.rowid = inq.row_id
     12          )
     13  /
    
    6 rows updated.
    
    SQL> select product_key
      2         , trade_date_time
      3         , quantity
      4         , cumulative_volume
      5         , sum (quantity) over (partition by product_key
      6                                order by sequence_number) as running_total
      7         , rowid as row_id
      8  from  market_data_intraday_trades
      9  order by sequence_number
     10  /
    
    PROD TRADE_DAT   QUANTITY CUMULATIVE_VOLUME RUNNING_TOTAL 
    ---- --------- ---------- ----------------- ------------- 
    ORCL 23-JUN-10        100               100           100 
    ORCL 23-JUN-10         50               150           150 
    ORCL 25-JUN-10        100               250           250 
    ORCL 26-JUN-10        100               350           350 
    ORCL 26-JUN-10         50               400           400 
    ORCL 27-JUN-10         75               475           475 
    
    6 rows selected.
    
    SQL> 
    

    但是,如果您有大量数据,并且确实不需要所有这些不必要的更新,那么再次使用相同的查询来限制点击量:

    SQL> update market_data_intraday_trades m
      2  set m.cumulative_volume =
      3          ( select inq.running_total
      4            from (
      5                      select sum (quantity) over (partition by product_key
      6                                                  order by sequence_number) as running_total
      7                             , cumulative_volume
      8                             , rowid as row_id
      9                      from  market_data_intraday_trades
     10                  ) inq
     11             where m.rowid = inq.row_id
     12          )
     13  where m.rowid in
     14      ( select inq.row_id
     15            from (
     16                      select sum (quantity) over (partition by product_key
     17                                                  order by sequence_number) as running_total
     18                             , cumulative_volume
     19                             , rowid as row_id
     20                      from  market_data_intraday_trades
     21                  ) inq
     22             where m.cumulative_volume != running_total
     23          )
     24
    SQL> /
    
    1 row updated.
    
    SQL> select product_key
      2         , trade_date_time
      3         , quantity
      4         , cumulative_volume
      5         , sum (quantity) over (partition by product_key
      6                                order by sequence_number) as running_total
      7  from  market_data_intraday_trades
      8  order by sequence_number
      9  /
    
    PROD TRADE_DAT   QUANTITY CUMULATIVE_VOLUME RUNNING_TOTAL
    ---- --------- ---------- ----------------- -------------
    ORCL 23-JUN-10        100               100           100
    ORCL 23-JUN-10         50               150           150
    ORCL 25-JUN-10        100               250           250
    ORCL 26-JUN-10        100               350           350
    ORCL 26-JUN-10         50               400           400
    ORCL 27-JUN-10         75               475           475
    
    6 rows selected.
    
    SQL> 
    

    I tried Nicolas's suggestion of using MERGE. If you are using 10g or higher, then this would work. You need a recent version of Oracle because 9i didn't support MERGE with an UPDATE but no INSERT (and 8i didn't support MERGE at all).

    SQL> merge into market_data_intraday_trades m
      2  using ( select running_total
      3                 , row_id
      4          from
      5              (   select sum (quantity) over (partition by product_key
      6                                              order by sequence_number) as running_total
      7                         , cumulative_volume
      8                         , rowid as row_id
      9                  from  market_data_intraday_trades
     10               )
     11           where cumulative_volume != running_total
     12          ) inq
     13  on ( m.rowid = inq.row_id  )
     14  when matched then
     15      update set m.cumulative_volume = inq.running_total
     16  /
    
    1 row merged.
    
    SQL>
    

    这个解决方案比另一个方案更简洁。

        2
  •  4
  •   Rob van Wijk    14 年前

    Just adding a performance comparison to APC's answer:

    SQL> update market_data_intraday_trades m
      2  set m.cumulative_volume =
      3          ( select inq.running_total
      4            from (
      5                      select sum (quantity) over (partition by product_key
      6                                                  order by sequence_number) as running_total
      7                             , cumulative_volume
      8                             , rowid as row_id
      9                      from  market_data_intraday_trades
     10                  ) inq
     11             where m.rowid = inq.row_id
     12          )
     13  /
    
    6 rows updated.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  4mgw11769k00r, child number 0
    -------------------------------------
    update market_data_intraday_trades m set m.cumulative_volume =         ( select inq.running_total
          from (                     select sum (quantity) over (partition by product_key
                                    order by sequence_number) as running_total
    , cumulative_volume                            , rowid as row_id                     from
    market_data_intraday_trades                 ) inq            where m.rowid = inq.row_id         )
    
    Plan hash value: 3204855846
    
    --------------------------------------------------------------------------------------------------------------
    | Id  | Operation            | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    --------------------------------------------------------------------------------------------------------------
    |   1 |  UPDATE              | MARKET_DATA_INTRADAY_TRADES |      1 |        |      0 |00:00:00.01 |      35 |
    |   2 |   TABLE ACCESS FULL  | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
    |*  3 |   VIEW               |                             |      6 |      6 |      6 |00:00:00.01 |      18 |
    |   4 |    WINDOW SORT       |                             |      6 |      6 |     36 |00:00:00.01 |      18 |
    |   5 |     TABLE ACCESS FULL| MARKET_DATA_INTRADAY_TRADES |      6 |      6 |     36 |00:00:00.01 |      18 |
    --------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - filter("INQ"."ROW_ID"=:B1)
    
    
    25 rows selected.
    

    Look at those 36's. That's O(N^2).

    SQL> update market_data_intraday_trades m
      2  set m.cumulative_volume =
      3          ( select inq.running_total
      4            from (
      5                      select sum (quantity) over (partition by product_key
      6                                                  order by sequence_number) as running_total
      7                             , cumulative_volume
      8                             , rowid as row_id
      9                      from  market_data_intraday_trades
     10                  ) inq
     11             where m.rowid = inq.row_id
     12          )
     13  where m.rowid in
     14      ( select inq.row_id
     15            from (
     16                      select sum (quantity) over (partition by product_key
     17                                                  order by sequence_number) as running_total
     18                             , cumulative_volume
     19                             , rowid as row_id
     20                      from  market_data_intraday_trades
     21                  ) inq
     22             where m.cumulative_volume != running_total
     23          )
     24
    SQL> /
    
    1 row updated.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  8fg3vnav1t742, child number 0
    -------------------------------------
    update market_data_intraday_trades m set m.cumulative_volume =         ( select inq.running_total
         from (                     select sum (quantity) over (partition by product_key
                                  order by sequence_number) as running_total                            ,
    cumulative_volume                            , rowid as row_id                     from
    market_data_intraday_trades                 ) inq            where m.rowid = inq.row_id         )
    where m.rowid in     ( select inq.row_id           from (                     select sum (quantity)
    over (partition by product_key                                                 order by
    sequence_number) as running_total                            , cumulative_volume
         , rowid as row_id                     from  market_data_intraday_trades                 ) inq
           where m.cumulative_volume != running_total         )
    
    Plan hash value: 1087408236
    
    ---------------------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    ---------------------------------------------------------------------------------------------------------------
    |   1 |  UPDATE               | MARKET_DATA_INTRADAY_TRADES |      1 |        |      0 |00:00:00.01 |      14 |
    |*  2 |   HASH JOIN SEMI      |                             |      1 |      5 |      1 |00:00:00.01 |       6 |
    |   3 |    TABLE ACCESS FULL  | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
    |   4 |    VIEW               |                             |      1 |      6 |      6 |00:00:00.01 |       3 |
    |   5 |     WINDOW SORT       |                             |      1 |      6 |      6 |00:00:00.01 |       3 |
    |   6 |      TABLE ACCESS FULL| MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
    |*  7 |   VIEW                |                             |      1 |      6 |      1 |00:00:00.01 |       4 |
    |   8 |    WINDOW SORT        |                             |      1 |      6 |      6 |00:00:00.01 |       4 |
    |   9 |     TABLE ACCESS FULL | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       4 |
    ---------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("M".ROWID="INQ"."ROW_ID")
           filter("M"."CUMULATIVE_VOLUME"<>"RUNNING_TOTAL")
       7 - filter("INQ"."ROW_ID"=:B1)
    
    
    36 rows selected.
    

    那好多了。

    SQL> merge into market_data_intraday_trades mdit1
      2  using ( select product_key
      3               , sequence_number
      4               , running_total
      5            from ( select product_key
      6                        , sequence_number
      7                        , cumulative_volume
      8                        , sum(quantity) over (partition by product_key order by sequence_number) as running_total
      9                     from market_data_intraday_trades
     10                 )
     11           where cumulative_volume != running_total
     12        ) mdit2
     13     on (   mdit1.product_key = mdit2.product_key
     14        and mdit1.sequence_number = mdit2.sequence_number
     15        )
     16   when matched then
     17        update set mdit1.cumulative_volume = mdit2.running_total
     18  /
    
    1 row merged.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID  cjafdk3jg4gzz, child number 0
    -------------------------------------
    merge into market_data_intraday_trades mdit1 using ( select product_key              , sequence_number
                , running_total           from ( select product_key                       , sequence_number
                          , cumulative_volume                       , sum(quantity) over (partition by
    product_key order by sequence_number) as running_total                    from
    market_data_intraday_trades                )          where cumulative_volume != running_total       )
    mdit2    on (   mdit1.product_key = mdit2.product_key       and mdit1.sequence_number =
    mdit2.sequence_number       )  when matched then       update set mdit1.cumulative_volume =
    mdit2.running_total
    
    Plan hash value: 2367693855
    
    ----------------------------------------------------------------------------------------------------------------
    | Id  | Operation              | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    ----------------------------------------------------------------------------------------------------------------
    |   1 |  MERGE                 | MARKET_DATA_INTRADAY_TRADES |      1 |        |      1 |00:00:00.01 |       9 |
    |   2 |   VIEW                 |                             |      1 |        |      1 |00:00:00.01 |       6 |
    |*  3 |    HASH JOIN           |                             |      1 |      6 |      1 |00:00:00.01 |       6 |
    |*  4 |     VIEW               |                             |      1 |      6 |      1 |00:00:00.01 |       3 |
    |   5 |      WINDOW SORT       |                             |      1 |      6 |      6 |00:00:00.01 |       3 |
    |   6 |       TABLE ACCESS FULL| MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
    |   7 |     TABLE ACCESS FULL  | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
    ----------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - access("MDIT1"."PRODUCT_KEY"="PRODUCT_KEY" AND "MDIT1"."SEQUENCE_NUMBER"="SEQUENCE_NUMBER")
       4 - filter("CUMULATIVE_VOLUME"<>"RUNNING_TOTAL")
    
    
    31 rows selected.
    

    But merge beats them all with one table scan less.

    当做, Rob。

        3
  •  3
  •   N. Gasparotto    14 年前

    Did you try MERGE statement ? Maybe and depending of your Oracle version it could be a way to investigate, at least it could make your statement simpler.

    尼古拉斯。