
Self-join with WHERE clause fails

  •  0
  • kometen  · Tech community  · 6 years ago

    I am using Apache Drill to sum YTD (year-to-date) amounts from a CSV file with a self-referencing join. The (abbreviated) query is

    select
      ... fields from table a ...
      a.PeriodAmount,
      sum(cast(b.PeriodAmount as dec(18,3))) as YTDAmount
    from dfs.`/home/foo/data/a.csv` a
      left join dfs.`/home/foo/data/a.csv` b
    on
      ... join-conditions ...
    *** where a.Year = '2018' ***
    group by
      ... group-conditions ...
    order by
      ... order-conditions ...
    ;
    

    The query works without the WHERE clause. When the WHERE clause is included, on the same dataset, I get the following error:

    Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join
    [Error Id: b62e6b63-eda7-4a52-8f95-2499a1f5c278 on foo:31010] (state=,code=0)
    

    I can work around the error by removing the WHERE clause and filtering in subqueries instead:

    from (select * from dfs.`/home/foo/data/a.csv` where Year = '2017') a
      left join (select * from dfs.`/home/foo/data/a.csv` where Year = '2017') b
    

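    But I am not sure this is the right approach. It makes the query more error-prone, because the same condition has to be applied in several subqueries instead of in the WHERE clause, where it naturally belongs.

    Is it possible to rewrite this self-join so that the WHERE clause can be kept?

    This is on Ubuntu 16.04, running under WSL on Win10; Apache Drill is version 1.13.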

    The complete query (as run in Drill):

    select
      a.Dep_id,
      a.Dep,
      substr(a.Post_id, 1, 4) as Kap,
      a.Post_id,
      substr(a.Post_id, 5, 2) as Post,
      a.Art_id,
      a.Art,
      a.V_id,
      a.Reg,
      a.Dep_V_id,
      a.Dep_V,
      concat(substr(a.Period, 1, 4), '-', substr(a.Period, 5, 2), '-15') as PeriodDate,
      a.Period,
      a.Year,
      a.PeriodAmount,
      sum(cast(b.PeriodAmount as dec(18,3))) as YTDAmount
    from dfs.`/home/foo/data/a.csv` a
      left join dfs.`/home/foo/data/a.csv` b
    on
      a.Dep_id = b.Dep_id
      and a.Post_id = b.Post_id
      and a.Post_id is not null
      and a.Art_id = b.Art_id
      and a.V_id = b.V_id
      and a.Reg = b.Reg
      and a.Dep_V_id = b.Dep_V_id
      and a.Dep_id = b.Dep_id
      and b.Period <= a.Period
      and a.Year = b.Year
      and a.Post_id = b.Post_id
      and a.Art_id = b.Art_id
    where a.Year in ('2018') and b.Year in (a.Year)
    group by
      a.Dep_id,
      a.Dep,
      a.Post_id,
      a.Art_id,
      a.Art,
      a.V_id,
      a.Reg,
      a.Dep_V_id,
      a.Dep_V,
      a.Dep_id,
      a.Period,
      a.Year,
      a.PeriodAmount
    order by
      a.Year,
      a.Dep_id,
      a.Post_id,
      a.Art_id,
      a.V_id,
      a.Reg,
      a.Dep_V_id,
      a.Dep_id,
      a.Period,
      a.PeriodAmount
    ;
    
    1 answer  |  last active 6 years ago
  •  1
  •   Joakim Danielson    6 years ago

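    I have not queried CSV files like this myself, so this is more of a suggestion to try.

    How about completing the WHERE clause for both a and b, to help the compiler with this kind of task?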

    WHERE a.Year = '2018' AND b.Year = '2018'

    or

    WHERE a.Year = '2018' AND b.Year = a.Year
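
    To check that the extra predicate on b does not change the year-to-date totals, here is a minimal sketch that compares three forms of the query on toy data. It uses SQLite through Python's `sqlite3` as a stand-in, so it says nothing about whether the Drill 1.13 planner accepts any of these forms. The table `t`, the sample rows, the shortened column list, and the helper `ytd` are all assumptions made for illustration. The third form moves the filter into a common table expression, so the condition is written only once.

```python
import sqlite3

# Hypothetical sample rows standing in for a.csv (all values are text, as in a CSV):
# (Dep_id, Post_id, Period, Year, PeriodAmount)
ROWS = [
    ("D1", "P1", "201701", "2017", "5.000"),
    ("D1", "P1", "201801", "2018", "10.000"),
    ("D1", "P1", "201802", "2018", "20.000"),
    ("D1", "P1", "201803", "2018", "30.000"),
    ("D2", "P1", "201801", "2018", "7.000"),
]

# Shortened version of the self-join from the question; {src} and {where} are
# filled in per variant. REAL replaces dec(18,3) because this runs on SQLite.
QUERY = """
select a.Dep_id, a.Post_id, a.Period,
       sum(cast(b.PeriodAmount as real)) as YTDAmount
from {src} a
  left join {src} b
    on a.Dep_id = b.Dep_id
   and a.Post_id = b.Post_id
   and a.Year = b.Year
   and b.Period <= a.Period
{where}
group by a.Dep_id, a.Post_id, a.Period
order by a.Dep_id, a.Post_id, a.Period
"""


def ytd(conn, src, where="", prefix=""):
    """Run one variant of the YTD query and return all result rows."""
    return conn.execute(prefix + QUERY.format(src=src, where=where)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (Dep_id text, Post_id text, Period text, Year text, PeriodAmount text)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)

# 1. Filter on a only (the form that fails to plan in Drill, per the question).
plain = ytd(conn, "t", where="where a.Year = '2018'")

# 2. Filter repeated for b, as suggested in the answer.
with_b_filter = ytd(conn, "t", where="where a.Year = '2018' and b.Year = '2018'")

# 3. Filter written once in a CTE that both sides of the join read from.
with_cte = ytd(conn, "y", prefix="with y as (select * from t where Year = '2018') ")

for row in plain:
    print(row)
```

    On this toy data all three forms return the same rows. One caveat: if any join key is NULL, a row does not match itself, and the extra `b.Year` predicate would then drop that row where the plain left join keeps it.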