代码之家  ›  专栏  ›  技术社区  ›  ishan3243

EMR Presto和Athena查询结果的差异

  •  1
  • ishan3243  · 技术社区  · 6 年前

    我已经将Glue目录连接到Athena和一个EMR实例(安装了presto)。我尝试在两个上运行相同的查询,但得到不同的结果。电子病历显示0行,雅典娜显示43行。查询非常简单,只需 left join group by 和一个 count distinct . 查询如下所示:

    select
      t1.customer_id as id,
      t2.purchase_date as purchase_date,
      count(distinct t1.purchase_id) as item_count
    from 
      table1 t1
    left join
      table2 as t2
      on t2.purchase_id=t1.purchase_id
    where 
      t1.item_type='ABC' 
      and t1.purchase_status='CONFIRMED' 
      and t1.region_id in ('A','B','C')
      and t2.status='Dispatched'
      and t2.purchase_date between date_add('day',-50,date('2018-09-13')) and date('2018-09-13')
      and t1.created_at between date_add('day',-60,date('2018-09-13')) and date('2018-09-13')
      and t1.updated_at between date_add('day',-60,date('2018-09-13')) and date('2018-09-13')
    group by
      t1.customer_id,t2.purchase_date;
    

    EMR Version: 5.17.0
    Presto Version: 0.206
    

    编辑: table1 因为某种原因。不知道为什么会这样,因为普雷斯托埃米尔和雅典娜都使用相同的胶水目录。我还在同一个EMR实例中尝试了Hive,它能够在中找到行 表1 .

    select * from table1 limit 10;
    

    上面的语句给出了10行hivesql,而0行presto sql。我在调试模式中看到以下异常:

    Query 20180917_075536_00023_4988g failed: com.facebook.presto.spi.type.TimestampType
    java.lang.UnsupportedOperationException: com.facebook.presto.spi.type.TimestampType
        at com.facebook.presto.spi.type.AbstractType.writeSlice(AbstractType.java:135)
        at com.facebook.presto.hive.parquet.reader.ParquetBinaryColumnReader.readValue(ParquetBinaryColumnReader.java:55)
        at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.lambda$readValues$1(ParquetPrimitiveColumnReader.java:184)
        at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.processValues(ParquetPrimitiveColumnReader.java:204)
        at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.readValues(ParquetPrimitiveColumnReader.java:183)
        at com.facebook.presto.hive.parquet.reader.ParquetPrimitiveColumnReader.readPrimitive(ParquetPrimitiveColumnReader.java:171)
        at com.facebook.presto.hive.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:208)
        at com.facebook.presto.hive.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:258)
        at com.facebook.presto.hive.parquet.reader.ParquetReader.readBlock(ParquetReader.java:241)
        at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:244)
        at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:222)
        at com.facebook.presto.spi.block.LazyBlock.assureLoaded(LazyBlock.java:262)
        at com.facebook.presto.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:253)
        at com.facebook.presto.spi.Page.getLoadedPage(Page.java:247)
        at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:245)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:373)
        at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:477)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    
    1 回复  |  直到 6 年前
        1
  •  1
  •   Piotr Findeisen    6 年前

    默认情况下,Presto根据位置将Parquet中的字段与表模式匹配。如果字段的顺序不同(例如,随着时间的推移写入的顺序不同),则需要启用按名称匹配。你可以这样做 hive.properties :

    hive.parquet.use-column-names = true
    

    设置会话hive.parquet_use_column_names =真;

    以下是一个相关问题: https://github.com/prestodb/presto/issues/8911