代码之家  ›  专栏  ›  技术社区  ›  BARATH

配置单元多级分区和select with where子句

  •  0
  • BARATH  · 技术社区  · 6 年前

    我有两个分区的配置单元表,第一个分区是城市,第二个分区是村庄,所以每个城市分区都会包含所有村庄分区的列表

    city1/village1
    city1/village2
    city1/village3
    city2/village5
    city2/village6
    

    如果我的select语句是 select * from table where village = 'village5' 它会在输出结果之前搜索城市1和城市2中的所有分区吗?或者它会看到hive metastore文件并只命中village5分区。

    1 回复  |  直到 6 年前
        1
  •  2
  •   hlagos    6 年前

    这将取决于你的蜂巢版本如何优化它。在我当前的版本(1.1.0)中,Hive能够指向特定的分区,而无需扫描顶部分区

    下面是一个快速演示。

    create table mydb.partition_test 
    (id string)
    partitioned by (city string, village string);
    
    INSERT OVERWRITE TABLE mydb.partition_test PARTITION (city,village)
    select * from (
    
    select '1', 'city1', 'village1'
    union all 
    select '1', 'city1', 'village2'
    union all 
    select '1', 'city1', 'village3'
    union all 
    select '1', 'city2', 'village5'
    union all 
    select '1', 'city2', 'village6'
    ) t;
    
    explain select * from mydb.partition_test where village='village5';
    
    STAGE DEPENDENCIES:
      Stage-0 is a root stage
    
    STAGE PLANS:
      Stage: Stage-0
        Fetch Operator
          limit: -1
          Processor Tree:
            TableScan
              alias: partition_test
              filterExpr: (village = 'village5') (type: boolean)
              Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: PARTIAL
              Select Operator
                expressions: id (type: string), city (type: string), 'village5' (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: PARTIAL
                ListSink