代码之家  ›  专栏  ›  技术社区  ›  Eric Brandt

用于创建结构数组的配置单元选择语句

  •  3
  • Eric Brandt  · 技术社区  · 7 年前

    我在选择Hive中的结构数组时遇到问题。

    我的源表如下所示:

    +-------------+--+
    |    field    |
    +-------------+--+
    | id          |
    | fieldid     |
    | fieldlabel  |
    | fieldtype   |
    | answer_id   |
    | unitname    |
    +-------------+--+
    

    这是调查数据,其中id是调查id,中间的四个字段是响应数据,unitname是调查所属的业务单元。

    我需要为每个调查id的所有答案创建一个结构数组。我本以为这会管用,但实际上不行:

    select id, 
    array( 
        named_struct(
            "field_id",
            fieldid,
            "field_label",
            fieldlabel,
            "field_type",
            fieldtype,
            "answer_id",
            answer_id,)) as answers,
    unitname
    from new_answers;
    

    返回的是每个调查答案(field\u id),作为该答案的一个结构数组,如下所示:

    id | answers | unitname
    1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
    2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
    2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
    2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
    

    但我需要做的是:

    id | answers | unitname    
    1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
    2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
       {"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
       {"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
    

    我搜索了又搜索,但我找到的所有答案似乎都与使用INSERT-INTO有关。。。。VALUES()查询。我已经有了一个表结构;我只是无法让数组按它应该的方式排列。

    任何帮助都将不胜感激。

    出于复制目的,如果需要:

    CREATE TABLE `new_answers`( 
    `id` bigint,
    `fieldid` bigint,
    `fieldlabel` string,
    `fieldtype` string,
    `answer_id` bigint,
    `unitname` string)
    
    INSERT INTO new_answers VALUES
    (1,175877,"Comment","COMMENT",8990947803,"Location1"),
    (2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
    (2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
    (2,47829,"Trans #","TEXT",8990950071,"Location2");
    
    2 回复  |  直到 7 年前
        1
  •  5
  •   Gabe    7 年前

    您似乎正在寻找的功能是将结构收集到一个数组中。Hive提供了两个将内容收集到阵列中的函数:collect\u set和collect\u list。但是,这些函数仅用于创建基本类型的数组。

    砖房项目的罐子( https://github.com/klout/brickhouse/wiki/Downloads

    add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar
    

    然后您可以添加 collect 使用任意名称的函数:

    create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';
    

    以下查询:

    select id
         , collect_struct( 
             named_struct(
               "field_id", fieldid,
               "field_label", fieldlabel,
               "field_type", fieldtype,
               "answer_id", answer_id)) as answers
         , unitname
      from new_answers
     group by id, unitname
    ;
    

    提供以下结果:

    id  answers unitname
    1   [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
    2   [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}]    Location2
    
        2
  •  0
  •   S_K    3 年前
    select id, 
    collect_list( 
        named_struct(
            "field_id", fieldid,
            "field_label", fieldlabel,
            "field_type", fieldtype,
            "answer_id", answer_id,)
                     ) as answers,
    unitname
    from new_answers
    group by id, unitname;
    

    collect\u list用于创建数组

    命名结构用于创建复杂结构。