The physical plans of these two queries, as well as of the lateral view query, are the same. Clearly, the analyzed logical plan of the first query matches that of the lateral view query, and after optimization the logical plans of all three queries become the same.
Create the DataFrame:
df = spark.range(1).selectExpr("array(array(1,2),array(3,4)) kit")
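The SQL statements below read from a temp view named tabla, so the DataFrame must be registered under that name first (this registration step is assumed; it is not shown in the original):
df.createOrReplaceTempView("tabla")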
First query:
spark.sql('select explode(kit) exploded, exploded[0] from tabla').explain(True)
== Parsed Logical Plan ==
'Project ['explode('kit) AS exploded#308, unresolvedalias('exploded[0], None)]
+- 'UnresolvedRelation [tabla]
== Analyzed Logical Plan ==
exploded: array<int>, exploded[0]: int
Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode(kit#292), false, [exploded#309]
+- SubqueryAlias tabla
+- Project [array(array(1, 2), array(3, 4)) AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Optimized Logical Plan ==
Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode([[1,2],[3,4]]), [0], false, [exploded#309]
+- Project [[[1,2],[3,4]] AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Physical Plan ==
*(2) Project [exploded#309, exploded#309[0] AS exploded[0]#310]
+- Generate explode([[1,2],[3,4]]), false, [exploded#309]
+- *(1) Project [[[1,2],[3,4]] AS kit#292]
+- *(1) Range (0, 1, step=1, splits=24)
Second query: using a subquery
spark.sql('select exploded[0] from (select explode(kit) exploded from tabla)').explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('exploded[0], None)]
+- 'SubqueryAlias __auto_generated_subquery_name
+- 'Project ['explode('kit) AS exploded#313]
+- 'UnresolvedRelation [tabla]
== Analyzed Logical Plan ==
exploded[0]: int
Project [exploded#314[0] AS exploded[0]#315]
+- SubqueryAlias __auto_generated_subquery_name
+- Project [exploded#314]
+- Generate explode(kit#292), false, [exploded#314]
+- SubqueryAlias tabla
+- Project [array(array(1, 2), array(3, 4)) AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Optimized Logical Plan ==
Project [exploded#314[0] AS exploded[0]#315]
+- Generate explode([[1,2],[3,4]]), [0], false, [exploded#314]
+- Project [[[1,2],[3,4]] AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Physical Plan ==
*(2) Project [exploded#314[0] AS exploded[0]#315]
+- Generate explode([[1,2],[3,4]]), false, [exploded#314]
+- *(1) Project [[[1,2],[3,4]] AS kit#292]
+- *(1) Range (0, 1, step=1, splits=24)
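For comparison, the same shape can be expressed with the DataFrame API. This is a minimal sketch that mirrors the subquery form (it is an assumed equivalent, not taken from the original; explode and selectExpr are standard PySpark functions):
from pyspark.sql.functions import explode
# explode kit first, then reference the exploded column in a second projection
df.select(explode("kit").alias("exploded")) \
  .selectExpr("exploded[0]") \
  .explain(True)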
Third query: using lateral view
spark.sql('select exploded[0] from tabla lateral view explode(kit) as exploded').explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('exploded[0], None)]
+- 'Generate 'explode('kit), false, as, ['exploded]
+- 'UnresolvedRelation [tabla]
== Analyzed Logical Plan ==
exploded[0]: int
Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode(kit#292), false, as, [exploded#353]
+- SubqueryAlias tabla
+- Project [array(array(1, 2), array(3, 4)) AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Optimized Logical Plan ==
Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode([[1,2],[3,4]]), [0], false, as, [exploded#353]
+- Project [[[1,2],[3,4]] AS kit#292]
+- Range (0, 1, step=1, splits=Some(24))
== Physical Plan ==
*(2) Project [exploded#353[0] AS exploded[0]#354]
+- Generate explode([[1,2],[3,4]]), false, [exploded#353]
+- *(1) Project [[[1,2],[3,4]] AS kit#292]
+- *(1) Range (0, 1, step=1, splits=24)
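To double-check that the three plans differ only in expression IDs, the optimized plans can also be compared programmatically. A minimal sketch, relying on the internal _jdf / queryExecution handles (PySpark exposes no public API that returns the plan as a string):
q1 = spark.sql('select explode(kit) exploded, exploded[0] from tabla')
q2 = spark.sql('select exploded[0] from (select explode(kit) exploded from tabla)')
q3 = spark.sql('select exploded[0] from tabla lateral view explode(kit) as exploded')

# Fetch each query's optimized logical plan as a string via the JVM handle (internal API).
plans = [q._jdf.queryExecution().optimizedPlan().toString() for q in (q1, q2, q3)]
for p in plans:
    print(p)  # should match the Optimized Logical Plan sections above, modulo expression IDs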