I have a DataFrame with two array columns:
+---------+-----------------------+
|itemval  |fruit                  |
+---------+-----------------------+
|[1, 2, 3]|[apple, banana, orange]|
+---------+-----------------------+
I am trying to zip them together to create name-value pairs:
+---------+-----------------------+--------------------------------------+
|itemval  |fruit                  |ziped                                 |
+---------+-----------------------+--------------------------------------+
|[1, 2, 3]|[apple, banana, orange]|[[1, apple], [2, banana], [3, orange]]|
+---------+-----------------------+--------------------------------------+
Then I convert the result to JSON. The to_json output looks like this:
+---------------------------------------------------------------------------+
|ziped                                                                      |
+---------------------------------------------------------------------------+
|[{"_1":"1","_2":"apple"},{"_1":"2","_2":"banana"},{"_1":"3","_2":"orange"}]|
+---------------------------------------------------------------------------+
The format I expect is this:
+------------------------------------------------------------------------------------------------+
|ziped                                                                                           |
+------------------------------------------------------------------------------------------------+
|[{"itemval":"1","name":"apple"},{"itemval":"2","name":"banana"},{"itemval":"3","name":"orange"}]|
+------------------------------------------------------------------------------------------------+
Here is my implementation:
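import org.apache.spark.sql.functions.{to_json, udf}
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val df1 = Seq((Array(1,2,3),Array("apple","banana","orange"))).toDF("itemval","fruit")
df1.show(false)
// list1 receives fruit and list2 receives itemval, so the tuples are (itemval, fruit)
def zipper=udf((list1:Seq[String],list2:Seq[String]) => {
  val zipList = list2 zip list1
  zipList
})
df1.withColumn("ziped",to_json(zipper($"fruit",$"itemval"))).drop("itemval","fruit").show(false)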
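
The `_1` / `_2` keys seem to come from the UDF returning plain tuples, whose fields Spark names `_1` and `_2`. One possible way to get the expected keys is to return a case class instead, since its field names become the struct field names that to_json writes out. The following is only a sketch under a few assumptions: `Item`, `ZipToJson`, and `zipItems` are hypothetical names introduced for illustration, the session is a local one, and `itemval` is cast to `array<string>` explicitly so that the values are serialized as strings, as in the expected output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_json, udf}

// Hypothetical case class: its field names are meant to become the JSON keys.
// It is defined at the top level so Spark can derive a schema for it.
case class Item(itemval: String, name: String)

object ZipToJson {
  // Pair the two sequences element-wise and wrap each pair in an Item.
  def zipItems(itemvals: Seq[String], fruits: Seq[String]): Seq[Item] =
    itemvals.zip(fruits).map { case (v, n) => Item(v, n) }

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell the existing `spark` could be used.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zip-to-json")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq((Array(1, 2, 3), Array("apple", "banana", "orange")))
      .toDF("itemval", "fruit")

    // The UDF returns Seq[Item], i.e. an array of structs with named fields.
    val zipper = udf(zipItems _)

    df1
      .select(
        // Cast the Int array to strings explicitly before passing it to the UDF.
        to_json(zipper($"itemval".cast("array<string>"), $"fruit")).as("ziped")
      )
      .show(false)

    spark.stop()
  }
}
```

If the numbers should stay numeric in the JSON, declaring `itemval` as `Int` in the case class and dropping the cast would be the natural variation.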