I have one dataset:
(apple,1)
(banana,4)
(orange,3)
(grape,2)
(watermelon,2)
The other dataset is:
(apple,Map(Bob -> 1))
(banana,Map(Chris -> 1))
(orange,Map(John -> 1))
(grape,Map(Smith -> 1))
(watermelon,Map(Phil -> 1))
My goal is to join the two datasets to get:
(apple,1,Map(Bob -> 1))
(banana,4,Map(Chris -> 1))
(orange,3,Map(John -> 1))
(grape,2,Map(Smith -> 1))
(watermelon,2,Map(Phil -> 1))
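To make the target shape concrete, here is a minimal sketch using plain Scala collections rather than Spark RDDs. The names `JoinSketch`, `combine`, `first`, and `second` are hypothetical and only for illustration. It assumes each key appears at most once in the second collection, as in the sample data above.

```scala
object JoinSketch {
  // Inner join of two keyed sequences, flattened into (key, value1, value2) triples.
  // Assumption: keys in `second` are unique; with duplicates, toMap keeps the last one.
  def combine[K, V, W](first: Seq[(K, V)], second: Seq[(K, W)]): Seq[(K, V, W)] = {
    val lookup = second.toMap
    // Keys missing from `second` are dropped, as an inner join would do.
    first.flatMap { case (k, v) => lookup.get(k).map(w => (k, v, w)) }
  }

  def main(args: Array[String]): Unit = {
    val first = Seq(("apple", 1), ("banana", 4), ("orange", 3),
                    ("grape", 2), ("watermelon", 2))
    val second = Seq(
      ("apple", Map("Bob" -> 1)),
      ("banana", Map("Chris" -> 1)),
      ("orange", Map("John" -> 1)),
      ("grape", Map("Smith" -> 1)),
      ("watermelon", Map("Phil" -> 1)))

    // Prints one triple per line, in the order of `first`.
    combine(first, second).foreach(println)
  }
}
```

This only illustrates the desired result on local collections; it says nothing about how the same combination behaves on distributed data.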
Code (first dataset):
...
// Count the occurrences of each first word
val counts_firstDataset = words
  .map(word => (word.firstWord, 1))
  .reduceByKey(_ + _)
Second dataset:
...
// For each key, count how many times each value occurs
val counts_secondDataset = secondSet.map(x =>
  (x._1, x._2.toList.groupBy(identity).mapValues(_.size)))
The join:
val joined_data = counts_firstDataset.join(counts_secondDataset)
But this does not work, because join requires [K, V] pairs. How can I get around this?