代码之家  ›  专栏  ›  技术社区  ›  ozw1z5rd

Scala中的Spark数据帧映射分区

  •  -2
  • ozw1z5rd  · 技术社区  · 6 年前

    有没有人有dataframe的mapPartitions函数的工作示例?

    更新:

    MasterBuilder发布的示例理论上是可以的,但实际上有一些问题。请尝试获取类似Json的结构化数据流

    val df = spark.load.json("/user/cloudera/json")
    val newDF = df.mapPartitions(
      iterator => {
    
        val result = iterator.map(data=>{/* do some work with data */}).toList
        //return transformed data
        result.iterator
        //now convert back to df
      }
    
     ).toDF()
    

    以以下错误结束:

    <console>:28: error: Unable to find encoder for type stored in a Dataset.  
    Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  
    Support for serializing other types will be added in future releases.
    

    有没有办法让它工作?

    1 回复  |  直到 6 年前
        1
  •  -3
  •   Masterbuilder    6 年前
     import sqlContext.implicits._
    
        val newDF = df.mapPartitions(
          iterator => {
    
            val result = iterator.map(data=>{/* do some work with data */}).toList
            //return transformed data
            result.iterator
            //now convert back to df
          }
    
    ).toDF()