
Loading mongoexport strict-mode JSON in Spark

mongoexport apache-spark mongodb json

Tom Lous · Tech Community · 7 years ago

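The mongoexport documentation states that all JSON output is in strict mode, so the exported data looks like this:

{"amount":{"$numberLong":"3"},"count":{"$numberLong":"245"}}

The Scala case class I want to load it into is:

case class MongoData(amount: Long, count: Long)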

Reading the data as follows will of course fail:

spark
      .read
      .json(inputPath)
      .as[MongoData]
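
To see why the read fails, it can help to look at the schema Spark infers for a strict-mode line. The following is a minimal sketch, not part of the original question. It assumes Spark 2.2 or later (for `json(Dataset[String])`) and a throwaway local session, and it is written script-style for spark-shell or a similar runner. The names `sample` and `inferred` are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created just for this check.
// In spark-shell, `spark` already exists and this line can be dropped.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("strict-schema-check")
  .getOrCreate()
import spark.implicits._

// One line of strict-mode output, copied from the sample above.
val sample = Seq(
  """{"amount":{"$numberLong":"3"},"count":{"$numberLong":"245"}}"""
).toDS()

// Let Spark infer the schema exactly as spark.read.json(inputPath) would.
val inferred = spark.read.json(sample).schema

// Print the inferred schema as a tree for inspection.
inferred.printTreeString()
```

Each field should come back as a struct wrapping a single string field named `$numberLong` rather than as a plain number, which is why the rows cannot be decoded into `MongoData(amount: Long, count: Long)`.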

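Is there a way to export from Mongo without strict mode, or to import the JSON in Scala without manually restructuring every field into the appropriate structure?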

1 answer | latest 7 years ago

Tom Lous 7 years ago

I am currently using this as a workaround, although it feels a bit hacky.

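import scala.language.implicitConversions

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

case class DataFrameExtended(dataFrame: DataFrame) {

  // Replaces every {"$numberLong": "..."} struct with a plain Long column.
  def undoMongoStrict(): DataFrame = {
    // Schema that Spark infers for a strict-mode {"$numberLong": "..."} value.
    val numberLongType = StructType(List(StructField("$numberLong", StringType, true)))

    def restructure(fields: Array[StructField], nesting: List[String] = Nil): List[Column] =
      fields.toList.map { field =>
        val fieldPath = nesting :+ field.name
        val fieldPathStr = fieldPath.mkString(".")
        field.dataType match {
          // Unwrap the string value and cast it to Long ($$ escapes a literal $).
          case dt: StructType if dt == numberLongType =>
            col(s"$fieldPathStr.$$numberLong").cast(LongType).as(field.name)
          // Recurse into nested structs and rebuild them field by field.
          case dt: StructType =>
            struct(restructure(dt.fields, fieldPath): _*).as(field.name)
          // @todo handle other DataTypes, e.g. ArrayType
          case _ => col(fieldPathStr).as(field.name)
        }
      }

    dataFrame.select(restructure(dataFrame.schema.fields): _*)
  }
}

// Makes undoMongoStrict() available directly on any DataFrame.
implicit def dataFrameExtended(df: DataFrame): DataFrameExtended =
  DataFrameExtended(df)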

spark
  .read
  .json(inputPath)
  .undoMongoStrict()
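
A different approach, shown here only as a sketch and not as part of the original answer, is to strip the `$numberLong` wrappers from the text before Spark parses it, so that the values are inferred as plain numbers in the first place. The helper name `unwrapNumberLong` and the regular expression are assumptions made for illustration. They cover only the `{"$numberLong":"..."}` shape from the sample above, and because the rewrite works on raw text it would also alter a string value that happened to contain that exact shape.

```scala
// Matches a strict-mode wrapper such as {"$numberLong":"245"} and captures the digits.
val NumberLongWrapper = """\{\s*"\$numberLong"\s*:\s*"(-?\d+)"\s*\}""".r

// Hypothetical helper: replaces each wrapper with the bare number it contains.
def unwrapNumberLong(line: String): String =
  NumberLongWrapper.replaceAllIn(line, m => m.group(1))

// Example on the line from the question.
val cleaned = unwrapNumberLong(
  """{"amount":{"$numberLong":"3"},"count":{"$numberLong":"245"}}"""
)
println(cleaned)

// Possible use with Spark (assumes an existing `spark` session, `inputPath`,
// and `import spark.implicits._`; not executed here):
//   val lines = spark.read.textFile(inputPath).map(unwrapNumberLong)
//   spark.read.json(lines).as[MongoData]
```

Compared with restructuring the schema, this keeps the Spark side trivial but relies on the wrapper always having this textual form, so the schema-based version above is likely the safer choice when the export contains other strict-mode types or arbitrary string data.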
