For testing purposes, I created a test dataframe containing the string mentioned in the question:
val df = Seq(
Tuple1("[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]")
).toDF("col")
which is
+-------------------------------------------------------------------+
|col |
+-------------------------------------------------------------------+
|[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]|
+-------------------------------------------------------------------+
root
|-- col: string (nullable = true)
The udf function should look like this:
import org.apache.spark.sql.functions._
def convertToListOfListComplex = udf((ListOfList: String) => {
ListOfList.split("],\\[")
.map(x => x.replaceAll("[\\]\\[]", "").split(","))
.map(splitted => rowTest(splitted(0).trim.toLong, splitted(1).trim.toFloat, splitted(2).trim.toFloat, splitted(3).trim.toInt, splitted(4).trim.toInt))
})
where rowTest is a case class defined outside the enclosing scope as
case class rowTest(a: Long, b: Float, c: Float, d: Int, e: Int)
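To make the two string operations inside the udf easier to follow, here is a minimal sketch in plain Scala, without Spark. The object name `SplitSketch` and the value names `chunks` and `fields` are made up for illustration and are not part of the answer's code.

```scala
object SplitSketch {
  // Same sample string as in the test dataframe above.
  val input = "[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]"

  // Splitting on the literal sequence "],[" separates the inner lists.
  val chunks: Array[String] = input.split("],\\[")

  // Stripping the leftover brackets and splitting on commas yields the raw fields.
  // The fields still carry leading spaces, which is why the udf calls trim.
  val fields: Array[Array[String]] =
    chunks.map(chunk => chunk.replaceAll("[\\]\\[]", "").split(","))

  def main(args: Array[String]): Unit = {
    chunks.foreach(println)
    fields.foreach(row => println(row.map(_.trim).mkString(" | ")))
  }
}
```

Note that the split pattern expects no whitespace between `]` and `[`. If the real data is formatted as `], [`, a pattern such as `"],\\s*\\["` would probably be needed instead.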
Calling the udf function
df.withColumn("converted", convertToListOfListComplex(col("col")))
should give you the output
+-------------------------------------------------------------------+--------------------------------------------------------------------+
|col |converted |
+-------------------------------------------------------------------+--------------------------------------------------------------------+
|[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]|[[337, -115.0, -17.5, 6225, 189], [85075, -112.0, -12.5, 6225, 359]]|
+-------------------------------------------------------------------+--------------------------------------------------------------------+
root
|-- col: string (nullable = true)
|-- converted: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- a: long (nullable = false)
| | |-- b: float (nullable = false)
| | |-- c: float (nullable = false)
| | |-- d: integer (nullable = false)
| | |-- e: integer (nullable = false)
To be on the safe side, you can use Try/getOrElse in the udf function as
import org.apache.spark.sql.functions._
import scala.util.Try
def convertToListOfListComplex = udf((ListOfList: String) => {
ListOfList.split("],\\[")
.map(x => x.replaceAll("[\\]\\[]", "").split(","))
.map(splitted => rowTest(Try(splitted(0).trim.toLong).getOrElse(0L), Try(splitted(1).trim.toFloat).getOrElse(0F), Try(splitted(2).trim.toFloat).getOrElse(0F), Try(splitted(3).trim.toInt).getOrElse(0), Try(splitted(4).trim.toInt).getOrElse(0)))
})
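As a rough illustration of what the Try/getOrElse fallback does, the sketch below pulls the per-row parsing into a plain function that runs without Spark. `TrySketch`, `RowSketch` and `parseRow` are hypothetical names chosen for this example; `RowSketch` simply mirrors the fields of rowTest.

```scala
import scala.util.Try

object TrySketch {
  // Hypothetical stand-in for rowTest, with the same field types.
  case class RowSketch(a: Long, b: Float, c: Float, d: Int, e: Int)

  // Each field is parsed inside Try, so a non-numeric value or a missing
  // index falls back to the default instead of throwing an exception.
  def parseRow(fields: Array[String]): RowSketch = RowSketch(
    Try(fields(0).trim.toLong).getOrElse(0L),
    Try(fields(1).trim.toFloat).getOrElse(0F),
    Try(fields(2).trim.toFloat).getOrElse(0F),
    Try(fields(3).trim.toInt).getOrElse(0),
    Try(fields(4).trim.toInt).getOrElse(0)
  )

  def main(args: Array[String]): Unit = {
    // Second field is not a number and the fifth field is missing entirely.
    println(parseRow(Array("337", " abc", " -17.5", " 6225")))
  }
}
```

One thing to keep in mind with this approach is that a default of 0 cannot be told apart from a genuine 0 in the data, so it only fits cases where silently replacing bad values is acceptable.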
I hope the answer is helpful.