代码之家  ›  专栏  ›  技术社区  ›  Berni

根据其他列之间的操作(min、max、sum)将列添加到数据帧

  •  0
  • Berni  · 技术社区  · 6 年前

    val df = Seq(
     ("thin", "Cell phone", 6000, 150,  "01/01/2018"),
     ("Normal", "Tablet", 1500, 200, "01/01/2018"),
     ("Mini", "Tablet", 2000, 250, "02/01/2018"),
     ("Ultra thin", "Cell phone", 5000, 300, "02/01/2018"),
     ("Very thin", "Cell phone", 6000, 400, "03/01/2018"),
     ("Big", "Tablet", 4500, 250, "03/01/2018"),
     ("Bendable", "Cell phone", 3000, 200, "04/01/2018"),
     ("Fordable", "Cell phone", 3000, 150, "05/01/2018"),
     ("Pro", "Cell phone", 4500, 300, "06/01/2018"),
     ("Pro2", "Tablet", 6500, 350, "04/01/2018")).toDF("product", "category", 
     "revenue", "extra", "date")
    

    我想添加一个 Column revenue extra min 手术让我得到一个 例如:

    df.withColumn("output", min("revenue", "extra"))
    

    最小 , max . 然而,我在这里的目标是横向地跨列应用这些概念。

    1 回复  |  直到 6 年前
        1
  •  0
  •   stack0114106    6 年前

    scala> val df = Seq(
         |  ("thin", "Cell phone", 6000, 150,  "01/01/2018"),
         |  ("Normal", "Tablet", 1500, 200, "01/01/2018"),
         |  ("Mini", "Tablet", 2000, 250, "02/01/2018"),
         |  ("Ultra thin", "Cell phone", 5000, 300, "02/01/2018"),
         |  ("Very thin", "Cell phone", 6000, 400, "03/01/2018"),
         |  ("Big", "Tablet", 4500, 250, "03/01/2018"),
         |  ("Bendable", "Cell phone", 3000, 200, "04/01/2018"),
         |  ("Fordable", "Cell phone", 3000, 150, "05/01/2018"),
         |  ("Pro", "Cell phone", 4500, 300, "06/01/2018"),
         |  ("Pro2", "Tablet", 6500, 350, "04/01/2018")).toDF("product", "category",
         |  "revenue", "extra", "date")
    df: org.apache.spark.sql.DataFrame = [product: string, category: string ... 3 more fields]
    
    scala> df.printSchema
    root
     |-- product: string (nullable = true)
     |-- category: string (nullable = true)
     |-- revenue: integer (nullable = false)
     |-- extra: integer (nullable = false)
     |-- date: string (nullable = true)
    
    
    scala> def min2col(x:Int,y:Int):Int =
         | return if(x<y) x else y
    min2col: (x: Int, y: Int)Int
    
    scala>  val myudfmin2col = udf( min2col(_:Int,_:Int):Int )
    myudfmin2col: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,IntegerType,Some(List(IntegerType, IntegerType)))
    
    scala> df.withColumn("output",myudfmin2col('extra,'revenue)).show(false)
    +----------+----------+-------+-----+----------+------+
    |product   |category  |revenue|extra|date      |output|
    +----------+----------+-------+-----+----------+------+
    |thin      |Cell phone|6000   |150  |01/01/2018|150   |
    |Normal    |Tablet    |1500   |200  |01/01/2018|200   |
    |Mini      |Tablet    |2000   |250  |02/01/2018|250   |
    |Ultra thin|Cell phone|5000   |300  |02/01/2018|300   |
    |Very thin |Cell phone|6000   |400  |03/01/2018|400   |
    |Big       |Tablet    |4500   |250  |03/01/2018|250   |
    |Bendable  |Cell phone|3000   |200  |04/01/2018|200   |
    |Fordable  |Cell phone|3000   |150  |05/01/2018|150   |
    |Pro       |Cell phone|4500   |300  |06/01/2018|300   |
    |Pro2      |Tablet    |6500   |350  |04/01/2018|350   |
    +----------+----------+-------+-----+----------+------+
    
    
    scala>
    

    编辑1:

    scala> df.createOrReplaceTempView("product")
    scala> spark.sql("select product,category,revenue,extra,date, case when revenue<extra then revenue else extra end as minextra  from product ").show(false)
    +----------+----------+-------+-----+----------+--------+
    |product   |category  |revenue|extra|date      |minextra|
    +----------+----------+-------+-----+----------+--------+
    |thin      |Cell phone|6000   |150  |01/01/2018|150     |
    |Normal    |Tablet    |1500   |200  |01/01/2018|200     |
    |Mini      |Tablet    |2000   |250  |02/01/2018|250     |
    |Ultra thin|Cell phone|5000   |300  |02/01/2018|300     |
    |Very thin |Cell phone|6000   |400  |03/01/2018|400     |
    |Big       |Tablet    |4500   |250  |03/01/2018|250     |
    |Bendable  |Cell phone|3000   |200  |04/01/2018|200     |
    |Fordable  |Cell phone|3000   |150  |05/01/2018|150     |
    |Pro       |Cell phone|4500   |300  |06/01/2018|300     |
    |Pro2      |Tablet    |6500   |350  |04/01/2018|350     |
    +----------+----------+-------+-----+----------+--------+
    
    
    scala>