
How do I add a switch case for the like operator in Scala?

  •  -1
  • user3407267  · Tech Community  · 6 years ago

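    I have a DataFrame with columns (A, B), where column B is free text that I am converting to a type (not found, purchase count too low, and so on) so that it aggregates better. I wrote a switch case covering all the possible patterns and their respective types, but it does not work.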

    def getType(x: String): String = x match {
        case "Item % not found %" =>"NOT_FOUND"
        case "%purchase count % is too low %" =>"TOO_LOW_PURCHASE_COUNT"
        case _ => "Unknown"
    }
    
    getType("Item 75gb not found") 
    
    val newdf = df.withColumn("updatedType",getType(col("raw_type"))) 
    

    2 replies  |  up to 6 years ago
        1
  •  0
  •   pasha701    6 years ago

    In the regexp world, the SQL symbol "%" can be replaced with ".*". A UDF can then be created to match each value against the patterns:

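    import org.apache.spark.sql.functions.udf
    // Assumes a SparkSession named `spark` is in scope (as in spark-shell); needed for toDF and $"..."
    import spark.implicits._
    
    val originalSqlLikePatternMap = Map("Item % not found%" -> "NOT_FOUND",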
      // 20 other patterns here
      "%purchase count % is too low %" -> "TOO_LOW_PURCHASE_COUNT")
    val javaPatternMap = originalSqlLikePatternMap.map(v => v._1.replaceAll("%", ".*") -> v._2)
    
    val df = Seq(
      "Item foo not found ", "Foo purchase count 1 is too low ", "#!@"
    ).toDF("raw_type")
    
    val converter = (value: String) => javaPatternMap.find(v => value.matches(v._1)).map(_._2).getOrElse("Unknown")
    val converterUDF = udf(converter)
    
    val result = df.withColumn("updatedType", converterUDF($"raw_type"))
    result.show(false)
    

    Output:

    +--------------------------------+----------------------+
    |raw_type                        |updatedType           |
    +--------------------------------+----------------------+
    |Item foo not found              |NOT_FOUND             |
    |Foo purchase count 1 is too low |TOO_LOW_PURCHASE_COUNT|
    |#!@                             |Unknown               |
    +--------------------------------+----------------------+
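
    One caveat with replacing "%" by ".*" directly: any regex metacharacter in the pattern text (such as ".", "(" or "+") keeps its regex meaning. Below is a minimal sketch of a stricter conversion, assuming a hypothetical helper named likeToRegex (not a Spark API). It quotes every literal character, maps the SQL single-character wildcard "_" to ".", and enables DOTALL so wildcards also span line breaks. LIKE escape characters are not handled.

```scala
import java.util.regex.Pattern

object LikeToRegex {
  // Hypothetical helper, not part of Spark: converts a SQL LIKE pattern
  // into a Java regex string suitable for String.matches.
  def likeToRegex(like: String): String = {
    val body = like.toList.map {
      case '%' => ".*"                        // any run of characters
      case '_' => "."                         // exactly one character
      case c   => Pattern.quote(c.toString)   // literal character, metacharacters neutralized
    }.mkString
    "(?s)" + body                             // DOTALL: let '.' match line breaks too
  }
}
```

    The result can be used in place of the replaceAll call when building javaPatternMap, for example v => LikeToRegex.likeToRegex(v._1) -> v._2.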
    
        2
  •  1
  •   user10458963    6 years ago

    Use when with like:

    import org.apache.spark.sql.functions.when
    
    val df = Seq(
      "Item foo not found",  "Foo purchase count 1 is too low ", "#!@"
    ).toDF("raw_type")
    
    val newdf = df.withColumn(
      "updatedType",
      when($"raw_type" like "Item % not found%", "NOT_FOUND")
        .when($"raw_type" like "%purchase count % is too low%", "TOO_LOW_PURCHASE_COUNT")
        .otherwise("Unknown")
    )
    

    Result:

    newdf.show
    // +--------------------+--------------------+
    // |            raw_type|         updatedType|
    // +--------------------+--------------------+
    // |  Item foo not found|           NOT_FOUND|
    // |Foo purchase coun...|TOO_LOW_PURCHASE_...|
    // |                 #!@|             Unknown|
    // +--------------------+--------------------+
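
    For comparison with the question's original attempt: a Scala match on string literals tests exact equality, so "%" is never treated as a wildcard there. If a plain String function is still wanted (for instance to wrap in a UDF as in the first answer), regex extractors keep the switch-like shape. The sketch below is an illustration under assumptions: the regexes are hand-translated from the like patterns used in this answer, and the object name GetType is made up for the example.

```scala
object GetType {
  // Assumed translations of the LIKE patterns above ('%' -> ".*").
  // A Regex used as an extractor must match the whole input string.
  private val NotFound = "Item .* not found.*".r
  private val TooLow   = ".*purchase count .* is too low.*".r

  // Cases are tried top to bottom; the first matching pattern wins.
  def getType(x: String): String = x match {
    case NotFound() => "NOT_FOUND"
    case TooLow()   => "TOO_LOW_PURCHASE_COUNT"
    case _          => "Unknown"
  }
}
```

    Note that this function still takes a String, not a Column, so it cannot be passed col("raw_type") directly as in the question.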
    
