PySpark lag function returns null

  • enavuio  · 3 years ago

    >>> df.show()
    +----------------------+------------------------+--------------------+
    |date_cast             |id                      |         status     |
    +----------------------+------------------------+--------------------+
    |            2021-02-20|    123...              |open                |
    |            2021-02-21|    123...              |open                |
    |            2021-02-17|    123...              |closed              |
    |            2021-02-22|    123...              |open                |
    |            2021-02-19|    123...              |open                |
    |            2021-02-18|    123...              |closed              |
    +----------------------+------------------------+--------------------+
    

    df_lag = df.withColumn('lag_status',F.lag(df['status']) \
                                     .over(Window.partitionBy("date_cast").orderBy(F.asc('date_cast')))).show()
    

    Can anyone help with the following problem?

    >>> column_list = ["date_cast","id"]
    >>> win_spec = Window.partitionBy([F.col(x) for x in column_list]).orderBy(F.asc('date_cast'))
    >>> df.withColumn('lag_status', F.lag('status').over(
    ...     win_spec
    ...     )
    ... )
    
    +----------------------+------------------------+--------------------+-----------+
    |date_cast             |id                      |         status     | lag_status|
    +----------------------+------------------------+--------------------+-----------+
    |            2021-02-19|    123...              |open                |       null|
    |            2021-02-21|    123...              |open                |       null|
    |            2021-02-17|    123...              |open                |       null|
    |            2021-02-18|    123...              |open                |       null|
    |            2021-02-22|    123...              |open                |       null|
    |            2021-02-20|    123...              |open                |       null|
    +----------------------+------------------------+--------------------+-----------+
    
    1 answer  |  3 years ago
  •   wordinone    3 years ago

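    date_cast has unique values, so partitioning by it puts every row in its own partition and lag has no previous row to return. Partition by id instead, for example: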

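    from pyspark.sql import Window, functions as F
    df_lag = df.withColumn('lag_status', F.lag(df['status']) \
                                     .over(Window.partitionBy("id").orderBy(F.asc('date_cast'))))
    df_lag.show()  # show() returns None, so call it separately to keep the DataFrame in df_lag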
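
To see why the partition column matters, here is a rough plain-Python sketch of what lag does inside a window. It is not PySpark code and not how Spark implements it; the helper lag_status and the tuple layout are made up for illustration, and the id value is kept as the elided "123..." from the question. It only mimics the semantics: group rows by the partition key, sort each group, and take the previous row's status.

```python
from collections import defaultdict

# Rows from the question as (date_cast, id, status).
# The id is elided as "123..." in the post and is kept that way here.
rows = [
    ("2021-02-20", "123...", "open"),
    ("2021-02-21", "123...", "open"),
    ("2021-02-17", "123...", "closed"),
    ("2021-02-22", "123...", "open"),
    ("2021-02-19", "123...", "open"),
    ("2021-02-18", "123...", "closed"),
]


def lag_status(rows, partition_key, order_key=lambda r: r[0]):
    """Hypothetical helper: emulate lag('status') over a partitioned, ordered window."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    result = []
    for part in partitions.values():
        part.sort(key=order_key)  # ISO dates sort correctly as strings
        previous = None  # the first row of each partition has no predecessor
        for row in part:
            result.append(row + (previous,))
            previous = row[2]
    return result


# Partitioning by date_cast: six partitions of one row each, so every lag is None.
by_date = lag_status(rows, partition_key=lambda r: r[0])

# Partitioning by id: one partition of six rows, so only the earliest row is None.
by_id = lag_status(rows, partition_key=lambda r: r[1])

for row in by_id:
    print(row)
```

With a single id, only the earliest date (2021-02-17) has no previous status; every later row picks up the status of the row before it.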