I was able to use this SO answer.
  
  
Note: this solution is only valid if there is at most one call to window, meaning multiple time windows are not allowed. A quick search of the spark github shows there is a hard limit of <= 1 window.
  
By using withColumn to define a bucket for each row, we can group directly by that new column:
  
  from pyspark.sql import functions as F
from datetime import datetime as dt, timedelta as td
start = dt.now()
second = td(seconds=1)
data = [(start, 0), (start+second, 1), (start+ (12*second), 2)]
df = spark.createDataFrame(data, ('foo', 'bar'))
# Create a new column defining the window for each bar
df = df.withColumn("barWindow", F.col("bar") - (F.col("bar") % 2))
# Keep the time window as is
fooWindow = F.window(F.col("foo"), "12 seconds").start.alias("foo")
# Group by the time window and the new bucket column
results = df.groupBy(fooWindow, F.col("barWindow")).count()
results.show()
# +-------------------+---------+-----+
# |                foo|barWindow|count|
# +-------------------+---------+-----+
# |2019-01-24 14:12:48|        0|    2|
# |2019-01-24 14:13:00|        2|    1|
# +-------------------+---------+-----+
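
If a bucket width other than 2 is ever needed, the same idea generalizes with integer bucketing; the width of 5 below is just an illustrative choice, not part of the original example:

# Sketch: bucket bar into arbitrary-width buckets instead of width 2
bucket_width = 5
df2 = df.withColumn("barWindow", F.floor(F.col("bar") / bucket_width) * bucket_width)
df2.groupBy(fooWindow, F.col("barWindow")).count().show()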