PySpark: concatenating two columns of data type "struct" --> Error: cannot resolve due to data type mismatch

structure concatenation pyspark types python

PineNuts0 · Tech Community · 6 years ago

I have a DataFrame in PySpark with two columns whose data type is "struct".

See the sample DataFrame below:

word_verb                   word_noun
{_1=cook, _2=VB}            {_1=chicken, _2=NN}
{_1=pack, _2=VBN}           {_1=lunch, _2=NN}
{_1=reconnected, _2=VBN}    {_1=wifi, _2=NN}

I want to concatenate these two columns so that I can do a frequency count of the combined verb and noun chunks.

I tried the code below:

df = df.withColumn('word_chunk_final', F.concat(F.col('word_verb'), F.col('word_noun')))

But I get the following error:

AnalysisException: u"cannot resolve 'concat(`word_verb`, `word_noun`)' due to data type mismatch: input to function concat should have been string, binary or array, but it's [struct<_1:string,_2:string>, struct<_1:string,_2:string>]

My desired output table is shown below. The new concatenated field should have data type string:

word_verb                   word_noun               word_chunk_final
{_1=cook, _2=VB}            {_1=chicken, _2=NN}     cook chicken
{_1=pack, _2=VBN}           {_1=lunch, _2=NN}       pack lunch
{_1=reconnected, _2=VBN}    {_1=wifi, _2=NN}        reconnected wifi

1 answer

pault Tanjin 6 years ago

Your code is almost there.

Assuming your schema is as follows:

df.printSchema()
#root
# |-- word_verb: struct (nullable = true)
# |    |-- _1: string (nullable = true)
# |    |-- _2: string (nullable = true)
# |-- word_noun: struct (nullable = true)
# |    |-- _1: string (nullable = true)
# |    |-- _2: string (nullable = true)

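concat only accepts string, binary, or array inputs, so it cannot take the struct columns directly. You just need to access the _1 field of each column: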

import pyspark.sql.functions as F

df.withColumn(
    "word_chunk_final", 
    F.concat_ws(' ', F.col('word_verb')['_1'], F.col('word_noun')['_1'])
).show()
#+-----------------+------------+----------------+
#|        word_verb|   word_noun|word_chunk_final|
#+-----------------+------------+----------------+
#|        [cook,VB]|[chicken,NN]|    cook chicken|
#|       [pack,VBN]|  [lunch,NN]|      pack lunch|
#|[reconnected,VBN]|   [wifi,NN]|reconnected wifi|
#+-----------------+------------+----------------+

Also, you should use concat_ws ("concatenate with separator") instead of concat to put a space between the strings. It works similarly to str.join in Python.
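To make the str.join comparison concrete, here is a plain-Python sketch that needs no Spark. The helper names concat_ws_py and concat_py are hypothetical stand-ins written for this illustration, not PySpark functions. The helper also mimics one way concat_ws differs from a bare str.join: it skips null (None) inputs.

```python
def concat_ws_py(sep, *values):
    """Hypothetical stand-in for concat_ws: join the non-null values with sep."""
    return sep.join(v for v in values if v is not None)


def concat_py(*values):
    """Hypothetical stand-in for concat: glue values together with no separator."""
    return "".join(values)


with_space = concat_ws_py(" ", "cook", "chicken")   # separator placed between the words
no_space = concat_py("cook", "chicken")             # words run together
skips_null = concat_ws_py(" ", "cook", None)        # the None input is dropped

print(with_space)
print(no_space)
print(skips_null)
```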
