How to access an object's members from an array of tuples of objects

tuples apache-spark scala

Haris Irshad · 7 years ago

The object's class is:

class VertexAttributes(val m: Boolean, n: Any){

        val rootParentCustNumber: String = if(n == null) "Was Null" else n.toString
        val firstMsgFlg = m

}

I have an RDD of this object type:

scala> myGraph.vertices
res92: org.apache.spark.graphx.VertexRDD[VertexAttributes] = VertexRDDImpl[2280] at RDD at VertexRDD.scala:57

Filtering the RDD gives me the following result:

scala> res92.filter{case(k,m) => k == 964088677}.collect
res94: Array[(org.apache.spark.graphx.VertexId, VertexAttributes)] = Array((964088677,VertexAttributes@2612b83f))

How can I access rootParentCustNumber of VertexAttributes@2612b83f inside Array((964088677,VertexAttributes@2612b83f))?

I tried res92.filter{case(k,m) => k == 964088677}.map{case Array(k,m)=> m.rootParentCustNumber}

but I get the following error:

<console>:243: error: pattern type is incompatible with expected type;
 found   : Array[T]
 required: (org.apache.spark.graphx.VertexId, VertexAttributes)
    (which expands to)  (Long, VertexAttributes)
       res92.filter{case(k,m) => k == 964088677}.map{case Array(k,m)=> m.rootParentCustNumber}
                                                               ^

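1 answer | as of 7 years ago

Xavier Guihot · 7 years ago

The filter stage does not change the type of the RDD (it stays RDD[(Long, VertexAttributes)]).

So you can pipe the RDD returned by the filter stage into a map stage, and handle each record there with the same tuple pattern you used in the filter stage: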

res92
  .filter{ case (k, m) => k == 964088677 }
  .map{ case (k, m) => m.rootParentCustNumber }

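I think you were misled by the collect stage, which turns the RDD into an Array. Each element of that Array is still a (Long, VertexAttributes) tuple, so the pattern to match is (k, m), not Array(k, m).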
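
If you do want to work on the collected result, the same idea applies to a plain Scala Array. The sketch below uses a local Array as a stand-in for what collect returns, so it runs without Spark; the object name CollectedTupleAccess and the sample values ("C-001", the vertex id 42) are made up for illustration and are not from the question.

```scala
// Same class as in the question.
class VertexAttributes(val m: Boolean, n: Any) {
  val rootParentCustNumber: String = if (n == null) "Was Null" else n.toString
  val firstMsgFlg: Boolean = m
}

object CollectedTupleAccess {
  // Hypothetical sample data, shaped like the result of collect:
  // Array[(Long, VertexAttributes)].
  val collected: Array[(Long, VertexAttributes)] = Array(
    (964088677L, new VertexAttributes(true, "C-001")),
    (42L, new VertexAttributes(false, null))
  )

  // Each element is a tuple, so the pattern is (k, m), not Array(k, m).
  val viaPattern: Array[String] =
    collected
      .filter { case (k, _) => k == 964088677L }
      .map { case (_, m) => m.rootParentCustNumber }

  // Equivalent positional access on the first matching element.
  val viaAccessor: Option[String] =
    collected.find(_._1 == 964088677L).map(_._2.rootParentCustNumber)
}
```

On the RDD itself, the filter and map pipeline shown earlier is usually preferable, since it only brings the extracted strings back to the driver when you eventually call collect.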
