
Installing the graphframes package on an offline Spark cluster

    Michail N · 6 years ago

    I have an offline PySpark cluster (no internet access), and I need to install graphframes.

    I have already downloaded the graphframes jar from here, and I get the following error:

    error: missing or invalid dependency detected while loading class file 'Logging.class'.
    Could not access term typesafe in package com,
    because it (or its dependencies) are missing. Check your build definition for
    missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
    A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.
    error: missing or invalid dependency detected while loading class file 'Logging.class'.
    Could not access term scalalogging in value com.typesafe,
    because it (or its dependencies) are missing. Check your build definition for
    missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
    A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.typesafe.
    error: missing or invalid dependency detected while loading class file 'Logging.class'.
    Could not access type LazyLogging in value com.slf4j,
    because it (or its dependencies) are missing. Check your build definition for
    missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
    A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.slf4j.
    

    Answer by Michail N · 6 years ago

    I managed to install the library. First, I found the dependencies of graphframes, which are:

    scala-logging-api_xx-xx.jar
    scala-logging-slf4j_xx-xx.jar
    

    where xx stands for the proper Scala and jar versions. Then I installed them in the proper path. Because I work on a Cloudera machine, the proper path is:

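    If you cannot put them in this directory on your cluster (because you have no root rights and your admin is very lazy), you can simply add them to your spark-submit/spark-shell command: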

    spark-submit ..... --driver-class-path /path-for-jar/  \
                       --jars /../graphframes-0.5.0-spark2.1-s_2.11.jar,/../scala-logging-slf4j_2.10-2.1.2.jar,/../scala-logging-api_2.10-2.1.2.jar
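    `--jars` takes a comma-separated list of jars, while `--driver-class-path` is an ordinary Java classpath, whose entries are separated by colons on Linux (a plain directory on a Java classpath exposes only loose `.class` files, not the jars inside it). When all the jars were copied into one directory, both values can be built from that directory. This is a minimal sketch: `join_jars` is a hypothetical helper, not part of Spark.

```shell
# join_jars DIR SEP: print the path of every *.jar file in DIR, joined by SEP.
# Hypothetical helper for this sketch, not part of Spark.
#   SEP=","  -> value for spark-submit --jars
#   SEP=":"  -> value for --driver-class-path (Linux classpath separator)
join_jars() {
  local dir="$1" sep="$2" out="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue    # no match: skip the literal "*.jar" pattern
    out="${out:+$out$sep}$jar"   # put SEP only between entries
  done
  printf '%s\n' "$out"
}
```

    For example, with the jars in a placeholder directory `/opt/offline-jars`: `spark-submit --driver-class-path "$(join_jars /opt/offline-jars :)" --jars "$(join_jars /opt/offline-jars ,)" ...`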
    

    This works for Scala. To use graphframes from Python, you need to download the graphframes jar and then run the following in a shell:

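    # Extract the JAR content; the Python package is in its graphframes/ folder
    jar xf graphframes_graphframes-0.3.0-spark2.0-s_2.11.jar
    # Zip the graphframes/ folder itself rather than its contents, so the
    # archive contains graphframes/__init__.py and `import graphframes` works
    zip -r graphframes.zip graphframes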
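    Python imports a package from a zip on PYTHONPATH only when the package folder sits at the top of the archive: the zip must contain `graphframes/__init__.py`, not a bare `__init__.py`. A quick check before editing spark-env.sh could look like the sketch below. It assumes `python3` is available on the node, and `zip_exposes_package` is a hypothetical helper. It relies on `importlib.util.find_spec`, which locates a top-level package without executing it, so PySpark does not need to be importable for the check.

```shell
# zip_exposes_package ZIP PKG: exit 0 if PKG can be found as a top-level
# package inside ZIP, non-zero otherwise. Hypothetical helper for this sketch.
zip_exposes_package() {
  local zip_path="$1" pkg="$2"
  # -I ignores PYTHONPATH and the current directory, -S skips site-packages,
  # so only the zip (plus the standard library) is searched.
  python3 -I -S -c 'import importlib.util, sys; sys.path.insert(0, sys.argv[1]); sys.exit(0 if importlib.util.find_spec(sys.argv[2]) is not None else 1)' \
    "$zip_path" "$pkg"
}
```

    For example, `zip_exposes_package graphframes.zip graphframes && echo ok` should print `ok` only when the archive has the expected layout.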
    

    Then add the zipped file to your Python path, either in spark-env.sh or in your .bash_profile, with:

    export PYTHONPATH=$PYTHONPATH:/..proper path/graphframes.zip:.
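    If spark-env.sh or .bash_profile ends up being sourced more than once, the `export` line above appends the same entry again each time. A guarded append avoids that. This is a minimal sketch: `add_to_pythonpath` is a hypothetical helper, and the zip path in the usage line is a placeholder.

```shell
# add_to_pythonpath ENTRY: append ENTRY to PYTHONPATH unless it is already
# one of its colon-separated entries, then export the result.
# Hypothetical helper for this sketch (e.g. for spark-env.sh or .bash_profile).
add_to_pythonpath() {
  local entry="$1"
  case ":${PYTHONPATH:-}:" in
    *":${entry}:"*) ;;  # already listed: leave PYTHONPATH unchanged
    *) PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${entry}" ;;  # no leading ':' when unset
  esac
  export PYTHONPATH
}
```

    Usage: `add_to_pythonpath /path/to/graphframes.zip`. Calling it twice with the same path leaves PYTHONPATH unchanged the second time.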
    

    This link was very useful for this solution.