
Dataproc dependency conflict: google-api-client

  •  1
  • Nira  · 6 years ago

    I'm building a library for fetching encrypted secrets from Cloud Storage (in Scala, using the Java clients). I'm using the following Google libraries:

    "com.google.apis"  % "google-api-services-cloudkms" % "v1-rev26-1.23.0" exclude("com.google.guava", "guava-jdk5"),
    "com.google.cloud" % "google-cloud-storage"         % "1.14.0",
    

    Everything works fine locally, but when I try to run the code on Dataproc, I get the following error:

    Exception in thread "main" java.lang.NoSuchMethodError: com.google.api.client.googleapis.services.json.AbstractGoogleJsonClient$Builder.setBatchPath(Ljava/lang/String;)Lcom/google/api/client/googleapis/services/AbstractGoogleClient$Builder;
        at com.google.api.services.cloudkms.v1.CloudKMS$Builder.setBatchPath(CloudKMS.java:4250)
        at com.google.api.services.cloudkms.v1.CloudKMS$Builder.<init>(CloudKMS.java:4229)
        at gcp.encryption.EncryptedSecretsUser$class.clients(EncryptedSecretsUser.scala:111)
        at gcp.encryption.EncryptedSecretsUser$class.getEncryptedSecrets(EncryptedSecretsUser.scala:62)
    

    The offending line in my code is:

    val kms: CloudKMS = new CloudKMS.Builder(credential.getTransport,
          credential.getJsonFactory,
          credential)
          .setApplicationName("Encrypted Secrets User")
          .build()
    

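    I found in the documentation that some Google libraries are available on Dataproc (I'm using a Spark cluster with image version 1.2.15). But as far as I can see, the transitive dependency on google-api-client is the same version I use locally (1.23.0). So why isn't the method found?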

    Should I set up my dependencies differently for running on Dataproc?
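
    One way to investigate a NoSuchMethodError like this is to ask the JVM, on the cluster itself, which jar the class was actually loaded from and whether that copy has the method. The following is a minimal sketch, not part of the original code: the object name ClasspathProbe and its helpers are hypothetical, and it only uses standard JDK reflection calls.

    ```scala
    import scala.util.Try

    // Hypothetical helper for inspecting what is really on the job classpath.
    object ClasspathProbe {
      /** Location (jar or directory) a class was loaded from; None for bootstrap classes. */
      def locationOf(cls: Class[_]): Option[String] =
        for {
          source   <- Option(cls.getProtectionDomain.getCodeSource)
          location <- Option(source.getLocation)
        } yield location.toString

      /** True if the class exposes a public method with the given name. */
      def hasMethod(cls: Class[_], name: String): Boolean =
        cls.getMethods.exists(_.getName == name)

      def main(args: Array[String]): Unit = {
        // The builder class named in the stack trace above.
        val className = "com.google.api.client.googleapis.services.AbstractGoogleClient$Builder"
        Try(Class.forName(className)).toOption match {
          case Some(cls) =>
            println(s"loaded from:  ${locationOf(cls).getOrElse("<bootstrap>")}")
            println(s"setBatchPath: ${hasMethod(cls, "setBatchPath")}")
          case None =>
            println(s"$className is not on the classpath")
        }
      }
    }
    ```

    If the reported location is a jar shipped with the cluster rather than your own assembly, the cluster's copy is shadowing the version you compiled against.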

    EDIT

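    I finally solved this in another project. It turns out that besides shading all Google dependencies (including the gcs-connector!!), you also have to register your shaded class with the JVM to handle the gs:// file system. Below is the Maven configuration that works for me; something similar can be achieved with sbt: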

    Parent POM:

    <project xmlns="http://maven.apache.org/POM/4.0.0"...>
    ...
    <properties>
        <!-- Spark version -->
        <spark.version>[2.2.1]</spark.version>
        <!-- Jackson-libs version pulled in by spark -->
        <jackson.version>[2.6.5]</jackson.version>
        <!-- Avro version pulled in by jackson -->
        <avro.version>[1.7.7]</avro.version>
        <!-- Kryo-shaded version pulled in by spark -->
        <kryo.version>[3.0.3]</kryo.version>
        <!-- Apache commons-lang version pulled in by spark -->
        <commons.lang.version>2.6</commons.lang.version>
    
        <!-- TODO: need to shade google libs because of version-conflicts on Dataproc. Remove this when Dataproc 1.3/2.0 is released -->
        <bigquery-conn.version>[0.10.6-hadoop2]</bigquery-conn.version>
        <gcs-conn.version>[1.6.5-hadoop2]</gcs-conn.version>
        <google-storage.version>[1.29.0]</google-storage.version>
        <!-- The guava version we want to use -->
        <guava.version>[23.2-jre]</guava.version>
        <!-- The google api version used by the google-cloud-storage lib -->
        <api-client.version>[1.23.0]</api-client.version>
        <!-- The google-api-services-storage version used by the google-cloud-storage lib -->
        <storage-api.version>[v1-rev114-1.23.0]</storage-api.version>
    
        <!-- Picked up by compiler and resource plugins -->
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    
    ...
    
    <build>
        <pluginManagement>
            <plugins>
    ...
    
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>true</minimizeJar>
                            <filters>
                                <filter>
                                    <artifact>com.google.**:*</artifact>
                                    <includes>
                                        <include>**</include>
                                    </includes>
                                </filter>
                                <filter>
                                    <artifact>com.google.cloud.bigdataoss:gcs-connector</artifact>
                                    <excludes>
                                        <!-- Register a provider with the shaded name instead-->
                                        <exclude>META-INF/services/org.apache.hadoop.fs.FileSystem</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <artifactSet>
                                <includes>
                                    <include>com.google.*:*</include>
                                </includes>
                                <excludes>
                                    <exclude>com.google.code.findbugs:jsr305</exclude>
                                </excludes>
                            </artifactSet>
                            <relocations>
                                <relocation>
                                    <pattern>com.google</pattern>
                                    <shadedPattern>com.shaded.google</shadedPattern>
                                </relocation>
                            </relocations>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
    ...
            </plugins>
        </pluginManagement>
    </build>
    
    <dependencyManagement>
        <dependencies>
            <dependency>
    ...
                <groupId>com.google.cloud.bigdataoss</groupId>
                <artifactId>gcs-connector</artifactId>
                <version>${gcs-conn.version}</version>
                <exclusions>
                    <!-- conflicts with Spark dependencies -->
                    <exclusion>
                        <groupId>org.apache.hadoop</groupId>
                        <artifactId>hadoop-common</artifactId>
                    </exclusion>
                    <!-- conflicts with Spark dependencies -->
                    <exclusion>
                        <groupId>org.apache.hadoop</groupId>
                        <artifactId>hadoop-mapreduce-client-core</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>com.google.guava</groupId>
                        <artifactId>guava</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <!-- Avoid conflict with the version pulled in by the GCS-connector on Dataproc -->
                <groupId>com.google.apis</groupId>
                <artifactId>google-api-services-storage</artifactId>
                <version>${storage-api.version}</version>
            </dependency>
            <dependency>
                <groupId>commons-lang</groupId>
                <artifactId>commons-lang</artifactId>
                <version>${commons.lang.version}</version>
            </dependency>
            <dependency>
                <groupId>com.esotericsoftware</groupId>
                <artifactId>kryo-shaded</artifactId>
                <version>${kryo.version}</version>
            </dependency>
            <dependency>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>jackson-databind</artifactId>
                <version>${jackson.version}</version>
            </dependency>
            <dependency>
                <groupId>com.google.api-client</groupId>
                <artifactId>google-api-client</artifactId>
                <version>${api-client.version}</version>
            </dependency>
            <dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <version>${guava.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>
    
    <dependencies>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-storage</artifactId>
            <version>${google-storage.version}</version>
            <exclusions>
                <!-- conflicts with Spark dependencies -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </dependency>
    ...
    </dependencies>
    
    ...
    </project>
    

    Child POM:

        <dependencies>
        <!-- Libraries available on dataproc -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.google.cloud.bigdataoss</groupId>
            <artifactId>gcs-connector</artifactId>
        </dependency>
        <dependency>
            <groupId>com.esotericsoftware</groupId>
            <artifactId>kryo-shaded</artifactId>
            <scope>provided</scope><!-- Pulled in by spark -->
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <scope>provided</scope><!-- Pulled in by spark -->
        </dependency>
    </dependencies>
    

    Then add a file named org.apache.hadoop.fs.FileSystem under path/to/your-project/src/main/resources/META-INF/services, containing the name of your shaded class, e.g.:

    # WORKAROUND FOR DEPENDENCY CONFLICTS ON DATAPROC
    #
    # Use the shaded class as a provider for the gs:// file system
    #
    
    com.shaded.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
    

    (Note that this file is filtered out of the gcs-connector library in the parent POM.)
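
    To check that the provider file above is actually picked up, you can list what java.util.ServiceLoader finds for a given service interface. This is only a sketch under assumptions: the object name ServiceProbe is hypothetical, and the usage shown in the comment assumes Hadoop is on the classpath (as it is on a Dataproc cluster).

    ```scala
    import java.util.ServiceLoader
    import scala.collection.JavaConverters._

    // Hypothetical helper: lists providers registered through META-INF/services files.
    object ServiceProbe {
      /** Class names of all providers the ServiceLoader finds for `service`. */
      def providerNames[T](service: Class[T]): List[String] =
        ServiceLoader.load(service).iterator().asScala.map(_.getClass.getName).toList
    }

    // Usage on the cluster (requires Hadoop on the classpath):
    //   ServiceProbe.providerNames(classOf[org.apache.hadoop.fs.FileSystem]).foreach(println)
    // The shaded class name from the provider file should appear in the output.
    ```

    Note that iterating a ServiceLoader instantiates each provider, so this is meant as a one-off diagnostic rather than something to leave in production code.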

    1 answer
  •  5
  •   Dennis Huo    6 years ago

    It may not be obvious, but the google-api-client version in the latest stable GCS connector release is actually 1.20.0.

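    The reason is that this was the commit which rolled the api client version forward to 1.23.0, and it was part of a series of commits that also included the dependency-shading commit. The overall goal was to stop leaking transitive dependencies onto the job classpath, to avoid future version-conflict issues, at the cost of everyone having to bring their own fat jar containing the full api client dependencies.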

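    However, it turns out that many people have come to rely on the api client provided by the GCS connector being on the classpath, so there are production workloads that could not survive such a change in a minor version upgrade. The upgraded GCS connector, which uses 1.23.0 but also shades it so that it no longer appears on the job classpath, is therefore reserved for a future Dataproc 1.3+ or 2.0+ release.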

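    In your case, you could try using a 1.20.0 version of your dependencies (you may also have to downgrade the google-cloud-storage dependency you include, though a 1.22.0 version might still work, assuming no other breaking changes, since setBatchPath was only introduced in 1.23.0), or else you can try to shade all your own dependencies using sbt-assembly.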

    We can verify that setBatchPath was only introduced in 1.23.0:

    $ javap -cp google-api-client-1.22.0.jar com.google.api.client.googleapis.services.AbstractGoogleClient.Builder | grep set
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setRootUrl(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setServicePath(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setGoogleClientRequestInitializer(com.google.api.client.googleapis.services.GoogleClientRequestInitializer);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setHttpRequestInitializer(com.google.api.client.http.HttpRequestInitializer);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setApplicationName(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setSuppressPatternChecks(boolean);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setSuppressRequiredParameterChecks(boolean);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setSuppressAllChecks(boolean);
    
    $ javap -cp google-api-client-1.23.0.jar com.google.api.client.googleapis.services.AbstractGoogleClient.Builder | grep set
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setRootUrl(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setServicePath(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setBatchPath(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setGoogleClientRequestInitializer(com.google.api.client.googleapis.services.GoogleClientRequestInitializer);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setHttpRequestInitializer(com.google.api.client.http.HttpRequestInitializer);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setApplicationName(java.lang.String);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setSuppressPatternChecks(boolean);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setSuppressRequiredParameterChecks(boolean);
      public com.google.api.client.googleapis.services.AbstractGoogleClient$Builder setSuppressAllChecks(boolean);
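
    For the sbt-assembly route mentioned above, a shading rule might look like the sketch below. It assumes the sbt-assembly plugin is already enabled in project/plugins.sbt and an sbt version that supports the slash syntax (older builds would write `assemblyShadeRules in assembly`). The com.shaded.google prefix simply mirrors the Maven relocation in the question; it is a choice, not a requirement.

    ```scala
    // build.sbt fragment (assumes the sbt-assembly plugin is enabled)

    // Rename every com.google.* class, and every reference to one,
    // in the project and in all of its bundled dependencies.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.**" -> "com.shaded.google.@1").inAll
    )

    // If the gcs-connector is bundled and shaded as well, the
    // META-INF/services/org.apache.hadoop.fs.FileSystem file packaged in
    // the assembly still has to name the shaded class, as in the edit
    // to the question.
    ```

    Libraries that the cluster already provides, such as Spark, would normally stay in "provided" scope so they are not bundled or renamed.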