
Spark S3A throws a 403 error while the same configuration works with AmazonS3Client

Asked by ClassicThunder · 6 years ago

    My build uses the following versions and dependencies:

    <spark.version>2.3.1</spark.version>
    <scala.version>2.11.8</scala.version>
    <hadoop.version>2.7.7</hadoop.version>
    
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.7.4</version>
    </dependency>
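
    For reference, an equivalent sbt declaration (a hedged sketch, not from the original post; it assumes hadoop-aws is also on the classpath at the Hadoop version above and that Spark is provided by the cluster):

    // build.sbt sketch (assumption): hadoop-aws should match the Hadoop
    // version on the classpath, and aws-java-sdk 1.7.4 is the SDK release
    // that hadoop-aws 2.7.x declares as its dependency.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-sql"    % "2.3.1" % Provided,
      "org.apache.hadoop" %  "hadoop-aws"   % "2.7.7",
      "com.amazonaws"     %  "aws-java-sdk" % "1.7.4"
    )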
    

    The code below is submitted with spark-submit as part of a fat jar.

    spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    spark.sparkContext.hadoopConfiguration.set("log4j.logger.org.apache.hadoop.fs.s3a", "DEBUG")
    
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.server-side-encryption-algorithm", "AES256")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", endpoint)
    
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", access)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secret)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", session)
    
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.host", proxyHost)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.port", proxyPort.toString)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.username", proxyUser)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.proxy.password", proxyPass)
    
    // The same credentials, proxy, and endpoint, but wired directly
    // into an AWS SDK client instead of going through S3A.
    val credentials = new StaticCredentialsProvider(new BasicSessionCredentials(access, secret, session))
    val config = new ClientConfiguration()
      .withProxyHost(proxyHost)
      .withProxyPort(proxyPort)
      .withProxyUsername(proxyUser)
      .withProxyPassword(proxyPass)
    val s3Client = new AmazonS3Client(credentials, config)
    s3Client.setEndpoint(endpoint)

    // Direct SDK read: succeeds.
    val `object` = s3Client.getObject(new GetObjectRequest(bucket, key))
    val objectData = `object`.getObjectContent
    println("This works! :) " + objectData.toString)

    // Spark read over s3a://: fails with the 403 below.
    val json = spark.read.textFile("s3a://" + bucket + "/" + key)
    println("Error before here :( " + json)
    

    The call using AmazonS3Client works fine:

    This works! :) com.amazonaws.services.s3.model.S3ObjectInputStream@3f736a16
    

    The Spark S3A read, however, fails with a 403:

    2018-09-12 20:45:59 INFO  S3AFileSystem:1207 - Caught an AmazonServiceException com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: D8A113B7B1AB31B9, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: AybHBDYJCeWlw2brLdL0Ezpg5PNTUs9kxUqr17xR6qnv3WTxUQ0T1Vs78aM9mG8bsjTzguePZG0=
    2018-09-12 20:45:59 INFO  S3AFileSystem:1208 - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: D8A113B7B1AB31B9, AWS Error Code: null, AWS Error Message: Forbidden
    2018-09-12 20:45:59 INFO  S3AFileSystem:1209 - HTTP Status Code: 403
    2018-09-12 20:45:59 INFO  S3AFileSystem:1210 - AWS Error Code: null
    2018-09-12 20:45:59 INFO  S3AFileSystem:1211 - Error Type: Client
    2018-09-12 20:45:59 INFO  S3AFileSystem:1212 - Request ID: D8A113B7B1AB31B9
    2018-09-12 20:45:59 INFO  S3AFileSystem:1213 - Stack
    com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: D8A113B7B1AB31B9, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: AybHBDYJCeWlw2brLdL0Ezpg5PNTUs9kxUqr17xR6qnv3WTxUQ0T1Vs78aM9mG8bsjTzguePZG0=
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
        at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
        at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:693)
        at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:732)
        at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:702)
        at com.company.HelloWorld$.main(HelloWorld.scala:77)
        at com.company.HelloWorld.main(HelloWorld.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    

    As far as I can tell, the two clients should be configured identically, so I don't understand why the direct AmazonS3Client call works while the S3A read gets a 403.
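
    The stack trace shows the 403 being raised in S3AFileSystem.getFileStatus, i.e. in the metadata (HEAD) request S3A issues before reading. A hedged debugging sketch, reusing the spark, bucket, and key values from above, that reproduces just that call without Spark's DataFrameReader:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Issue the same metadata lookup the stack trace shows failing,
    // but directly through the Hadoop FileSystem API: if this also
    // returns a 403, the problem is in the S3A layer, not in Spark.
    val fs = FileSystem.get(new URI("s3a://" + bucket), spark.sparkContext.hadoopConfiguration)
    println(fs.getFileStatus(new Path("s3a://" + bucket + "/" + key)))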

2 Answers
Answer 1 · 3 votes · ClassicThunder · 6 years ago

    I solved this by removing the explicit aws-java-sdk dependency:

    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.7.4</version>
    </dependency>
    

    and instead depending on hadoop-aws 2.8.1, which pulls in a matching AWS SDK transitively:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.8.1</version>
    </dependency>
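
    A hedged follow-up check (not part of the original answer): print which jar actually provides the S3 client class at runtime, to catch a stale aws-java-sdk 1.7.4 still shadowing the SDK version hadoop-aws 2.8.1 expects.

    // Where is AmazonS3Client loaded from? A 1.7.4 jar here would
    // indicate the old SDK is still on the classpath.
    val source = classOf[com.amazonaws.services.s3.AmazonS3Client]
      .getProtectionDomain.getCodeSource
    println(if (source == null) "bootstrap classpath" else source.getLocation.toString)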
    
Answer 2 · 0 votes · stevel · 6 years ago

    Nothing obvious stands out. The log4j setting needs to go into log4j.properties, and even then it won't help much, since the authentication chain deliberately avoids logging anything useful. Two things to try:

    1. Read the Hadoop S3A troubleshooting docs.
    2. Try the cloudstore tool, which was written explicitly to do basic connector debugging and to produce logs that are safe to include in a support call.