
AWS EMR throws an exception on accelerate endpoint configuration

  •  1
  • Kannaiyan  · Tech Community  · 6 years ago

    This is the EMR step I am using,

    s3-dist-cp --targetSize 1000 --outputCodec=gz --s3Endpoint=bucket.s3-accelerate.amazonaws.com --groupBy '.*/(\d\d)/\d\d/\d\d/.*' --src s3a://sourceback/ --dest s3a://destbacket/destination/
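    For context on the `--groupBy` flag above: s3-dist-cp treats it as a Java regex and concatenates objects whose keys produce the same capture-group value. A quick stdlib sanity check of that pattern (the key names here are made up):

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GroupByCheck {
        // the --groupBy pattern from the s3-dist-cp invocation above
        static final Pattern GROUP_BY = Pattern.compile(".*/(\\d\\d)/\\d\\d/\\d\\d/.*");

        // returns the capture-group value s3-dist-cp would group by, or null if no match
        static String groupOf(String key) {
            Matcher m = GROUP_BY.matcher(key);
            return m.matches() ? m.group(1) : null;
        }

        public static void main(String[] args) {
            System.out.println(groupOf("logs/18/06/25/part-0001.gz")); // prints 18
            System.out.println(groupOf("logs/readme.txt"));            // prints null
        }
    }
    ```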

    It fails with an accelerate endpoint exception.

    EMR version:

    Release label:emr-5.13.0
    Hadoop distribution:Amazon 2.8.3
    Applications:Hive 2.3.2, Pig 0.17.0, Hue 4.1.0, Presto 0.194
    

    What am I missing to pass as arguments to s3-dist-cp to overcome this error?

    Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: To enable accelerate mode, please use AmazonS3ClientBuilder.withAccelerateModeEnabled(true)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider.getS3(DefaultS3Provider.java:55)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider.getS3(DefaultS3Provider.java:22)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.getClient(GlobalS3Executor.java:122)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:89)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:176)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.doesBucketExist(AmazonS3LiteClient.java:88)
        at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:138)
        at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:116)
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.initialize(S3NativeFileSystem.java:448)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:109)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2859)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2896)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2878)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:392)
        at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:869)
        at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
    Caused by: java.lang.IllegalStateException: To enable accelerate mode, please use AmazonS3ClientBuilder.withAccelerateModeEnabled(true)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:670)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.AmazonWebServiceClient.withEndpoint(AmazonWebServiceClient.java:897)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider$S3CacheLoader.load(DefaultS3Provider.java:62)
        at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider$S3CacheLoader.load(DefaultS3Provider.java:58)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
        at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
        ... 30 more
    Command exiting with ret '1'
    
    1 Answer  |  up to 6 years ago
        1
  •  2
  •   Volodymyr Zubariev    5 years ago

    s3-dist-cp is built on top of the hadoop-aws library, which does not support accelerated buckets out of the box.

    You will want to build your own JAR that depends on hadoop-aws and aws-java-sdk-s3, convert the required parameters there, and extend S3ClientFactory to enable accelerated uploads.

    Example Maven dependencies:

    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-s3</artifactId>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>  
    </dependency>
    

    The S3 client factory:

    import java.io.IOException;
    import java.net.URI;

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.auth.AWSCredentialsProvider;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.BucketAccelerateConfiguration;
    import com.amazonaws.services.s3.model.BucketAccelerateStatus;

    // DefaultS3ClientFactory is nested in org.apache.hadoop.fs.s3a.S3ClientFactory
    // in the Hadoop 2.8.x line shipped with EMR 5.13.0.
    public class AcceleratedS3ClientFactory extends DefaultS3ClientFactory {
        @Override
        protected AmazonS3 newAmazonS3Client(AWSCredentialsProvider credentials, ClientConfiguration awsConf) {
            AmazonS3ClientBuilder s3Builder = AmazonS3ClientBuilder
                    .standard()
                    .withRegion("us-east-1")    // the bucket's region, not the accelerate hostname
                    .enableAccelerateMode();    // routes requests via bucket.s3-accelerate.amazonaws.com
            s3Builder.setCredentials(credentials);
            s3Builder.setClientConfiguration(awsConf);

            return s3Builder.build();
        }

        @Override
        public AmazonS3 createS3Client(URI name) throws IOException {
            AmazonS3 s3 = super.createS3Client(name);
            // load the bucket name below from the step configuration as well
            s3.setBucketAccelerateConfiguration("bucket-name",
                    new BucketAccelerateConfiguration(BucketAccelerateStatus.Enabled));

            return s3;
        }
    }
    

    The last step is to point Hadoop at the S3 factory class:

    <property>
      <name>fs.s3a.s3.client.factory.impl</name>
      <value>example_package.AcceleratedS3ClientFactory</value>
    </property>
    

    This can also be done from the command line, so you can specify it directly in the EMR console or via the EMR SDK.
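    For example, one way to set this property without editing core-site.xml is the EMR `core-site` configuration classification at cluster creation; this is a sketch, with the release label and class name taken from the question and answer and everything else a placeholder:

    ```shell
    aws emr create-cluster \
      --release-label emr-5.13.0 \
      --applications Name=Hadoop \
      --configurations '[{"Classification":"core-site",
        "Properties":{"fs.s3a.s3.client.factory.impl":"example_package.AcceleratedS3ClientFactory"}}]' \
      --instance-type m4.large --instance-count 3 \
      --use-default-roles
    ```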

    For the copy itself, you can use the Hadoop FileUtil.copy API, where you specify the source, the destination, and the desired configuration.
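    A minimal sketch of such a copy step, assuming hadoop-common on the classpath; the paths reuse the bucket names from the question and are otherwise hypothetical:

    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class AcceleratedCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // point s3a at the custom factory described above
            conf.set("fs.s3a.s3.client.factory.impl",
                     "example_package.AcceleratedS3ClientFactory");

            Path src = new Path("s3a://sourceback/");             // source from the question
            Path dst = new Path("s3a://destbacket/destination/"); // destination from the question

            FileSystem srcFs = src.getFileSystem(conf);
            FileSystem dstFs = dst.getFileSystem(conf);

            // deleteSource=false, overwrite=true
            FileUtil.copy(srcFs, src, dstFs, dst, false, true, conf);
        }
    }
    ```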

    For certain file formats, or for sources or destinations that are not filesystem-based, Spark may be worth considering instead of the utility above; in some cases it can make the transfer faster.

    Now you can submit the JAR as a step to EMR:

    aws emr add-steps --cluster-id cluster_id \
    --steps Type=CUSTOM_JAR,Name="a step name",Jar=s3://app/my-s3distcp-1.0.jar,\
    Args=["key","value"]
    

    Note: do not specify the bucket-specific endpoint that hadoop-aws supports. It is used in a way that is incompatible with acceleration, and you will get the same exception every time.
