代码之家  ›  专栏  ›  技术社区  ›  Andrey Konyaev

弗林克。当我为OrcTableSource使用hdfs文件时,不支持FileSystemSchemeException

  •  2
  • Andrey Konyaev  · 技术社区  · 7 年前

    我尝试使用OrcTableSource并从hdfs读取文件。

    String fileName = "hdfs://master.host/data/groot.db/events/dt=2017-11-24/000044_0";
    
    Configuration config = new Configuration();
    
    OrcTableSource orcTableSource = OrcTableSource.builder()
        // path to ORC file(s)
        .path(fileName)
        // schema of ORC files
        .forOrcSchema(Schema.schema)
        // Hadoop configuration
        .withConfiguration(config)
        // build OrcTableSource
        .build();
    

    但当我启动它时,我捕捉到异常。

    Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:472)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:62)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:248)
    ... 22 more
    Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 'hdfs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
    at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:179)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
    ... 27 more
    Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.HdfsConfiguration
    at org.apache.flink.runtime.util.HadoopUtils.getHadoopConfiguration(HadoopUtils.java:49)
    at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:85)
    ... 28 more
    

    我的pom。xml是

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    
    <groupId>ru.ivi</groupId>
    <artifactId>hdfs2ch</artifactId>
    <version>1.0-SNAPSHOT</version>
    
    <repositories>
        <repository>
            <id>apache.snapshots</id>
            <name>Apache Development Snapshot Repository</name>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
    
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-core -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-orc_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-jdbc</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
    
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-scala -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-fs</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
    
    
        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.1.34</version>
            <scope>system</scope>
            <systemPath>
                /home/akonyaev/git/clickhouse-jdbc/target/clickhouse-jdbc-0.1-SNAPSHOT-jar-with-dependencies.jar
            </systemPath>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.0</version>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_2.11</artifactId>
            <version>1.5-SNAPSHOT</version>
        </dependency>
    
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.1.4</version>
        </dependency>
    
    </dependencies>
    
    
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
    
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>hdfs2ch.Migrate</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- bind to the packaging phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    
    </project>
    

    我添加了库“flink hadoop fs”,但这对我没有帮助。

    知道我将在classpath中添加什么吗?

    我使用flink 1.5-SNAPSHOT

    谢谢z

    2 回复  |  直到 7 年前
        1
  •  1
  •   Andrey Konyaev    7 年前

    为我下定决心。 这是maven的问题。

    重新格式化pom后。xml并删除不需要的依赖项,它正在工作

        2
  •  0
  •   Debajyoti Pathak    6 年前

    我在Eclipse中运行flink作业时遇到了这个错误。添加 flink-shaded-hadoop2 maven dependency为我解决了这个问题。