
Apache Spark 2.3.1 and Hive Metastore 3.1.0

  •  2
  • Eugene Lopatkin  · Tech Community  · 6 years ago

    We have upgraded our HDP cluster to 3.1.1.3.0.1.0-187 and discovered that:

    1. Hive has a new metastore location
    2. Spark can't see Hive databases

    In fact we see:

    org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database ... not found
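
    Hypothetically, any lookup against a database that lives in the new metastore reproduces this; the name mydb below is a stand-in, not taken from the original logs:

    spark.sql("USE mydb")                 // throws NoSuchDatabaseException
    spark.catalog.listDatabases().show()  // the Hive 3 databases are missing from the result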
    

    Could you help me understand what has happened and how to solve it?

    UPDATE:

    Configuration:

    (spark.sql.warehouse.dir,/warehouse/tablespace/external/hive/)
    (spark.admin.acls,)
    (spark.yarn.dist.files,file:///opt/folder/config.yml,file:///opt/jdk1.8.0_172/jre/lib/security/cacerts)
    (spark.history.kerberos.keytab,/etc/security/keytab/spark.service.keytab)
    (spark.io.compression.lz4.blockSize,128kb)
    (spark.executor.extraJavaOptions,-Djavax.net.ssl.trustStore=cacerts)
    (spark.history.fs.logDirectory,hdfs:///spark2-history/)
    (spark.io.encryption.keygen.algorithm,HmacSHA1)
    (spark.sql.autoBroadcastJoinThreshold,26214400)
    (spark.eventLog.enabled,true)
    (spark.shuffle.service.enabled,true)
    (spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/linux-amd64)
    (spark.ssl.keyStore,/etc/security/serverkeys/server-keystore.jks)
    (spark.yarn.queue,default)
    (spark.jars,file:/opt/folder/component-assembly-0.1.0-SNAPSHOT.jar)
    (spark.ssl.enabled,true)
    (spark.sql.orc.filterPushdown,true)
    (spark.shuffle.unsafe.file.output.buffer,5m)
    (spark.yarn.historyServer.address,master2.env.project:18481)
    (spark.ssl.trustStore,/etc/security/clientkeys/all.jks)
    (spark.app.name,com.company.env.component.myClass)
    (spark.sql.hive.metastore.jars,/usr/hdp/current/spark2-client/standalone-metastore/*)
    (spark.io.encryption.keySizeBits,128)
    (spark.driver.memory,2G)
    (spark.executor.instances,10)
    (spark.history.kerberos.principal,spark/edge.env.project@env.project)
    (spark.unsafe.sorter.spill.reader.buffer.size,1m)
    (spark.ssl.keyPassword,*****(redacted))
    (spark.ssl.keyStorePassword,*****(redacted))
    (spark.history.fs.cleaner.enabled,true)
    (spark.shuffle.io.serverThreads,128)
    (spark.sql.hive.convertMetastoreOrc,true)
    (spark.submit.deployMode,client)
    (spark.sql.orc.char.enabled,true)
    (spark.master,yarn)
    (spark.authenticate.enableSaslEncryption,true)
    (spark.history.fs.cleaner.interval,7d)
    (spark.authenticate,true)
    (spark.history.fs.cleaner.maxAge,90d)
    (spark.history.ui.acls.enable,true)
    (spark.acls.enable,true)
    (spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
    (spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/linux-amd64)
    (spark.executor.memory,2g)
    (spark.io.encryption.enabled,true)
    (spark.shuffle.file.buffer,1m)
    (spark.eventLog.dir,hdfs:///spark2-history/)
    (spark.ssl.protocol,TLS)
    (spark.dynamicAllocation.enabled,true)
    (spark.executor.cores,3)
    (spark.history.ui.port,18081)
    (spark.sql.statistics.fallBackToHdfs,true)
    (spark.repl.local.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
    (spark.ssl.trustStorePassword,*****(redacted))
    (spark.history.ui.admin.acls,)
    (spark.history.kerberos.enabled,true)
    (spark.shuffle.io.backLog,8192)
    (spark.sql.orc.impl,native)
    (spark.ssl.enabledAlgorithms,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
    (spark.sql.orc.enabled,true)
    (spark.yarn.dist.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
    (spark.sql.hive.metastore.version,3.0)
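
    (As an aside, a dump in this (key,value) shape can be reproduced from the driver; a minimal sketch:)

    // Prints every effective Spark property as (key,value), like the dump above.
    spark.sparkContext.getConf.getAll.sortBy(_._1).foreach {
      case (k, v) => println(s"($k,$v)")
    }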

    From hive-site.xml:

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/warehouse/tablespace/managed/hive</value>
    </property>
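
    Note that hive.metastore.warehouse.dir here (/warehouse/tablespace/managed/hive) differs from the spark.sql.warehouse.dir above (/warehouse/tablespace/external/hive/). Both values can be checked from inside the session; a sketch, assuming hive-site.xml is on the driver classpath:

    println(spark.conf.get("spark.sql.warehouse.dir"))
    // hive-site.xml entries are merged into the Hadoop configuration by Spark
    println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir"))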
    

    The code:

    val spark = SparkSession
      .builder()
      .appName(getClass.getSimpleName)
      .enableHiveSupport()
      .getOrCreate()
    ...
    dataFrame.write
      .format("orc")
      .options(Map("spark.sql.hive.convertMetastoreOrc" -> true.toString))
      .mode(SaveMode.Append)
      .saveAsTable("name")
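
    Note that spark.sql.hive.convertMetastoreOrc is a SQL session configuration rather than an ORC writer option, so passing it through .options(...) is most likely a no-op. A sketch of setting it on the session instead:

    // Session-level SQL conf; writer options only reach the ORC data source itself.
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

    dataFrame.write
      .format("orc")
      .mode(SaveMode.Append)
      .saveAsTable("name")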
    

    The spark-submit:

        --master yarn \
        --deploy-mode client \
        --driver-memory 2g \
        --driver-cores 4 \
        --executor-memory 2g \
        --num-executors 10 \
        --executor-cores 3 \
        --conf "spark.dynamicAllocation.enabled=true" \
        --conf "spark.shuffle.service.enabled=true" \
        --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=cacerts" \
        --conf "spark.sql.warehouse.dir=/warehouse/tablespace/external/hive/" \
        --jars postgresql-42.2.2.jar,ojdbc6.jar \
        --files config.yml,/opt/jdk1.8.0_172/jre/lib/security/cacerts \
        --verbose \
        component-assembly-0.1.0-SNAPSHOT.jar \
    
    1 Answer  |  6 years ago
        1
  •  5
  •   Eugene Lopatkin    6 years ago

    It seems that this is an unimplemented Spark feature. The only way I have found to use Spark with Hive 3 is the HiveWarehouseConnector from Hortonworks. Documentation here. And a good guide from the Hortonworks community here. I will leave the question open until the Spark developers ship their own solution.
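
    For reference, a minimal sketch of reading through the connector (the HDP 3 HiveWarehouseSession API; the table name below is hypothetical, and spark.sql.hive.hiveserver2.jdbc.url must point at your HiveServer2 Interactive endpoint):

    import com.hortonworks.hwc.HiveWarehouseSession

    // A session backed by HiveServer2/LLAP instead of Spark's built-in catalog.
    val hive = HiveWarehouseSession.session(spark).build()

    hive.showDatabases().show()                            // Hive 3 databases are visible here
    hive.executeQuery("SELECT * FROM mydb.mytable").show() // query runs through LLAP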