I followed these instructions and installed Apache Spark (PySpark) 2.3.1 on my machine with the following specifications:
When I create a SparkSession, either indirectly by calling pyspark from the shell or directly in my application with:
import pyspark
spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()
I get the following exception:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
....
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3107)
at java.base/java.lang.String.substring(String.java:1873)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
... 22 more
Traceback (most recent call last):
File "/home/welshamy/tools/anaconda3/lib/python3.6/site-packages/pyspark/python/pyspark/shell.py", line 38, in <module>
SparkContext._ensure_initialized()
File "/home/welshamy/tools/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 292, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/home/welshamy/tools/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py", line 93, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
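The `Caused by` line says that Hadoop's `Shell` static initializer called `String.substring(0, 3)` on a string that is only 2 characters long. The `java.base/` frames also show that the JVM is Java 9 or later. The sketch below mirrors Java's bounds check in Python to show how that exact message arises. It assumes, as is commonly reported for this trace, that the string being cut is the JVM's `java.version` property. `java_substring` is a hypothetical helper, and the version strings are illustrative, not taken from my machine.

```python
def java_substring(s: str, begin: int, end: int) -> str:
    """Hypothetical helper: mimic java.lang.String.substring(begin, end) bounds checking."""
    if begin < 0 or end > len(s) or begin > end:
        # Same message format that String.checkBoundsBeginEnd produces
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]


# Assumed example java.version values: an old "1.x" style string vs. a bare major version
for version in ["1.8.0_181", "11"]:
    try:
        prefix = java_substring(version, 0, 3)
        print(version, "->", repr(prefix))
    except IndexError as exc:
        print(version, "->", "StringIndexOutOfBoundsException:", exc)
```

With a two-character version string, taking the first three characters fails with `begin 0, end 3, length 2`, which matches the message in the log.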
If I'm using a Jupyter notebook, I also see this exception in the notebook:
Exception: Java gateway process exited before sending the driver its port number
All the solutions I found and followed [1, 2, 3] point to environment variable definitions, but none of them worked for me.