python - KeyError: SPARK_HOME during SparkConf initialization
I am a Spark newbie and want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create sc:
file "test.py", line 10, in <module> conf=(sparkconf().setmaster('local').setappname('a').setsparkhome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin')) file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__ sparkcontext._ensure_initialized() file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized sparkcontext._gateway = gateway or launch_gateway() file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway spark_home = os.environ["spark_home"] file "/usr/lib/python2.7/userdict.py", line 23, in __getitem__ raise keyerror(key) keyerror: 'spark_home'
It seems there are two problems here.

The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
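As a quick sanity check (a minimal sketch, assuming the layout of a standard pre-built Spark binary distribution), the directory SPARK_HOME points to should contain the bin/ and python/ subdirectories rather than be the bin/ directory itself:

import os

spark_home = "/home/dirk/spark-1.4.1-bin-hadoop2.6"  # installation root, not .../bin
# A usable SPARK_HOME contains the launcher scripts and the bundled Python package.
for sub in ("bin/spark-submit", "python/pyspark"):
    assert os.path.exists(os.path.join(spark_home, sub)), "missing: " + sub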
The second problem is the way you use setSparkHome. If you check its docstring, its goal is to

set path where Spark is installed on worker nodes

The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
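To illustrate the difference (a sketch, assuming setSparkHome stores its value under the spark.home configuration key, as the standard SparkConf does): setSparkHome only records a worker-side setting in the conf, while launch_gateway reads the SPARK_HOME environment variable on the driver, so one cannot substitute for the other.

import os
from pyspark import SparkConf

os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"  # read by launch_gateway on the driver

conf = SparkConf().setSparkHome("/home/dirk/spark-1.4.1-bin-hadoop2.6")
print(conf.get("spark.home"))    # worker-side path stored in the conf
print(os.environ["SPARK_HOME"])  # driver-side value used when launching the gateway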
To deal with this, you should set SPARK_HOME before you create the SparkConf.
import os
from pyspark import SparkConf

os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"  # must be set before SparkConf is created
conf = (SparkConf().setMaster('local').setAppName('a'))
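For completeness, a minimal test.py along these lines (the app name 'a' and the path are just the values from the question) can then be run from the command line with python test.py, assuming the pyspark package and py4j are importable from your Python path; otherwise run it with bin/spark-submit, which sets that up for you:

import os
from pyspark import SparkConf, SparkContext

os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

conf = SparkConf().setMaster("local").setAppName("a")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # simple smoke test: prints 45
sc.stop()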