python - KeyError: SPARK_HOME during SparkConf initialization
I am a Spark newbie and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create sc:
file "test.py", line 10, in <module>     conf=(sparkconf().setmaster('local').setappname('a').setsparkhome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))   file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__     sparkcontext._ensure_initialized()   file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized     sparkcontext._gateway = gateway or launch_gateway()   file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway     spark_home = os.environ["spark_home"]   file "/usr/lib/python2.7/userdict.py", line 23, in __getitem__     raise keyerror(key) keyerror: 'spark_home' 
It seems there are two problems here.
The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
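If you are not sure which directory is the root, a quick sanity check (this snippet is just an illustration, assuming the standard binary distribution layout) is that the root contains bin/spark-submit:

import os

spark_home = "/home/dirk/spark-1.4.1-bin-hadoop2.6"  # the root, not .../bin
# A binary distribution root contains bin/ with the spark-submit launcher.
assert os.path.isfile(os.path.join(spark_home, "bin", "spark-submit"))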
The second problem is the way you use setSparkHome. If you check its docstring, its goal is to:

set path where Spark is installed on worker nodes
The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
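If you want a clearer error than the bare KeyError, you can check for the variable yourself before touching SparkConf; a minimal sketch:

import os

# launch_gateway effectively does os.environ["SPARK_HOME"], so an unset
# variable surfaces as KeyError: 'SPARK_HOME'. Checking first gives a
# friendlier message.
if "SPARK_HOME" not in os.environ:
    raise RuntimeError(
        "SPARK_HOME is not set; point it at the Spark installation root "
        "before creating SparkConf"
    )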
To deal with this you should set SPARK_HOME before you create SparkConf:
import os
from pyspark import SparkConf

os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
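For completeness, here is a minimal end-to-end sketch of test.py. It assumes pyspark is importable, for example because you launch the script with spark-submit or have added the Spark Python libraries to PYTHONPATH; the path and the trivial job are just placeholders:

import os
from pyspark import SparkConf, SparkContext

# SPARK_HOME must point at the installation root and must be set
# before SparkConf/SparkContext are created (path is the one from above).
os.environ.setdefault("SPARK_HOME", "/home/dirk/spark-1.4.1-bin-hadoop2.6")

conf = SparkConf().setMaster('local').setAppName('a')
sc = SparkContext(conf=conf)

# A trivial job just to confirm the context works.
print(sc.parallelize(range(10)).sum())

sc.stop()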