python - KeyError: SPARK_HOME during SparkConf initialization


I'm a Spark newbie and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create sc:

file "test.py", line 10, in <module>     conf=(sparkconf().setmaster('local').setappname('a').setsparkhome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))   file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__     sparkcontext._ensure_initialized()   file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized     sparkcontext._gateway = gateway or launch_gateway()   file "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway     spark_home = os.environ["spark_home"]   file "/usr/lib/python2.7/userdict.py", line 23, in __getitem__     raise keyerror(key) keyerror: 'spark_home' 

It seems there are two problems here.

The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.

The second problem is the way you use setSparkHome. If you check its docstring, its goal is to

set path where Spark is installed on worker nodes

The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
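
Roughly, the failing code in pyspark/java_gateway.py looks like the simplified sketch below (not the exact source, just the relevant idea): the Spark home is read straight from the driver's environment, so setSparkHome on the SparkConf cannot prevent the KeyError.

import os

def launch_gateway():
    # simplified sketch of pyspark.java_gateway.launch_gateway (Spark 1.4.x):
    # SPARK_HOME comes from the environment of the driver process,
    # so it has to be exported (or put into os.environ) before SparkConf() runs
    spark_home = os.environ["SPARK_HOME"]          # raises KeyError: 'SPARK_HOME' if unset
    submit_script = os.path.join(spark_home, "bin", "spark-submit")
    # ... the gateway JVM is then started via this script ...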

To deal with this you should set SPARK_HOME before you create SparkConf.

import os
from pyspark import SparkConf

os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
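
Putting it together, a minimal test.py along these lines should work, assuming the pyspark package is importable from plain python (for example because the Spark python directory is on PYTHONPATH); the path and app name are just the ones from the question:

import os

os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local').setAppName('a')
sc = SparkContext(conf=conf)

print(sc.parallelize(range(10)).sum())   # quick sanity check: should print 45
sc.stop()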
