pyspark - What is spark.python.worker.memory?


Could anyone give me a more precise description of this Spark parameter and how it affects program execution? I cannot tell from the documentation what this parameter does "under the hood".

The parameter influences the memory limit for Python workers: if the RSS (resident set size) of a Python worker process grows larger than this memory limit, the worker spills data from memory to disk. Spilling reduces memory utilization, but it is an expensive operation.
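For illustration, here is a minimal sketch of setting the parameter when building a SparkContext; the "512m" limit and the app name are assumptions chosen for the example, not recommended values:

from pyspark import SparkConf, SparkContext

# Illustrative configuration; tune the limit to your workload.
conf = (SparkConf()
        .setAppName("worker-memory-demo")  # hypothetical app name
        # Cap each Python worker at roughly 512 MiB before it starts
        # spilling aggregation data to disk during shuffles.
        .set("spark.python.worker.memory", "512m"))

sc = SparkContext(conf=conf)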

Note that this value applies per Python worker, and there can be multiple workers per executor.
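As a rough, hypothetical budget (both numbers below are assumptions, not defaults): with one Python worker per executor core, the Python memory adds up per executor, on top of the JVM heap:

# Back-of-the-envelope example with assumed values.
worker_memory_mb = 512        # spark.python.worker.memory = "512m"
workers_per_executor = 4      # e.g. one worker per executor core
total_python_mb = worker_memory_mb * workers_per_executor
print(total_python_mb)        # 2048 MB of Python memory per executor,
                              # in addition to spark.executor.memory (JVM)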

If you want to take a look under the hood, look at the python/pyspark directory in the Spark source tree, e.g. the ExternalMerger implementation: https://github.com/apache/spark/blob/41afa16500e682475eaa80e31c0434b7ab66abcb/python/pyspark/shuffle.py#L280
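To give a feel for what that code does, here is a toy paraphrase of the spill-when-over-limit pattern; it is a simplified sketch, not the real ExternalMerger, and get_used_memory_mb is a stand-in for pyspark.shuffle.get_used_memory:

import os
import pickle
import resource
import tempfile

def get_used_memory_mb():
    # Stand-in for pyspark.shuffle.get_used_memory(): report the
    # process's peak RSS in MB. Note ru_maxrss is in KB on Linux
    # (bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss >> 10

class TinyMerger(object):
    # Toy version of the ExternalMerger idea: aggregate values in
    # memory and spill to disk once the worker crosses its limit.
    def __init__(self, limit_mb):
        self.limit_mb = limit_mb  # from spark.python.worker.memory
        self.data = {}
        self.spill_files = []

    def merge_value(self, key, value):
        self.data.setdefault(key, []).append(value)
        if get_used_memory_mb() >= self.limit_mb:
            self._spill()

    def _spill(self):
        # Serialize the in-memory partial aggregates to a temp file
        # and clear the map to free memory. The real ExternalMerger
        # also partitions the spilled data by hash so it can be
        # merged back efficiently later.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.data, f)
        self.spill_files.append(path)
        self.data.clear()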

