pyspark - What is spark.python.worker.memory?
Could someone give me a more precise description of this Spark parameter and how it affects program execution? I cannot tell from the documentation what the parameter does "under the hood".
The parameter controls the memory limit for Python worker processes. If the RSS of a Python worker process grows larger than this limit, it will spill data from memory to disk, which reduces memory utilization but is an expensive operation.
Note that this value applies per Python worker, and there can be multiple workers per executor.
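The parameter can be set like any other Spark configuration property. A minimal sketch (the memory values and script name here are illustrative, not recommendations):

```shell
# Hypothetical job submission: cap each Python worker at 512m while the
# executor JVM itself gets 4g. my_job.py is a placeholder script name.
spark-submit \
  --conf spark.python.worker.memory=512m \
  --conf spark.executor.memory=4g \
  my_job.py
```

Because the limit is per worker, the total Python-side memory on an executor can be several times this value when multiple worker processes are running.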
If you want to look under the hood, check the python/pyspark directory in the Spark source tree, e.g. the ExternalMerger implementation: https://github.com/apache/spark/blob/41afa16500e682475eaa80e31c0434b7ab66abcb/python/pyspark/shuffle.py#L280
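To illustrate the spill behavior in miniature, here is a simplified, hypothetical sketch of the idea behind ExternalMerger — accumulate key/value pairs in memory and dump them to disk once a limit is crossed. It is not Spark's actual code: real ExternalMerger tracks process memory in bytes against spark.python.worker.memory, whereas this toy uses an item count as the limit.

```python
import os
import pickle
import tempfile


class SimpleExternalMerger:
    """Toy spill-to-disk merger (hypothetical sketch, not Spark's code).

    Sums values per key in an in-memory dict; once the dict grows past
    the limit, it is pickled to a temp file and cleared, mirroring how
    a Python worker spills when it exceeds spark.python.worker.memory.
    """

    def __init__(self, memory_limit_items=1000):
        # Stand-in for a byte-based memory limit.
        self.limit = memory_limit_items
        self.data = {}
        self.spill_files = []

    def merge(self, key, value):
        self.data[key] = self.data.get(key, 0) + value
        if len(self.data) >= self.limit:
            self._spill()

    def _spill(self):
        # Write the current in-memory partition to disk and free it.
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.data, f)
        self.spill_files.append(path)
        self.data = {}

    def items(self):
        # Merge spilled partitions back with the in-memory remainder.
        merged = dict(self.data)
        for path in self.spill_files:
            with open(path, "rb") as f:
                for k, v in pickle.load(f).items():
                    merged[k] = merged.get(k, 0) + v
            os.remove(path)
        self.spill_files = []
        return merged


m = SimpleExternalMerger(memory_limit_items=3)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]:
    m.merge(k, v)
result = m.items()
# result == {"a": 5, "b": 2, "c": 3, "d": 5}
```

The disk round-trip (serialize, write, read back, re-merge) is exactly why spilling is described as expensive: correctness is preserved, but every spilled record costs extra I/O and deserialization work.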