apache spark - When does the fetch from Cassandra happen?


I have an application that triggers a job on the Spark master. When I check the IP address that is executing the job, it shows the application's IP, not a Spark worker's IP. So, as I understand it, only a call on an RDD generates work for the Spark workers.

But my question is this.

CassandraSQLContext c = new CassandraSQLContext(sc);
QueryExecution q = c.executeSql(cqlCommand);  // 1
q.toRdd().count();                            // 2

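I saw the workers doing work for 2, but nothing for 1.

So does this mean that the fetch from Cassandra, and the creation of the RDD out of it in 1, are done in the application?

If so, 2 triggers a job on the workers. In that case, do they fetch from Cassandra again and then process the count?

Can someone clarify this?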

Edit

  1. Going by the answer provided, if the count call is what triggers the workers to function, what is the use of executeSql? Does it create the RDD locally? Does it create a Cassandra dataset of the data being queried? If that's the case, does querying Cassandra happen twice?

  2. If Spark automatically distributes the computation over the 10 partitions of Cassandra among 4 workers, who aggregates the results? The master does the distribution. Does it aggregate too?

  3. If I don't cache the RDD and run a count operation, what will happen? Will Spark try to use the same worker that was used for a particular partition and append the result to the RDD on that node? I think it has to query Cassandra for the partition data again. Can someone provide clarity on this?

  4. If I cache the RDD, what happens? Is the RDD stored on the worker and used for further operations? In that case, how is that different from storing the dataset in memory and processing it? Let me know if I am right here too.

In Spark, loading and transformations of RDDs from a CQL command are lazily evaluated.

Only actions trigger all of the precursor transformations to run; in your example, count() is the action.
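The same lazy behaviour can be seen without Spark. The following is only an analogy, assuming plain java.util.stream in place of an RDD; fetchRows() and the fetches counter are hypothetical stand-ins for the Cassandra read, not part of Spark or the Cassandra connector. Building the plan corresponds to step 1 in the question, and the terminal count() corresponds to step 2.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class LazyFetchDemo {
    // Counts how many times the simulated "Cassandra fetch" actually runs.
    static final AtomicInteger fetches = new AtomicInteger();

    // Hypothetical stand-in for reading rows from Cassandra.
    static Stream<String> fetchRows() {
        fetches.incrementAndGet();
        return Stream.of("row1", "row2", "row3");
    }

    // Step 1: only describes the computation; nothing is fetched here.
    static Supplier<Stream<String>> buildPlan() {
        return () -> fetchRows().filter(r -> !r.isEmpty());
    }

    public static void main(String[] args) {
        Supplier<Stream<String>> plan = buildPlan();
        System.out.println("fetches after building the plan: " + fetches.get());

        // Step 2: the terminal operation (the "action") triggers the fetch.
        long n = plan.get().count();
        System.out.println("count = " + n + ", fetches = " + fetches.get());
    }
}
```

In this sketch the counter stays at zero until count() runs, and a single count() causes a single fetch.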

The way Spark works internally is that it builds up a graph of transformations. When it needs to run an action, it breaks the graph down into separate sub-tasks that can be run on individual workers.

To run a single action such as count(), the data will be fetched from Cassandra once and, if possible, the RDD partitions on each executor will be populated with data local to each Cassandra node.

If you run another action on the RDD created from q, it may still be cached in memory and reused. There are API calls you can make to explicitly request that an RDD be cached in memory if you plan to re-use it.
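The effect of explicitly requesting caching can be illustrated with a toy model. This is a minimal sketch and not Spark's implementation: LazyDataset, its cache() and count() methods, and fetchesFor() are all hypothetical names made up for illustration. The model assumes that a dataset which was not explicitly cached is recomputed from its source on every action, while a cached one is materialized on the first action and reused afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CacheDemo {
    // Hypothetical miniature of an RDD: it re-reads its source on every
    // action unless cache() was requested beforehand.
    static final class LazyDataset<T> {
        private final Supplier<List<T>> source;
        private boolean cacheRequested = false;
        private List<T> cached = null;

        LazyDataset(Supplier<List<T>> source) { this.source = source; }

        // Marks the dataset for caching; nothing is computed yet.
        LazyDataset<T> cache() { cacheRequested = true; return this; }

        private List<T> materialize() {
            if (cached != null) return cached;   // reuse an earlier result
            List<T> rows = source.get();         // simulated fetch
            if (cacheRequested) cached = rows;   // keep it only if asked to
            return rows;
        }

        // The "action": forces the data to be produced.
        long count() { return materialize().size(); }
    }

    // Runs two count() actions and reports how many fetches they caused.
    static int fetchesFor(boolean useCache) {
        AtomicInteger fetches = new AtomicInteger();
        LazyDataset<String> ds = new LazyDataset<>(() -> {
            fetches.incrementAndGet();
            return Arrays.asList("row1", "row2", "row3");
        });
        if (useCache) ds.cache();
        ds.count();
        ds.count();
        return fetches.get();
    }

    public static void main(String[] args) {
        System.out.println("fetches without cache: " + fetchesFor(false));
        System.out.println("fetches with cache:    " + fetchesFor(true));
    }
}
```

In this model, two actions without caching cause two fetches, and two actions with caching cause one. Whether a real cluster can actually keep the data depends on available executor memory, which the sketch does not model.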

