Scala - Does this code take full advantage of Spark's distributed computing and functional programming paradigms?
I have an array of n tuples: (key, data), both custom-defined objects. Each data object has a string field called query.
A query counts as valid if the string is not equal to the default value "".
What I want is the proportion of the array that has valid queries.
This is the code I have right now:
val proportion = (inputarr map { case (key, data) => (data map (d => if (d.query != "") 1 else 0)).sum }).sum.toDouble / inputarr.length
Is this optimal, or is there a better way of accomplishing what I want to do? Also, I know Spark automatically parallelizes reduce; my IDE suggested I replace reduce(_ + _) with sum — is that still automatically distributed and computed? Also, since I'm new to the functional approach, please let me know if I'm doing anything inefficiently.
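For reference, here is a minimal self-contained sketch of the same computation on plain Scala collections (the Key and Data case classes and the sample values are hypothetical stand-ins for the custom objects). Note the .toDouble: without it the division is integer division and the result silently truncates. Also note that with this formula the value can exceed 1 when a single tuple holds several valid queries, since it divides the total number of valid queries by the number of tuples.

```scala
// Hypothetical stand-ins for the custom key/data objects.
case class Key(id: Int)
case class Data(query: String)

val inputarr: Seq[(Key, Seq[Data])] = Seq(
  (Key(1), Seq(Data("find"), Data(""))),
  (Key(2), Seq(Data(""), Data(""))),
  (Key(3), Seq(Data("search"), Data("lookup")))
)

// Total number of valid (non-default) queries across all tuples,
// divided by the number of tuples. count(...) replaces the
// map-to-0/1-then-sum pattern and reads more directly.
val proportion: Double =
  inputarr.map { case (_, datas) => datas.count(_.query != "") }.sum.toDouble / inputarr.length
```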
Update:
val betterproportion = (inputarr count { case (key, listdatas) => (listdatas count (data => data.query != "")) > 0 }).toDouble / inputarr.length
Is this better code?
Update:
val betterproportion = inputs.filter { case (key, listdatas) => listdatas.count(data => data.query != "") > 0 }.count().toDouble / inputs.count()
I updated this code to use the original RDD instead of the subset array, to address the issue that there is no length function on RDDs.
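The updated version has different semantics from the first one: it measures the fraction of tuples containing at least one valid query, rather than the total valid-query count per tuple. A sketch of that version on plain collections (same hypothetical Key/Data stand-ins as above) is below; exists short-circuits on the first match, so it is cheaper than counting every match and comparing with > 0. On an RDD, count() returns Long, so the same .toDouble conversion is needed to avoid integer division there too.

```scala
// Hypothetical stand-ins for the custom key/data objects.
case class Key(id: Int)
case class Data(query: String)

val inputs: Seq[(Key, Seq[Data])] = Seq(
  (Key(1), Seq(Data("find"), Data(""))),
  (Key(2), Seq(Data(""), Data(""))),
  (Key(3), Seq(Data("search"), Data("lookup")))
)

// Fraction of tuples whose list contains at least one valid query.
// exists stops at the first non-default query instead of scanning
// the whole list the way count(...) > 0 does.
val betterProportion: Double =
  inputs.count { case (_, listdatas) => listdatas.exists(_.query != "") }.toDouble / inputs.length
```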