scala - Does this code take full advantage of Spark's distributed computing and functional programming paradigms?


I have an array of n tuples: (key, data), both custom-defined objects. Each data object has a string field query.

If there is no query, the string is equal to the default value "".

What I want is the proportion of the array that has valid queries.
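For concreteness, a minimal model of this setup might look like the following. The names `Data` and the sample values are assumptions for illustration, not the original definitions (in the updated snippets each key maps to a *list* of data objects, so that variant is modeled here):

```scala
// Hypothetical minimal model of the described data; names are assumptions.
case class Data(query: String) // query == "" means "no query"

// An array of (key, list-of-data) tuples, as in the updated snippets below.
val inputarr: Array[(Int, List[Data])] = Array(
  (1, List(Data("select *"), Data(""))), // has a valid query
  (2, List(Data(""))),                   // no valid query
  (3, List(Data("lookup")))              // has a valid query
)
```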

This is the code I have right now:

val proportion = (inputarr.map { case (key, data) =>
  data.map(d => if (d.query != "") 1 else 0).sum
}.sum).toDouble / inputarr.length // .toDouble avoids integer division truncating to 0

Is this code optimal, or is there a better way of accomplishing what I want to do? Also, I know that Spark automatically parallelizes reduce; my IDE suggested replacing reduce(_ + _) with sum, but is that still automatically distributed and computed? Also, since I'm new to the functional approach, please let me know if I'm doing anything inefficiently.
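On the `sum` vs `reduce(_ + _)` question: on plain Scala collections the two are equivalent, and on an RDD of a numeric type `sum` (provided via `DoubleRDDFunctions`) is a distributed action just like `reduce`, so the IDE suggestion should not change how the work is distributed. A quick sanity check on local collections:

```scala
// sum is a convenience for the same fold that reduce(_ + _) performs;
// on a numeric RDD, both are distributed actions.
val ones = Seq(1, 0, 1, 1)
val viaReduce = ones.reduce(_ + _) // 3
val viaSum    = ones.sum           // 3, same result
```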

Updated:

val betterproportion = inputarr.count { case (key, listdatas) =>
  listdatas.count(data => data.query != "") > 0
}.toDouble / inputarr.length // .toDouble avoids integer division truncating to 0

Is this better code?

Updated:

val betterproportion = inputs.filter { case (key, listdatas) =>
  listdatas.count(data => data.query != "") > 0
}.count().toDouble / inputs.count() // counts are Long; .toDouble avoids integer division

This code was updated to use the original RDD instead of a subset array, to address the issue of there not being a length function on RDDs.
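As a style note, `listdatas.count(...) > 0` can be written as `listdatas.exists(...)`, which short-circuits at the first valid query instead of scanning the whole list, and `.toDouble` keeps the Long/Long division from truncating the proportion to 0. The sketch below demonstrates this on plain collections with an assumed `Data` case class (not the original definitions); the same `filter`/`count` combinators apply to an RDD, where `filter` is a lazy transformation and `count()` is the distributed action:

```scala
case class Data(query: String) // "" means no query; name is an assumption

val inputs = Seq(
  (1, List(Data("a"), Data(""))),
  (2, List(Data(""))),
  (3, List(Data("b")))
)

// exists short-circuits on the first valid query; toDouble avoids integer division
val proportion =
  inputs.count { case (_, datas) => datas.exists(_.query != "") }.toDouble / inputs.length
// proportion == 2.0 / 3.0
```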
