similarity - What is the best way to process multiple files in Storm?
I am new to Apache Storm and want to use Storm to compute the similarity of files. I want the cosine similarity of each file in folder "A" with each file in folder "B". Can you show me a way to get the result? Thanks very much.
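For reference, the cosine similarity of two files is commonly computed by turning each file into a term-frequency vector and taking the cosine of the angle between the two vectors: (a . b) / (|a| * |b|). A minimal plain-Java sketch, assuming simple tokenization (lower-cased, split on non-word characters); the class and method names are hypothetical and not part of Storm:

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {
    // Builds a term-frequency map: lower-case the text, split on non-word characters, count tokens.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Two toy "files" sharing two of their three words.
        double s = cosine(termFrequencies("storm spout bolt"), termFrequencies("storm bolt topology"));
        System.out.println(s);
    }
}
```

In a topology, a function like this would typically run inside the bolt once it holds one file from each folder; real use may want stop-word removal or TF-IDF weighting instead of raw counts.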
I did not understand what you meant by 'cosine of files', but in general you can think of each folder as a 'stream'. You can have a SpoutA that reads, parses, formats and emits the files in folderA, and a SpoutB that does the same for folderB, giving you 2 tuple streams (I am assuming there are differences between the 2 folders in encoding, formatting etc.). The processing bolt can then 'subscribe' to both streams, e.g.,
bolt.fieldsGrouping(spoutA, streamName, new Fields("field_in_stream"));
bolt.fieldsGrouping(spoutB, streamName, new Fields("field_in_stream"));
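On the bolt side, comparing every file from folder A with every file from folder B means buffering what has arrived on each stream and producing a pair whenever its counterpart shows up. The plain-Java sketch below shows only that buffering logic, with no Storm classes. It assumes a single bolt task sees all tuples from both streams (with fieldsGrouping, each task only receives the tuples whose grouping field hashes to it) and that each file is emitted exactly once. CrossStreamBuffer, onTuple and content are hypothetical names; in a real bolt the source tag could come from something like the tuple's source component id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrossStreamBuffer {
    // Files seen so far on each input stream, keyed by file name, in arrival order.
    private final Map<String, String> fromA = new LinkedHashMap<>();
    private final Map<String, String> fromB = new LinkedHashMap<>();

    // Called once per incoming tuple; `source` says which spout emitted it ("A" or "B").
    // Returns the (fileA, fileB) pairs that became comparable because of this tuple.
    public List<String[]> onTuple(String source, String fileName, String content) {
        List<String[]> newPairs = new ArrayList<>();
        if ("A".equals(source)) {
            fromA.put(fileName, content);
            for (String b : fromB.keySet()) {
                newPairs.add(new String[] {fileName, b});
            }
        } else if ("B".equals(source)) {
            fromB.put(fileName, content);
            for (String a : fromA.keySet()) {
                newPairs.add(new String[] {a, fileName});
            }
        } else {
            throw new IllegalArgumentException("unknown source: " + source);
        }
        return newPairs;
    }

    // Looks up a buffered file's content so a similarity function can be applied to a pair.
    public String content(String source, String fileName) {
        return "A".equals(source) ? fromA.get(fileName) : fromB.get(fileName);
    }
}
```

Every A/B pair is produced exactly once, whichever side arrives first. The trade-off is memory: the bolt holds every file from both folders, which is one reason to emit compact features (for example term counts) from the spouts rather than raw file contents.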
If, on the other hand, you meant 2 different instances of the same spout reading different folders:
- This is not a great idea, because the number of spout executors is tied to the number of folders you have, so it is not scalable.
- Load distribution will be pretty bad.
- If you still want it, you can use the task index of the spout to give different spout executors different behavior (here, different behavior means reading different folders).
Something like this, maybe:
public class MySpout extends BaseRichSpout {
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // 0-based index of this task among the tasks of this spout component
        System.out.println("spout index = " + context.getThisTaskIndex());
    }
}
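To turn the task index into "which folders should I read", one option is a round-robin split, so a spout task may own several folders and the number of tasks no longer has to equal the number of folders (load can still be uneven if folders differ in size). A minimal plain-Java sketch; FolderAssignment and foldersForTask are hypothetical names. Inside open(), the inputs would presumably be context.getThisTaskIndex() and the size of context.getComponentTasks(context.getThisComponentId()); treat that wiring as an assumption to check against your Storm version.

```java
import java.util.ArrayList;
import java.util.List;

public class FolderAssignment {
    // Returns the folders this spout task should read: folder i goes to task (i % numTasks).
    // taskIndex must be in [0, numTasks).
    static List<String> foldersForTask(List<String> folders, int taskIndex, int numTasks) {
        if (numTasks <= 0 || taskIndex < 0 || taskIndex >= numTasks) {
            throw new IllegalArgumentException("bad task index " + taskIndex + " of " + numTasks);
        }
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < folders.size(); i++) {
            if (i % numTasks == taskIndex) {
                mine.add(folders.get(i));
            }
        }
        return mine;
    }
}
```

For example, with folders [a, b, c] and 2 spout tasks, task 0 reads a and c and task 1 reads b.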