parsing - ELKI CSV parser problems

- April 15, 2012

I have changed an .arff file to a .csv file with the tool in Weka. Now I can't use the ArffParser parser in ELKI.

Which parser should I use? The default is NumberVectorLabelParser, but it gives me an ArrayIndexOutOfBoundsException:

Running: -verbose -verbose -dbc.in /home/db/lisbet/datasets/without ids/try 2/calling them .txt using parser/lymphography_withoutdupl_norm_1ofn.csv -dbc.parser NumberVectorLabelParser -algorithm outlier.lof.LOF -lof.k 2 -evaluator outlier.OutlierROCCurve -rocauc.positive yes

Task failed
java.lang.ArrayIndexOutOfBoundsException: 47
    @ de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser.getTypeInformation(NumberVectorLabelParser.java:337)
    @ de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser.buildMeta(NumberVectorLabelParser.java:242)
    @ de.lmu.ifi.dbs.elki.datasource.parser.NumberVectorLabelParser.nextEvent(NumberVectorLabelParser.java:211)
    @ de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle.fromStream(MultipleObjectsBundle.java:242)
    @ de.lmu.ifi.dbs.elki.datasource.parser.AbstractStreamingParser.asMultipleObjectsBundle(AbstractStreamingParser.java:89)
    @ de.lmu.ifi.dbs.elki.datasource.InputStreamDatabaseConnection.loadData(InputStreamDatabaseConnection.java:91)
    @ de.lmu.ifi.dbs.elki.database.StaticArrayDatabase.initialize(StaticArrayDatabase.java:119)
    @ de.lmu.ifi.dbs.elki.workflow.InputStep.getDatabase(InputStep.java:62)
    @ de.lmu.ifi.dbs.elki.KDDTask.run(KDDTask.java:108)
    @ de.lmu.ifi.dbs.elki.application.KDDCLIApplication.run(KDDCLIApplication.java:60)
    @ [...]

My .csv file looks like this:

'lymphatics = deformed','lymphatics = displaced','lymphatics = arched','lymphatics = normal','block_of_affere = yes','block_of_affere = no','bl_of_lymph_c = no','bl_of_lymph_c = yes','bl_of_lymph_s = no','bl_of_lymph_s = yes','by_pass = no','by_pass = yes','extravasates = yes','extravasates = no','regeneration_of = no','regeneration_of = yes','early_uptake_in = yes','early_uptake_in = no','changes_in_lym = oval','changes_in_lym = round','changes_in_lym = bean','defect_in_node = lacunar','defect_in_node = lac_central','defect_in_node = lac_margin','defect_in_node = no','changes_in_node = lac_central','changes_in_node = lacunar','changes_in_node = no','changes_in_node = lac_margin','changes_in_stru = faint','changes_in_stru = drop_like','changes_in_stru = stripped','changes_in_stru = coarse','changes_in_stru = diluted','changes_in_stru = grainy','changes_in_stru = no','changes_in_stru = reticular','special_forms = vesicles','special_forms = no','special_forms = chalices','dislocation_of = no','dislocation_of = yes','exclusion_of_no = yes','exclusion_of_no = no',lym_nodes_dimin,lym_nodes_enlar,no_of_nodes_in,outlier
1,0,0,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0.333333,0.285714,no
0,1,0,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0.333333,0.142857,no

There are 11 parsers available. Maybe data, large parser.

Thank you, this is a bug in the ELKI CSV parser.

It did not expect the class column to have a label.
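As a rough way to check this against a file, the sketch below reports which columns hold non-numeric values, together with their header names. In the file from the question the header has 48 fields, so the `outlier` label sits at zero-based index 47, the same number that appears in the exception. The helper name `non_numeric_columns` and the shortened sample are made up for illustration and are not part of ELKI.

```python
import csv
import io

def non_numeric_columns(header_line, data_line, quotechar="'"):
    """Return (index, header name) for every column whose value is not a number."""
    header = next(csv.reader(io.StringIO(header_line), quotechar=quotechar))
    row = next(csv.reader(io.StringIO(data_line), quotechar=quotechar))
    result = []
    for index, value in enumerate(row):
        try:
            float(value)
        except ValueError:
            # Non-numeric cell: remember its position and header name, if any.
            name = header[index] if index < len(header) else None
            result.append((index, name))
    return result

# Shortened, hypothetical sample in the same shape as the file in the question.
header = "'by_pass = no','by_pass = yes',lym_nodes_dimin,outlier"
row = "1,0,0.333333,no"
print(non_numeric_columns(header, row))  # [(3, 'outlier')]
```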

So if you remove the ,outlier part of the first line (or the first line completely), it should read the file fine.
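If you would rather not edit the file by hand, something like the following could apply either variant of this workaround. It is only a sketch: the function name `fix_header` and the sample text are made up, and it assumes the label name is the last field of the first line, as in the question.

```python
def fix_header(text, label="outlier", drop=False):
    """Remove the trailing ',<label>' from the first line, or drop that line."""
    lines = text.splitlines()
    if not lines:
        return text
    if drop:
        # Variant 2: remove the header line completely.
        lines = lines[1:]
    elif lines[0].endswith("," + label):
        # Variant 1: cut only the label name (and its comma) from the header.
        lines[0] = lines[0][: -(len(label) + 1)]
    return "\n".join(lines) + "\n"

# Shortened, hypothetical sample in the same shape as the file in the question.
sample = "'a = x',b,outlier\n1,0.5,no\n"
print(fix_header(sample), end="")
print(fix_header(sample, drop=True), end="")
```

To use it on a real file, read the file contents into a string, pass them through `fix_header`, and write the result to a new file so the original stays untouched.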

I will push a change that makes this more robust (it will still lose the label though, because ELKI only has support for column labels on numerical columns, not on string label columns).
