How to Compare Information in a CSV File with Python?
I'm working on a CSV file. It has different columns, each holding one piece of information about the dataset. Suppose each line of the file contains:
- name information1 information2 information3
- For lines that share the same name, information1 and information2, I have to compute the mean of information3.
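A minimal sketch of the task as described, assuming a comma-separated file whose four columns are, in order, name, information1, information2 and a numeric information3. The sample text, the file name `data.csv` and the function name `mean_by_key` are hypothetical. Rows are grouped on the tuple (name, information1, information2), and the mean of information3 is computed per group.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the layout described above:
# name, information1, information2, information3
SAMPLE = """aaa,x,g,4
aaa,e,r,3
aaa,x,g,7
vvv,w,a,1
vvv,w,a,4
"""

def mean_by_key(lines):
    """Group rows by (name, information1, information2) and average information3."""
    groups = defaultdict(list)
    for name, info1, info2, info3 in csv.reader(lines):
        groups[(name, info1, info2)].append(float(info3))
    # Mean of information3 for every distinct (name, info1, info2) combination
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# With a real (hypothetical) file:
#     with open("data.csv", newline="") as f:
#         means = mean_by_key(f)
means = mean_by_key(io.StringIO(SAMPLE))
for key, mean in means.items():
    print(key, mean)
```

A dictionary keyed on the full tuple means the rows do not have to be sorted or adjacent for their values to be averaged together.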
This is the piece of code where I got stuck:
    col_a = [row[1] for row in file]
    for row in col_a:
        currentrow = col_a[1]
        nextrow = col_a[2]
        for i in range(0, len(col_a)):
            if (currentrow) == set(nextrow):
                ???
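The current-row/next-row comparison attempted above can be sketched by pairing each value with its successor through `zip(col_a, col_a[1:])`, which avoids managing indices by hand. This assumes `col_a` holds hypothetical sample values, and it compares the plain values directly, because a string compared with `set(...)` is never equal to it. The sketch records the index pairs of consecutive equal values.

```python
# Hypothetical second-column values, standing in for [row[1] for row in file]
col_a = ["3", "4", "3", "5", "5"]

# zip pairs each value with the one after it:
# (col_a[0], col_a[1]), (col_a[1], col_a[2]), ...
matches = []
for i, (current, nxt) in enumerate(zip(col_a, col_a[1:])):
    if current == nxt:
        matches.append((i, i + 1))  # indices of two consecutive equal values

print(matches)  # [(3, 4)]
```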
I started programming only a few months ago, so please bear with me.
I am still struggling to understand exactly what is needed, but here goes. The following script first collects consecutive rows whose first column matches into blocks, e.g. the three "aaa" rows.
For each block, it locates the rows whose columns of interest match. If two or more are found, their average is calculated.
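The block-then-match idea can also be sketched with `itertools.groupby`, which yields runs of consecutive rows sharing a key, i.e. the blocks described above. Like the script, this assumes rows with the same name are adjacent (sorted or clustered); the function name `matched_means` and the shortened sample are hypothetical. Within each block, rows are bucketed on columns 2 and 3, and a mean of column 5 is reported for any bucket with two or more rows.

```python
import itertools
from collections import defaultdict
from operator import itemgetter

def matched_means(data):
    """Return (rows, mean) for every group of 2+ rows sharing name, col 2 and col 3."""
    results = []
    # groupby only merges *consecutive* rows with the same name (column 0)
    for _name, block in itertools.groupby(data, key=itemgetter(0)):
        buckets = defaultdict(list)
        for cols in block:
            buckets[(cols[2], cols[3])].append(cols)
        for rows in buckets.values():
            if len(rows) > 1:
                values = [cols[5] for cols in rows]
                results.append((rows, sum(values) / len(values)))
    return results

data = [
    ["aaa", "3", "x", "g", "b", 4],
    ["aaa", "4", "e", "r", "t", 3],
    ["aaa", "3", "x", "g", "b", 7],
    ["vv1", "5", "w", "a", "s", 42],
    ["vvv", "5", "w", "a", "s", 1],
    ["vvv", "5", "w", "a", "s", 4],
]
for rows, mean in matched_means(data):
    print("matched:", rows, "mean:", mean)
```

Because `groupby` starts a new group every time the key changes, no `last_row` bookkeeping is needed, and the final block is handled without a separate remainder step.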
    import collections

    file = [
        ["aaa", "3", "x", "g", "b", 4],
        ["aaa", "4", "e", "r", "t", 3],
        ["aaa", "3", "x", "g", "b", 7],
        ["vv1", "5", "w", "a", "s", 42],
        ["vv2", "5", "w", "a", "s", 10],
        ["vvv", "5", "w", "a", "s", 1],
        ["vvv", "5", "w", "a", "s", 4],
        ["vvv", "5", "w", "a", "s", 3]]

    def calculate_stats(block):
        d = collections.defaultdict(list)
        # build a dictionary of rows whose columns of interest match
        for cols in block:
            key = (cols[2], cols[3])  # columns to match, e.g. "x" and "g"
            d[key].append(cols)
        for key, rows in d.items():
            if len(rows) > 1:
                col_f = [cols[5] for cols in rows]  # calculate mean on column f
                print("matched:", rows, "mean:", sum(col_f) / len(col_f))

    last_row = file[0]
    block = []

    for cols in file:
        if cols[0] == last_row[0]:
            block.append(cols)
        else:
            # the name changed: process the finished block, then start a new one with this row
            if len(block) > 1:
                calculate_stats(block)
            block = [cols]
        last_row = cols

    # deal with the remainder
    if len(block) > 1:
        calculate_stats(block)

For the sample data I used, the following results are displayed:

    matched: [['aaa', '3', 'x', 'g', 'b', 4], ['aaa', '3', 'x', 'g', 'b', 7]] mean: 5.5
    matched: [['vvv', '5', 'w', 'a', 's', 1], ['vvv', '5', 'w', 'a', 's', 4], ['vvv', '5', 'w', 'a', 's', 3]] mean: 2.6666666666666665
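The script uses a hard-coded list; with a real CSV file the rows can be loaded with the standard `csv` module. A minimal sketch, assuming a comma-separated file (its path here is a temporary demo file) whose sixth column is numeric; the helper name `load_rows` is hypothetical. `csv.reader` returns every field as a string, so the last column is converted with `float()` before any averaging.

```python
import csv
import os
import tempfile

def load_rows(path):
    """Read a CSV file into lists, converting the sixth column to a number."""
    rows = []
    with open(path, newline="") as f:
        for cols in csv.reader(f):
            if not cols:               # skip blank lines
                continue
            cols[5] = float(cols[5])   # csv yields strings; the mean needs numbers
            rows.append(cols)
    return rows

# Demo: write a small hypothetical file, then load it back
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("aaa,3,x,g,b,4\naaa,4,e,r,t,3\n")
    path = tmp.name

file = load_rows(path)  # can stand in for the hard-coded `file` list
os.remove(path)
print(file)
```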