python - How to sort numerically but read_csv with dtype=object?
Given this simplified test.csv file:
    wrong
    8
    7
    6
    3
    1
    2
    4
    5
    9
    10
and this code:

    #!/usr/bin/python
    import pandas as pd

    data = pd.read_csv('test.csv', dtype=object)
    counts = data['wrong'].value_counts(dropna=False)
    counts_converted = counts.convert_objects(convert_numeric=True)
    print(counts_converted.sort_index())
I get the following output:

    1     1
    10    1
    2     1
    3     1
    4     1
    5     1
    6     1
    7     1
    8     1
    9     1
    dtype: int64
Why doesn't the last print statement sort the index from 1 to 10?
I have to force dtype to object when the CSV file is read, to overcome issues with detecting mixed character, date, and numeric formats in the columns, so removing that argument isn't going to work for me.
I thought I could convert the Series to numeric, but that doesn't seem to work.
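For what it's worth, the ordering above looks like plain string sorting: with dtype=object the values in the wrong column stay strings, so the index produced by value_counts is made of strings too, and converting the counts to numeric leaves those index labels untouched. Here is a minimal sketch of that, assuming a recent pandas where pd.to_numeric is available; the inline csv_text is only a stand-in for test.csv:

```python
import io
import pandas as pd

# Stand-in for test.csv (assumption: same single "wrong" column as above)
csv_text = "wrong\n8\n7\n6\n3\n1\n2\n4\n5\n9\n10\n"
data = pd.read_csv(io.StringIO(csv_text), dtype=object)

counts = data["wrong"].value_counts(dropna=False)

# The index labels are strings, so sort_index() orders them lexicographically.
string_sorted = list(counts.sort_index().index)

# Converting the index labels (not the counts) to numbers gives numeric order.
numeric_counts = counts.copy()
numeric_counts.index = pd.to_numeric(numeric_counts.index)
numeric_sorted = list(numeric_counts.sort_index().index)

print(string_sorted)
print(numeric_sorted)
```

Here string_sorted comes out as '1', '10', '2', ... while numeric_sorted runs from 1 to 10.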
Edit: I'm editing the question since commenting doesn't let me use the Enter key without posting the comment... [Ahh, I found many long rants about this feature. Shift-Enter works.]
@EdChum suggested a solution that works for the simplified case, but it does not work for my production data. Consider this less simple data file:
    wrong,right
    8,a
    7,b
    6,c
    3,d
    1,
    2,f
    4,g
    5,h
    9,i
    10,j
    ,k
    11,l
The empty value on the second-to-last line causes the error "cannot convert float NaN to integer."
I have many NaNs (all empty cells) that need to be kept and counted in value_counts.
Other empty cells seem to turn into large negative numbers (e.g. -5226413792388707240) upon casting to int64.
Apologies in advance for any obtuseness on my part! Thanks for any help.
Adding astype after reading makes it sort properly.
You mention that you have to sort out mixed characters and dates and stuff; do that before the astype and you should be fine.
    import pandas as pd

    data = pd.read_csv('/home/mikael/test.csv', dtype=object)
    # sanitize data here
    data['wrong'] = data['wrong'].astype(int)
    counts = data['wrong'].value_counts(dropna=False)
    counts_converted = counts.convert_objects(convert_numeric=True)
    print(counts_converted.sort_index())

Output:

    1     1
    2     1
    3     1
    4     1
    5     1
    6     1
    7     1
    8     1
    9     1
    10    1
    dtype: int64
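For the data file with empty cells, astype(int) cannot work because NaN has no integer representation. One possible workaround, not part of the answer above, is to convert the column with pd.to_numeric, which leaves the empty cells as NaN so they can still be counted. This is a sketch under a few assumptions: a recent pandas (where convert_objects has been removed and pd.to_numeric replaces it), and an inline csv_text standing in for the real file. The labels come out as floats because NaN is a float:

```python
import io
import pandas as pd

# Stand-in for the less simple data file (assumption: same contents as in the question)
csv_text = "wrong,right\n8,a\n7,b\n6,c\n3,d\n1,\n2,f\n4,g\n5,h\n9,i\n10,j\n,k\n11,l\n"
data = pd.read_csv(io.StringIO(csv_text), dtype=object)

# errors="coerce" turns anything that is not a number into NaN instead of raising
wrong_numeric = pd.to_numeric(data["wrong"], errors="coerce")

# dropna=False keeps the NaN bucket; sort_index() puts NaN after the numbers
counts = wrong_numeric.value_counts(dropna=False).sort_index()
print(counts)
```

Note that errors="coerce" also turns any non-numeric text into NaN, so if the real column mixes in dates or strings it is worth checking which values got coerced before trusting the NaN count.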