python - Converting a dictionary of 16k dicts to a pandas DataFrame
I'm working on a data mining problem for my master's thesis. I'm using Python for the data analysis, but I have no experience with pandas, and I need to convert my data to a DataFrame. In order to do survival regression with a Python package called lifelines, I need to create a covariate matrix from experiment_data, a dict containing over 16k dicts of Twitter data on Kickstarter projects (see the example dict below).
16041: {'goal': 1200, 'launch': 1353544772, 'days-before-deadline': 3, 'followers': 149, 'date-funded': 1355887690.9189188, 'id': 52687, 'tweet_ids': [280965208409796608, ... n], 'state': 1, 'deadline': 1356136772, 'retweets': 0, 'favorites': 0, 'duration': 31, 'timestamps': [1355876412.0], 'favourites': 0, 'runtime': 27, 'friends': 127, 'pledges': [0.0, 0.0625, 0.0625, ... n], 'statuses': 7460}
If I can create a pandas DataFrame from this dict, I'll be able to create the covariate matrix using patsy, for example like this:
x = patsy.dmatrix('friends + followers + retweets + favorites - 1', data, return_type='dataframe')
Now my question is: how do I create a pandas DataFrame from the experiment_data dicts? The keys of the inner dictionaries (goal, launch, followers, etc.) should be the columns, with one row for each Kickstarter project (i.e. index nr.: 0 to 16041).
Any help is appreciated. Thanks in advance!
P.S. If you have experience with survival regression using Python and lifelines, please let me know!
I think you want from_dict, using the param orient='index':
In [31]: d = {16041: {'goal': 1200, 'launch': 1353544772, 'days-before-deadline': 3, 'followers': 149, 'date-funded': 1355887690.9189188, 'id': 52687, 'tweet_ids': [280965208409796608], 'state': 1, 'deadline': 1356136772, 'retweets': 0, 'favorites': 0, 'duration': 31, 'timestamps': [1355876412.0], 'favourites': 0, 'runtime': 27, 'friends': 127, 'pledges': [0.0, 0.0625, 0.0625], 'statuses': 7460}}
         pd.DataFrame.from_dict(d, orient='index')
Out[31]:
          id  followers  days-before-deadline  statuses  duration  state  \
16041  52687        149                     3      7460        31      1

       goal             tweet_ids                pledges  favourites  \
16041  1200  [280965208409796608]  [0.0, 0.0625, 0.0625]           0

         deadline  favorites  retweets  runtime  friends      launch  \
16041  1356136772          0         0       27      127  1353544772

           timestamps   date-funded
16041  [1355876412.0]  1.355888e+09
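The same call should work on the whole experiment_data dict. Here is a minimal sketch, assuming every inner dict uses the same keys. The entry for project 0 is made up for illustration, only a subset of the keys is shown, and the name covariates is mine, not from the question.

```python
import pandas as pd

# Hypothetical two-project sample shaped like experiment_data in the question.
# The values for project 0 are invented for illustration only.
experiment_data = {
    0: {'goal': 5000, 'followers': 320, 'friends': 210, 'retweets': 4,
        'favorites': 2, 'state': 0, 'pledges': [0.0, 0.01]},
    16041: {'goal': 1200, 'followers': 149, 'friends': 127, 'retweets': 0,
            'favorites': 0, 'state': 1, 'pledges': [0.0, 0.0625, 0.0625]},
}

# Outer keys become the row index, inner keys become the columns.
df = pd.DataFrame.from_dict(experiment_data, orient='index')

# Select the scalar columns wanted as covariates. List-valued columns
# such as 'pledges' remain in df as object columns.
covariates = df[['friends', 'followers', 'retweets', 'favorites']]
print(covariates)
```

The resulting df can then be passed as the data argument to patsy.dmatrix, as in the question. List-valued columns like pledges and tweet_ids probably need to be reduced to scalars (a count or a sum, say) before they can go into a formula.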