dataframe - R: replacing columns by lookup to a dictionary


In this question I need to be able to look up a value for a dataframe's column based not on one attribute, but on several attributes and a range comparison against a dictionary. (Yes, this is a continuation of the story in "r conditional replace more columns lookup".)

It should be an easy question for people who know R, because I can provide a working solution with basic indexing that just needs to be upgraded. Possibly... it is hard for me, because I am still in the process of learning R.

From the start:

When I want to replace missing values in the columns testcolnames of the (big) table df1 according to the column default of the (small) dictionary testdefs (the row is selected by matching testdefs$labmet_id against the column name from testcolnames), I use this code:

testcolnames <- c("80", "116")  # ...result of a regexp on colnames(df1), longer in reality
df1[, testcolnames] <- lapply(testcolnames, function(x) {
  tmpcol <- df1[, x]
  tmpcol[is.na(tmpcol)] <- testdefs$default[match(x, testdefs$labmet_id)]
  tmpcol
})
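As a minimal, self-contained sketch of this match-based replacement (the testdefs and df1 here are toy stand-ins, not the real tables from below):

```r
# Toy dictionary: one default per labmet_id (hypothetical values)
testdefs <- data.frame(labmet_id = c("80", "116"),
                       default   = c(0.03, 0.09),
                       stringsAsFactors = FALSE)

# Toy data with some missing values
df1 <- data.frame(`80` = c(NA, 5), `116` = c(7, NA), check.names = FALSE)

testcolnames <- c("80", "116")
df1[, testcolnames] <- lapply(testcolnames, function(x) {
  tmpcol <- df1[, x]
  # match() finds the dictionary row whose labmet_id equals the column name
  tmpcol[is.na(tmpcol)] <- testdefs$default[match(x, testdefs$labmet_id)]
  tmpcol
})
df1
#     80   116
# 1 0.03  7.00
# 2 5.00  0.09
```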

Where I want to go:

Now I need to upgrade that solution. The table testdefs has (see the example below) multiple rows with the same labmet_id, differing in two new columns called lower and upper; I need to compare these bounds against the variable df1$rngvalue when selecting the replacement value.

In other words, the upgraded solution should not just select the row of testdefs where testdefs$labmet_id equals the column name; from those rows it should select the one where df1$rngvalue lies within the bounds testdefs$lower and testdefs$upper (if no such row exists, take the closest range, either the lowest or the highest; if the dictionary doesn't contain the labmet_id at all, the NA can stay in the original data).

An example:

testdefs

"labmet_id","lower","upper","default","notuse","notuse2"
30,0,54750,25,80,2            # ...many columns I don't care about
46,0,54750,1.45,3.5,0.2
80,0,54750,0.03,0.1,0.01
116,0,30,0.09,0.5,0.01
116,31,365,0.135,0.7,0.01
116,366,5475,0.11,0.7,0.01
116,5476,54750,0.105,0.7,0.02

df1:

"rngvalue","80","116"
36,NA,NA
600000,NA,NA
367,5,NA
90,NA,6

should be transformed into:

"rngvalue","80","116"
36,0.03,0.135      # col 80 replaced with 0.03
600000,0.03,0.105  # col 116 has to pick a range; the value is bigger than anything in the dictionary, so take the last one
367,5,0.11         # 5 is not replaced; the second column nicely looks up 0.11
90,0.03,6          # 6 is not replaced

Since the intervals don't have gaps, you can use findInterval. Change the lookup table into a list containing the break points and defaults for each value, using dlply from plyr.

## transform the lookup table into a list of breaks and defaults per labmet_id
library(plyr)
lookup <- dlply(testdefs, .(labmet_id), function(x)
  list(breaks  = c(rbind(x$lower, x$upper), x$upper[length(x$upper)])[c(TRUE, FALSE)],
       default = x$default))

So the lookups look like:

lookup[["116"]]
# $breaks
# [1]     0    31   366  5476 54750
#
# $default
# [1] 0.090 0.135 0.110 0.105
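To see why the rbind trick produces these breaks, here is the construction for the 116 rows in isolation (values copied from the example testdefs):

```r
lower <- c(0, 31, 366, 5476)
upper <- c(30, 365, 5475, 54750)

# rbind() makes a 2xN matrix; c() reads it column-wise, interleaving the
# bounds: 0, 30, 31, 365, 366, 5475, 5476, 54750
interleaved <- c(rbind(lower, upper), upper[length(upper)])

# c(TRUE, FALSE) is recycled, keeping every other element:
# each lower bound, plus the final upper bound appended above
breaks <- interleaved[c(TRUE, FALSE)]
breaks
# [1]     0    31   366  5476 54750
```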

Then you can do the lookup as follows:

testcolnames <- c("80", "116")
df1[, testcolnames] <- lapply(testcolnames, function(x) {
  tmpcol <- df1[, x]
  defaults <- with(lookup[[x]], {
    default[pmax(pmin(length(breaks) - 1, findInterval(df1$rngvalue, breaks)), 1)]
  })
  tmpcol[is.na(tmpcol)] <- defaults[is.na(tmpcol)]
  tmpcol
})

#   rngvalue   80   116
# 1       36 0.03 0.135
# 2   600000 0.03 0.105
# 3      367 5.00 0.110
# 4       90 0.03 6.000

findInterval returns 0 below the first break and length(breaks) above the last one when rngvalue lies outside the range; that is the reason for the pmin and pmax in the code above, which clamp the result to a valid index into default.
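A quick illustration of that edge behaviour, using the breaks for labmet_id 116 from above:

```r
breaks <- c(0, 31, 366, 5476, 54750)

# findInterval returns 0 below the first break and length(breaks) above the last
findInterval(c(-5, 36, 600000), breaks)
# [1] 0 2 5

# clamp to a valid default index (1 .. number of intervals)
pmax(pmin(length(breaks) - 1, findInterval(c(-5, 36, 600000), breaks)), 1)
# [1] 1 2 4
```

Out-of-range values are thereby mapped to the first or last interval, which is exactly the "take the closest range" behaviour the question asks for.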

