R - How to prevent data.table from forcing numeric variables into character variables without specifying them manually?

- September 15, 2015

Consider the following dataset:

dt <- structure(list(lllocatie = structure(c(1L, 6L, 2L, 4L, 3L), .Label = c("assen", "oosterwijtwerd", "startenhuizen", "t-zandt", "tjuchem", "winneweer"), class = "factor"),
                     lat = c(52.992, 53.32, 53.336, 53.363, 53.368),
                     lon = c(6.548, 6.74, 6.808, 6.765, 6.675),
                     mag.cat = c(3L, 2L, 1L, 2L, 2L),
                     places = structure(c(2L, 4L, 5L, 6L, 3L), .Label = c("", "amen,assen,deurze,ekehaar,eleveld,geelbroek,taarlo,ubbena", "eppenhuizen,garsthuizen,huizinge,kantens,middelstum,oldenzijl,rottum,startenhuizen,toornwerd,westeremden,zandeweer", "loppersum,winneweer", "oosterwijtwerd", "t-zandt,zeerijp"), class = "factor")),
                .Names = c("lllocatie", "lat", "lon", "mag.cat", "places"),
                class = c("data.table", "data.frame"),
                row.names = c(NA, -5L))

When I want to split the strings in the last column into separate rows, I use (with data.table version 1.9.5+):

dt.new <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=list(lllocatie,lat,lon,mag.cat)]

However, when I use:

dt.new2 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=lllocatie]

I get the same result, except that all columns are forced into character variables. For small datasets it is not a big problem to list the variables that do not have to be split in the by argument, but for datasets with many columns/variables it is. I know this is possible with the splitstackshape package (as mentioned by @colonelbeauvel in an answer), but I'm looking for a data.table solution because I want to chain more operations to this.
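The coercion can be reproduced on a much smaller table. The sketch below is only an illustration: the table toy and its columns id, value and places are made-up names, not part of the dataset above. Every column that is not listed in by ends up in .SD, so it is passed through tstrsplit() and comes back as character.

```r
library(data.table)

# Hypothetical toy table: one numeric column and one comma-separated string column
toy <- data.table(id = c("a", "b"), value = c(1.5, 2.5), places = c("x,y", "z"))

# Only 'id' is in 'by', so both 'value' and 'places' are part of .SD
# and both go through tstrsplit(), which works on character data
res <- toy[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE))), by = id]

# Inspect the resulting column classes; 'value' is no longer numeric
print(sapply(res, class))
```

Here value is returned as character even though it contains no comma at all, which is the same effect seen on lat, lon and mag.cat in dt.new2.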

How can I prevent this without manually specifying the variables that do not have to be split in the by argument?

Two solutions with data.table:

1: Use the type.convert=TRUE argument inside tstrsplit(), as proposed by @arun:

dt.new1 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE, type.convert=TRUE))), by=lllocatie]

2: Use setdiff(names(dt),"places") in the by argument, as proposed by @frank:

dt.new2 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=setdiff(names(dt),"places")]

Both approaches give the same result:

> identical(dt.new1, dt.new2)
[1] TRUE

The advantage of the second solution is that when you have more than one column with string values, only the column you exclude in setdiff(names(dt),"places") is split (supposing you want only that specific column, in this case places, to be split). The splitstackshape package offers the same advantage.
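A minimal sketch of that difference, assuming a hypothetical table toy2 with a second comma-separated column note that should stay intact (none of these names come from the original dataset):

```r
library(data.table)

# Hypothetical table with two string columns; only 'places' should be split
toy2 <- data.table(id = 1:2, note = c("a,b", "c"), places = c("x,y", "z"))

# Solution 1: type.convert=TRUE restores the types, but every column in .SD
# is still split, so 'note' is split as well
by_type <- toy2[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE, type.convert = TRUE))), by = id]

# Solution 2: group by everything except 'places', so .SD holds only 'places'
by_setdiff <- toy2[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE))), by = setdiff(names(toy2), "places")]

print(by_type)
print(by_setdiff)
```

In by_setdiff the note column keeps its original values ("a,b" repeated for both rows of the first group), while in by_type it is split into "a", "b" and "c". Note that grouping by all remaining columns merges rows that are identical in those columns, which may or may not matter for a given dataset.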
