R - How to prevent data.table from forcing numeric variables into character variables without manually specifying these?
Consider the following dataset:
dt <- structure(list(lllocatie = structure(c(1L, 6L, 2L, 4L, 3L), .Label = c("assen", "oosterwijtwerd", "startenhuizen", "t-zandt", "tjuchem", "winneweer"), class = "factor"), lat = c(52.992, 53.32, 53.336, 53.363, 53.368), lon = c(6.548, 6.74, 6.808, 6.765, 6.675), mag.cat = c(3L, 2L, 1L, 2L, 2L), places = structure(c(2L, 4L, 5L, 6L, 3L), .Label = c("", "amen,assen,deurze,ekehaar,eleveld,geelbroek,taarlo,ubbena", "eppenhuizen,garsthuizen,huizinge,kantens,middelstum,oldenzijl,rottum,startenhuizen,toornwerd,westeremden,zandeweer", "loppersum,winneweer", "oosterwijtwerd", "t-zandt,zeerijp"), class = "factor")), .Names = c("lllocatie", "lat", "lon", "mag.cat", "places"), class = c("data.table", "data.frame"), row.names = c(NA, -5L))
When I want to split the strings in the last column into separate rows, I use (with data.table version 1.9.5+):
dt.new <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=list(lllocatie,lat,lon,mag.cat)]
However, when I use:
dt.new2 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=lllocatie]
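I get the same result, except that all columns other than the grouping column are forced into character variables. This happens because every column that is not listed in by ends up in .SD and is passed through tstrsplit(), which returns character vectors. For small datasets it is not a big problem to specify the variables that do not have to be split in the by argument, but for datasets with many columns/variables it is. I know this is possible with the splitstackshape package (as mentioned by @colonelbeauvel in an answer), but I'm looking for a data.table solution because I want to chain more operations onto this.
How can I prevent this without manually specifying the variables that do not have to be split in the by argument?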
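To make the problem concrete, here is a minimal sketch on a hypothetical toy table (the names toy, res, id and the values are made up for illustration and are not part of the dataset above). It shows that a numeric column left out of by comes back as character:

```r
library(data.table)

# Hypothetical toy data: one numeric column and one comma-separated string column
toy <- data.table(id     = c("a", "b"),
                  lat    = c(52.992, 53.32),
                  places = c("x,y", "z"))

# Grouping by id only: lat and places are both in .SD, so tstrsplit()
# is applied to lat as well and its values are returned as character
res <- toy[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE))),
           by = id]

sapply(res, class)  # inspect the column classes; lat is no longer numeric
```

Inspecting the classes this way is a quick check to run after any of the variants discussed here.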
Two solutions with data.table:
1: Use the type.convert=TRUE argument inside tstrsplit(), as proposed by @arun:
dt.new1 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE, type.convert=TRUE))), by=lllocatie]
2: Use setdiff(names(dt),"places") in the by argument, as proposed by @frank:
dt.new2 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=setdiff(names(dt),"places")]
Both approaches give the same result:
> identical(dt.new1,dt.new2)
[1] TRUE
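As a small illustration of the first solution, the sketch below applies type.convert=TRUE to a hypothetical toy table (toy, res and the column names are assumptions for the example). The types are inferred again from the text form of each value, so a whole-number double such as 53 would likely come back as integer rather than double:

```r
library(data.table)

# Hypothetical toy data, not the dataset from the question
toy <- data.table(id     = c("a", "b"),
                  lat    = c(52.992, 53.32),
                  places = c("x,y", "z"))

# type.convert = TRUE makes tstrsplit() convert each split piece back
# to the type inferred from its text, so lat becomes numeric again
res <- toy[, lapply(.SD, function(x)
             unlist(tstrsplit(x, ",", fixed = TRUE, type.convert = TRUE))),
           by = id]

sapply(res, class)  # inspect the column classes
```

Because the conversion is inferred rather than declared, it is worth checking the resulting classes on real data before chaining further operations.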
The advantage of the second solution is that when you have more than one column with string values, only the one you leave out in setdiff(names(dt),"places") is split (supposing you want only that specific one, in this case places, to be split). The splitstackshape package also offers this advantage.
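The following sketch illustrates that advantage on a hypothetical toy table with a second string column, note, that also contains commas but should stay intact (toy, res, id and note are made-up names for the example):

```r
library(data.table)

# Hypothetical toy data: two string columns containing commas
toy <- data.table(id     = c("a", "b"),
                  lat    = c(52.992, 53.32),
                  note   = c("p,q", "r"),
                  places = c("x,y", "z"))

# Every column except places goes into by, so .SD holds only places:
# lat keeps its numeric type and note is left unsplit
res <- toy[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed = TRUE))),
           by = setdiff(names(toy), "places")]

sapply(res, class)  # inspect the column classes
```

To split several columns while protecting the rest, the second argument of setdiff() can be a vector of column names, assuming the split columns produce the same number of pieces per row.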