optimization - Compute new column based on values in current and following rows with dplyr in R
I have a big dataset (10+ million rows x 30 variables) and am trying to compute new variables based on complicated interactions of the current ones. For clarity, I am including only the important variables in this question. I have the following code in R, but I am interested in other views and opinions. I am using the dplyr package to compute new columns based on the current/following row values of 3 other columns (more explanation below the code).
I was wondering if there is a way to make this faster and more efficient, or maybe rewrite it...
    library(dplyr)
    library(magrittr)  # provides the %<>% assignment pipe

    # main function - data is a dataframe, windowsize and ratio are ints
    computenewcolumn <- function(data, windowsize, ratio) {

      # helper function used in the second mutate down...
      # all args are ints, returns a boolean
      windowahead <- function(timeto, window, reduction) {
        # subset the original dataframe - only observations with values of
        # timetogo between timeto-1 and window (basically the following x rows
        # from the current one)
        # note: data here is the ungrouped function argument, so rows of all ids are searched
        subframe <- data[(timeto - 1 >= data$timetogo & data$timetogo >= window), ]
        isthere <- any(subframe$price < reduction)
        return(isthere)
      }

      # group by value of id first and order by timetogo...
      data %<>%
        group_by(id) %>%
        arrange(desc(timetogo)) %>%
        # ...create 2 new columns with simple interactions of existing ones...
        mutate(window = ifelse(timetogo > windowsize, timetogo - windowsize, 0),
               reduction = floor(price - (ratio * price))) %>%
        rowwise() %>%
        # ...now comes the more complex stuff - I want to compute a third column
        # depending on the next (timetogo - window) number of values of price
        mutate(advice = ifelse(windowahead(timetogo, window, reduction), 1, 0))

      return(data)
    }
We have a dataset with the following columns: id, price, timetogo.
We first group by the values of id and compute 2 new columns based on the current row values (window from timetogo, and reduction from price). The next thing is to compute a new third column based on:
1. the current value of reduction
2. the next (timetogo - window) number of values of price in the dataframe.
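To make the three quantities concrete, here is a small worked example for a single row, using the sample rows from this post. The values of windowsize and ratio are assumptions picked only for illustration, and, like the helper in the posted code, this check looks at all rows regardless of id.

```r
# Sample rows copied from this post; windowsize and ratio are assumed values.
timetogo   <- c(96, 95, 94, 93, 92, 91)
price      <- c(19, 19, 17, 18, 19, 17)
windowsize <- 5
ratio      <- 0.05

i <- 1  # current row: timetogo = 96, price = 19

# Lower limit of the look-ahead window and the price threshold for row i.
window_i    <- ifelse(timetogo[i] > windowsize, timetogo[i] - windowsize, 0)
reduction_i <- floor(price[i] - ratio * price[i])

# Rows that lie "ahead" of row i: timetogo from window_i up to timetogo[i] - 1.
ahead <- timetogo <= timetogo[i] - 1 & timetogo >= window_i

# advice is 1 if any price inside that window is below the threshold.
advice_i <- as.numeric(any(price[ahead] < reduction_i))
```

With these assumed parameters the window runs from timetogo 91 to 95, the threshold is 18, and advice_i comes out as 1 because a price of 17 appears inside the window.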
I was wondering if there is a simple way to reference upcoming values of a column from within mutate()? Ideally I am looking for a sliding window function on 1 column, where the limits of the sliding window are set by 2 other current column values. My solution right now just uses a custom function that subsets the original dataframe manually, does the comparison, and returns a value back to the mutate() call. Any ideas would be appreciated!
P.S. Here's a sample of the data... Please let me know if you need any more info. Thanks!
               id timetogo price
    1 aqsafoto30a       96    19
    2 aqsafoto20a       95    19
    3 aqsafoto30a       94    17
    4 aqsafoto20a       93    18
    5 aqsafoto25a       92    19
    6 aqsafoto30a       91    17
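One possible direction for the sliding-window idea, shown only as a rough sketch and not benchmarked: work on the plain column vectors of one id at a time instead of subsetting the whole dataframe for every row. The helper name price_drop_ahead and the values of windowsize and ratio are hypothetical. Note that this version only looks at rows of the same id, which differs from the posted helper, which searches all ids.

```r
# Hypothetical helper: for each row i of ONE id's data, is there a later row
# (timetogo between window[i] and timetogo[i] - 1) with price below reduction[i]?
price_drop_ahead <- function(timetogo, window, reduction, price) {
  vapply(seq_along(timetogo), function(i) {
    ahead <- timetogo <= timetogo[i] - 1 & timetogo >= window[i]
    any(price[ahead] < reduction[i])
  }, logical(1))
}

# Sample rows from this post.
data <- data.frame(
  id = c("aqsafoto30a", "aqsafoto20a", "aqsafoto30a",
         "aqsafoto20a", "aqsafoto25a", "aqsafoto30a"),
  timetogo = c(96, 95, 94, 93, 92, 91),
  price = c(19, 19, 17, 18, 19, 17),
  stringsAsFactors = FALSE
)

windowsize <- 5     # assumed value
ratio      <- 0.05  # assumed value

# The two simple columns, computed on whole vectors.
data$window    <- ifelse(data$timetogo > windowsize, data$timetogo - windowsize, 0)
data$reduction <- floor(data$price - ratio * data$price)

# Apply the helper separately to the rows of each id.
data$advice <- 0
for (rows in split(seq_len(nrow(data)), data$id)) {
  data$advice[rows] <- as.numeric(price_drop_ahead(
    data$timetogo[rows], data$window[rows],
    data$reduction[rows], data$price[rows]
  ))
}
```

Since the helper takes whole vectors and returns one value per row, it could presumably also be called inside a grouped mutate(), for example mutate(advice = as.numeric(price_drop_ahead(timetogo, window, reduction, price))) after group_by(id), without needing rowwise(). The work inside each id is still quadratic in the number of rows for that id, so whether it is actually faster on 10+ million rows would need to be measured.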