optimization - Compute new column based on values in current and following rows with dplyr in R


I have a big dataset (10+ million rows x 30 variables) and I am trying to compute new variables based on complicated interactions of the current ones. For clarity, I am including only the important variables in the question. I have the following code in R, but I am interested in other views and opinions. I am using the dplyr package to compute new columns based on the current and following row values of 3 other columns. (More explanation below the code.)

I was wondering if there is a way to make it faster and more efficient, or maybe to rewrite it entirely.

    library(dplyr)
    library(magrittr)  # provides the %<>% assignment pipe

    # Main function: data is a data frame; windowsize and ratio are numbers.
    computenewcolumn <- function(data, windowsize, ratio) {

      # Helper function used in the second mutate() further down.
      # Arguments are numbers; returns a single logical.
      windowahead <- function(timeto, window, reduction) {
        # Subset the original data frame: keep only observations whose
        # timetogo lies between window and timeto - 1 (basically the
        # following x rows after the current one).
        subframe <- data[(timeto - 1 >= data$timetogo & data$timetogo >= window), ]
        isthere <- any(subframe$price < reduction)
        return(isthere)
      }

      # Group by the value of id first and order by timetogo...
      data %<>%
        group_by(id) %>%
        arrange(desc(timetogo)) %>%
        # ...create 2 new columns from simple interactions of existing ones...
        mutate(window = ifelse(timetogo > windowsize, timetogo - windowsize, 0),
               reduction = floor(price - (ratio * price))) %>%
        rowwise() %>%
        # ...now comes the more complex part: compute a third column that
        # depends on the next (timetogo - window) values of price.
        mutate(advice = ifelse(windowahead(timetogo, window, reduction), 1, 0))

      return(data)
    }

We have a dataset with the following columns: id, price, timetogo.

We first group by the values of id and compute two new columns based on current row values (window from timetogo and reduction from price). The next step is to compute a new third column based on:

1. the current value of reduction

2. the next (timetogo - window) values of price in the data frame.
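To make the two simple columns concrete, here is a minimal sketch in base R that applies the same formulas as the code above to the sample rows from this post. The values `windowsize = 3` and `ratio = 0.1` are assumptions chosen only for illustration; they do not come from the original question.

```r
# Sample rows from the post; windowsize and ratio are hypothetical values.
sample_data <- data.frame(
  id = c("aqsafoto30a", "aqsafoto20a", "aqsafoto30a",
         "aqsafoto20a", "aqsafoto25a", "aqsafoto30a"),
  timetogo = c(96, 95, 94, 93, 92, 91),
  price = c(19, 19, 17, 18, 19, 17)
)
windowsize <- 3
ratio <- 0.1

# window: lower bound of the look-ahead range, floored at 0.
sample_data$window <- ifelse(sample_data$timetogo > windowsize,
                             sample_data$timetogo - windowsize, 0)

# reduction: the price threshold a later observation must fall below.
sample_data$reduction <- floor(sample_data$price - ratio * sample_data$price)

print(sample_data)
# window:    93 92 91 90 89 88
# reduction: 17 17 15 16 17 15
```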

I was wondering if there is a simple way to reference upcoming values of a column within mutate()? Ideally I am looking for a sliding window function over one column, where the limits of the sliding window are set by two other current column values. My solution for now uses a custom function that subsets the original data frame manually, does the comparison, and returns a value to the mutate() call. Any ideas would be appreciated!
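One possible direction, sketched here in base R under stated assumptions: do the look-ahead comparison on the vectors of a single id instead of subsetting the full data frame for every row. This assumes the window is meant to stay within one id, which differs from the helper above (it filters the whole data frame). The names `advice_for_group` and `compute_advice`, and the values `windowsize = 5` and `ratio = 0.05`, are hypothetical.

```r
# Hypothetical helper: works on the timetogo/price vectors of ONE id.
# For each row, checks whether any observation with timetogo in
# [window, timetogo - 1] has a price below that row's reduction.
advice_for_group <- function(timetogo, price, windowsize, ratio) {
  window <- ifelse(timetogo > windowsize, timetogo - windowsize, 0)
  reduction <- floor(price - ratio * price)
  vapply(seq_along(timetogo), function(i) {
    ahead <- timetogo <= timetogo[i] - 1 & timetogo >= window[i]
    as.integer(any(price[ahead] < reduction[i]))
  }, integer(1))
}

# Hypothetical wrapper: applies the helper per id and keeps the row order.
compute_advice <- function(data, windowsize, ratio) {
  data$advice <- 0L
  for (idx in split(seq_len(nrow(data)), data$id)) {
    data$advice[idx] <- advice_for_group(data$timetogo[idx], data$price[idx],
                                         windowsize, ratio)
  }
  data
}

# Sample rows from the post; windowsize and ratio are assumed values.
sample_data <- data.frame(
  id = c("aqsafoto30a", "aqsafoto20a", "aqsafoto30a",
         "aqsafoto20a", "aqsafoto25a", "aqsafoto30a"),
  timetogo = c(96, 95, 94, 93, 92, 91),
  price = c(19, 19, 17, 18, 19, 17)
)
result <- compute_advice(sample_data, windowsize = 5, ratio = 0.05)
print(result$advice)
# 1 0 0 0 0 0
```

Because the helper takes plain vectors, it should also work inside a grouped dplyr pipeline, for example `group_by(id) %>% mutate(advice = advice_for_group(timetogo, price, windowsize, ratio))`, which avoids `rowwise()`. The inner comparison is still linear in the group size, so this is a starting point rather than a tuned solution.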

P.S. Here is a sample of the data. Please let me know if you need more info. Thanks!

               id timetogo price
    1 aqsafoto30a       96    19
    2 aqsafoto20a       95    19
    3 aqsafoto30a       94    17
    4 aqsafoto20a       93    18
    5 aqsafoto25a       92    19
    6 aqsafoto30a       91    17

