Motivation

The problem arises when we store time series in a data.frame, or when we apply time-series operations such as lag and diff to numeric vectors in a data.frame. Let’s look at an example.

library(transx)

First, we restrict the tibble printing options to minimize the space the output occupies.

library(dplyr)
options(tibble.print_min = 3)

Let’s load the economics dataset from ggplot2 for illustration.

econ <- ggplot2::economics
econ
#> # A tibble: 574 x 6
#>   date         pce    pop psavert uempmed unemploy
#>   <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#> 1 1967-07-01  507. 198712    12.6     4.5     2944
#> 2 1967-08-01  510. 198911    12.6     4.7     2945
#> 3 1967-09-01  516. 199113    11.9     4.6     2958
#> # ... with 571 more rows

Then, we try the lag function from stats:

mutate(econ, pop_lag = stats::lag(as.ts(pop)))
#> # A tibble: 574 x 7
#>   date         pce    pop psavert uempmed unemploy pop_lag
#>   <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>   <dbl>
#> 1 1967-07-01  507. 198712    12.6     4.5     2944  198712
#> 2 1967-08-01  510. 198911    12.6     4.7     2945  198911
#> 3 1967-09-01  516. 199113    11.9     4.6     2958  199113
#> # ... with 571 more rows

stats::lag only works on ts objects: it shifts the time base of the series, not the values, so pop_lag above is identical to pop. However, dplyr has thought about this problem:

mutate(econ, pop_lag = dplyr::lag(pop))
#> # A tibble: 574 x 7
#>   date         pce    pop psavert uempmed unemploy pop_lag
#>   <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>   <dbl>
#> 1 1967-07-01  507. 198712    12.6     4.5     2944      NA
#> 2 1967-08-01  510. 198911    12.6     4.7     2945  198712
#> 3 1967-09-01  516. 199113    11.9     4.6     2958  198911
#> # ... with 571 more rows
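The difference between the two kinds of lag can be checked on a toy vector. The following is a minimal base-R sketch; the vector and the names x, x_lag and shifted are made up for illustration and are not part of transx. It shows that stats::lag() moves only the tsp attribute, while a value shift (written by hand here, in the spirit of dplyr::lag()) pads the front with NA.

```r
# Toy series, made up for illustration
x <- ts(c(10, 20, 30, 40))

# stats::lag() shifts the time base and leaves the values untouched
x_lag <- stats::lag(x, k = 1)
as.numeric(x_lag)  # same values as x
tsp(x)             # start = 1, end = 4, frequency = 1
tsp(x_lag)         # start = 0, end = 3, frequency = 1

# A value shift keeps the length and pads the first position with NA
shifted <- c(NA, head(as.numeric(x), -1))
shifted
```

Because mutate() stores only the values of a column, the shifted time base is not carried into the data.frame and the column looks unchanged.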

However, the same problem affects any univariate function whose result is shorter than its input when it is applied in a data.frame. For example:

mutate(econ, pop_diff = base::diff(pop))
#> Error: Problem with `mutate()` input `pop_diff`.
#> x Input `pop_diff` can't be recycled to size 574.
#> i Input `pop_diff` is `base::diff(pop)`.
#> i Input `pop_diff` must be size 574 or 1, not 573.

The idea for transx comes from the need to write wrapper functions like the following:

diffx <- function(x, ...) x - dplyr::lag(x, ...)

mutate(econ, pop_diff = diffx(pop))
#> # A tibble: 574 x 7
#>   date         pce    pop psavert uempmed unemploy pop_diff
#>   <date>     <dbl>  <dbl>   <dbl>   <dbl>    <dbl>    <dbl>
#> 1 1967-07-01  507. 198712    12.6     4.5     2944       NA
#> 2 1967-08-01  510. 198911    12.6     4.7     2945      199
#> 3 1967-09-01  516. 199113    11.9     4.6     2958      202
#> # ... with 571 more rows
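The same idea can be sketched without dplyr by padding the result of base::diff() back to the input length. The helper name diff_padded and its argument n are hypothetical, chosen for illustration only; this is not the transx implementation.

```r
# Hypothetical helper (not part of transx): a difference that keeps the
# length of its input by padding the first n positions with NA.
diff_padded <- function(x, n = 1L) {
  stopifnot(is.numeric(x), length(n) == 1L, n >= 1L)
  # base::diff() drops the first n elements; put NA back in their place
  c(rep(NA_real_, min(n, length(x))), base::diff(x, lag = n))
}

# First three values of pop from the economics data shown above
pop <- c(198712, 198911, 199113)
diff_padded(pop)
#> [1]  NA 199 202
```

Since the result always has the same length as x, a helper of this shape can be used directly inside mutate().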