```
library(dplyr)
#> Warning: replacing previous import 'lifecycle::last_warnings' by
#> 'rlang::last_warnings' when loading 'pillar'
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(survival)
library(survParamSim)
set.seed(12345)
```

The goal of survParamSim is to perform survival simulation with parametric survival model generated from ‘survreg’ function in ‘survival’ package. In each simulation, coefficients are resampled from variance-covariance matrix of parameter estimates, in order to capture uncertainty in model parameters.

`survreg()`

functionBefore running a parametric survival simulation, you need to fit a model to your data using `survreg()`

function of `survival`

package.

In this vignette, we will be using `colon`

dataset available in `survival`

package, where the treatment effect of adjuvant Levamisole+5-FU for colon cancer over placebo is evaluated.

First, we load the data and do some data wrangling.

```
# ref for dataset https://vincentarelbundock.github.io/Rdatasets/doc/survival/colon.html
<-
colon2 as_tibble(colon) %>%
# recurrence only and not including Lev alone arm
filter(rx != "Lev",
== 1) %>%
etype # Same definition as Lin et al, 1994
mutate(rx = factor(rx, levels = c("Obs", "Lev+5FU")),
depth = as.numeric(extent <= 2))
```

Shown below are Kaplan-Meier curves for visually checking the data.

The second plot is looking at how many censoring we have over time.

Looks like we have a fairly uniform censoring between 1800 to 3000 days - this is attributable to a steady clinical study enrollment.

```
<- survfit(Surv(time, status) ~ rx, data = colon2)
survfit.colon ::ggsurvplot(survfit.colon) survminer
```

```
<- survfit(Surv(time, 1-status) ~ rx, data = colon2)
survfit.colon.censor ::ggsurvplot(survfit.colon.censor) survminer
```

Next we fit a lognormal parametric model for the data.

Here we are using `node4`

and `depth`

as additional covariates in addition to treatment (`rx`

).

You can see that all of the factor has strong association with the outcome.

```
<- survreg(Surv(time, status) ~ rx + node4 + depth,
fit.colon data = colon2, dist = "lognormal")
summary(fit.colon)
#>
#> Call:
#> survreg(formula = Surv(time, status) ~ rx + node4 + depth, data = colon2,
#> dist = "lognormal")
#> Value Std. Error z p
#> (Intercept) 7.5103 0.1343 55.92 < 2e-16
#> rxLev+5FU 0.7606 0.1677 4.54 5.7e-06
#> node4 -1.3474 0.1816 -7.42 1.2e-13
#> depth 1.1243 0.2661 4.22 2.4e-05
#> Log(scale) 0.6040 0.0461 13.10 < 2e-16
#>
#> Scale= 1.83
#>
#> Log Normal distribution
#> Loglik(model)= -2561.7 Loglik(intercept only)= -2607.6
#> Chisq= 91.8 on 3 degrees of freedom, p= 9e-20
#> Number of Newton-Raphson Iterations: 4
#> n= 619
```

`surv_param_sim()`

is the main function of the package that takes `survreg`

object as described above.

It also require you to supply `newdata`

, which is required even if it is not new - i.e. the same data was used for both `survreg()`

and `surv_param_sim()`

.

What it does is: 1. Re-sample all the coefficients in the parametric survival model from variance-covariance matrix for `n.rep`

times. 2. Perform survival time for all subjects in `newdata`

with the corresponding covariates, using one of the resampled coefficients. Also generate censoring time according to `censor.dur`

(if not NULL), and replace the simulated survival time above if censoring time is earlier. 4. Repeat the steps 2. for `n.rep`

times.

```
<-
sim surv_param_sim(object = fit.colon, newdata = colon2,
# Simulate censoring according to the plot above
censor.dur = c(1800, 3000),
# Simulate only 100 times to make the example go fast
n.rep = 100)
```

After the simulation is performed, you can either extract raw simulation results or further calculate Kaplan-Meier estimates or hazard ratio of treatment effect, as you see when you type `sim`

in the console.

```
sim#> ---- Simulated survival data with the following model ----
#> survreg(formula = Surv(time, status) ~ rx + node4 + depth, data = colon2,
#> dist = "lognormal")
#>
#> * Use `extract_sim()` function to extract individual simulated survivals
#> * Use `calc_km_pi()` function to get survival curves and median survival time
#> * Use `calc_hr_pi()` function to get hazard ratio
#>
#> * Settings:
#> #simulations: 100
#> #subjects: 619 (without NA in model variables)
```

To calculate survival curves for each simulated dataset, `calc_km_pi()`

can be used on the simulated object above.

```
<- calc_km_pi(sim, trt = "rx")
km.pi #> Warning in calc_km_pi(sim, trt = "rx"): 70 of 200 simulations (#rep * #trt
#> * #group) did not reach median survival time and these are not included for
#> prediction interval calculation. You may want to delay the `censor.dur` in
#> simulation.
km.pi#> ---- Simulated and observed (if calculated) survival curves ----
#> * Use `extract_medsurv_pi()` to extract prediction intervals of median survival times
#> * Use `extract_km_pi()` to extract prediction intervals of K-M curves
#> * Use `plot_km_pi()` to draw survival curves
#>
#> * Settings:
#> trt: rx
#> group: NA
#> pi.range: 0.95
#> calc.obs: TRUE
```

Similar to the raw simulated object, you can have a few options for further processing - one of them is plotting prediction intervals with `plot_km_pi()`

function.

```
plot_km_pi(km.pi) +
theme(legend.position = "bottom") +
labs(y = "Recurrence free rate") +
expand_limits(y = 0)
```

Or providing median survival summary table with `extract_medsurv_pi()`

function.

```
extract_medsurv_pi(km.pi)
#> # A tibble: 8 × 5
#> rx n description median quantile
#> <fct> <dbl> <chr> <dbl> <dbl>
#> 1 Obs 315 pi_low 1108. 0.0250
#> 2 Obs 315 pi_med 1538. 0.5
#> 3 Obs 315 pi_high 2113. 0.975
#> 4 Obs 315 obs 1236 NA
#> 5 Lev+5FU 304 pi_low 2126. 0.0250
#> 6 Lev+5FU 304 pi_med 2697. 0.5
#> 7 Lev+5FU 304 pi_high 2927. 0.975
#> 8 Lev+5FU 304 obs NA NA
```

Plot can also be made for subgroups.

You can see that prediction interval is wide for (depth: 1 & nodes4: 1) group, mainly due to small number of subjects

```
<- calc_km_pi(sim, trt = "rx", group = c("node4", "depth"))
km.pi #> Warning in calc_km_pi(sim, trt = "rx", group = c("node4", "depth")): 339 of 800
#> simulations (#rep * #trt * #group) did not reach median survival time and these
#> are not included for prediction interval calculation. You may want to delay the
#> `censor.dur` in simulation.
plot_km_pi(km.pi) +
theme(legend.position = "bottom") +
labs(y = "Recurrence free rate") +
expand_limits(y = 0)
```

To calculate prediction intervals of HRs, `calc_hr_pi()`

can be used on the simulated object above. Here I only generated subgroups based on “depth”, since the very small N in (depth: 1 & nodes4: 1) can cause issue with calculating HRs.

```
<- calc_hr_pi(sim, trt = "rx", group = c("depth"))
hr.pi
hr.pi#> ---- Simulated and observed (if calculated) hazard ratio ----
#> * Use `extract_hr_pi()` to extract prediction intervals and observed HR
#> * Use `extract_hr()` to extract individual simulated HRs
#> * Use `plot_hr_pi()` to draw histogram of predicted HR
#>
#> * Settings:
#> trt: rx
#> ('Lev+5FU' as test trt, 'Obs' as control)
#> group: depth
#> pi.range: 0.95
#> calc.obs: TRUE
plot_hr_pi(hr.pi)
```

You can also extract prediction intervals and observed HR with `extract_hr_pi()`

function.

```
extract_hr_pi(hr.pi)
#> # A tibble: 8 × 5
#> depth rx description HR quantile
#> <dbl> <fct> <chr> <dbl> <dbl>
#> 1 0 Lev+5FU pi_low 0.464 0.0250
#> 2 0 Lev+5FU pi_med 0.624 0.5
#> 3 0 Lev+5FU pi_high 0.794 0.975
#> 4 0 Lev+5FU obs 0.590 NA
#> 5 1 Lev+5FU pi_low 0.233 0.0250
#> 6 1 Lev+5FU pi_med 0.597 0.5
#> 7 1 Lev+5FU pi_high 1.13 0.975
#> 8 1 Lev+5FU obs 0.607 NA
```