Getting Started with CDSim

The CDSim package provides an easy workflow for simulating climate data such as temperature and rainfall across multiple synthetic weather stations. It is useful for testing models, teaching climate analysis, generating demo data, and creating datasets with controlled variability.

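This vignette demonstrates:

  1. creating synthetic weather stations

  2. simulating climate time series

  3. exporting the results to CSV and NetCDF

  4. plotting a station time series

  5. validating the simulated data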

Creating Weather Stations

The stations can be created either by:

library(CDSim)

We begin by generating a set of synthetic weather stations. The seed ensures reproducibility.

stations <- create_stations(n = 3, seed = 123)
#> Generating synthetic station network...
#> Generated 3 synthetic stations within bounding box.
stations
#>     Station        LON       LAT
#> 1 Station_1 -2.0621124 10.681122
#> 2 Station_2  0.4415257 11.083271
#> 3 Station_3 -1.4551154  4.818895

Each station record contains:

  1. station name (Station)

  2. longitude (LON)

  3. latitude (LAT)
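
As a rough illustration of what a seeded station generator might do, the sketch below draws uniform random coordinates inside a bounding box using base R only. The function name sketch_stations() and the bounding-box limits are hypothetical; they were chosen to be consistent with the coordinates printed above, and this is not necessarily how create_stations() is implemented.

```r
# Hypothetical helper: NOT part of CDSim. The bounding-box defaults are assumed.
sketch_stations <- function(n, seed = NULL,
                            lon_range = c(-3.5, 1.5),
                            lat_range = c(4.5, 11.5)) {
  if (!is.null(seed)) set.seed(seed)  # same seed -> same coordinates
  data.frame(
    Station = paste0("Station_", seq_len(n)),
    LON = runif(n, min = lon_range[1], max = lon_range[2]),
    LAT = runif(n, min = lat_range[1], max = lat_range[2]),
    stringsAsFactors = FALSE
  )
}

a <- sketch_stations(3, seed = 123)
b <- sketch_stations(3, seed = 123)
identical(a, b)  # the two draws match because the seed is the same
```

Fixing the seed inside the generator, rather than relying on the caller's session state, is what makes the station table repeatable from one run to the next.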

Simulating Climate Data

Once the stations are created, we can generate daily or monthly climate time series using built-in stochastic models. The call below returns one row per station per month for 2019 through 2024.

sim <- simulate_climate_series(stations, start_year = 2019, end_year = 2024)
head(sim)
#>     Station       LON      LAT Year Month       Date Avg.Tn Avg.Tx Sum.Rf
#> 1 Station_1 -2.062112 10.68112 2019     1 2019-01-15   19.5   31.8    0.0
#> 2 Station_1 -2.062112 10.68112 2019     2 2019-02-15   21.7   34.7    0.0
#> 3 Station_1 -2.062112 10.68112 2019     3 2019-03-15   23.9   36.0    0.0
#> 4 Station_1 -2.062112 10.68112 2019     4 2019-04-15   24.6   32.6   51.4
#> 5 Station_1 -2.062112 10.68112 2019     5 2019-05-15   22.6   33.3    0.0
#> 6 Station_1 -2.062112 10.68112 2019     6 2019-06-15   20.7   29.2   49.5
#>   Avg.Rf
#> 1   0.00
#> 2   0.00
#> 3   0.00
#> 4   1.71
#> 5   0.00
#> 6   1.65
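
To give a feel for how a monthly series like this could arise, here is a minimal base R sketch of a seasonal-cycle-plus-noise generator. Every parameter (means, amplitudes, wet-month probability, gamma shape and scale) is an assumption made for illustration, and this is not CDSim's actual stochastic model. The one relationship taken from the output above is that Avg.Rf is consistent with Sum.Rf divided by the number of days in the month.

```r
# Illustrative only: a simple seasonal generator, not CDSim's model.
set.seed(42)

# First day of each month in 2019, plus the first day of January 2020
firsts        <- seq(as.Date("2019-01-01"), by = "month", length.out = 13)
days_in_month <- as.integer(diff(firsts))  # 31, 28, 31, ...
dates         <- firsts[1:12] + 14         # mid-month dates, e.g. 2019-01-15
month         <- 1:12

# Temperature: annual sine cycle plus Gaussian noise (parameters assumed)
season <- sin(2 * pi * (month - 1) / 12)
tmin   <- round(21 + 3 * season + rnorm(12, sd = 0.8), 1)
# Keep the daily range at 2 or more so that Tmax always exceeds Tmin
tmax   <- round(tmin + pmax(2, 10 + rnorm(12, sd = 1.5)), 1)

# Rainfall: wet/dry month indicator times a gamma-distributed total
wet    <- rbinom(12, size = 1, prob = 0.6)
sum_rf <- round(wet * rgamma(12, shape = 2, scale = 30), 1)
avg_rf <- round(sum_rf / days_in_month, 2)  # mean daily rainfall

sketch <- data.frame(Date = dates, Month = month,
                     Avg.Tn = tmin, Avg.Tx = tmax,
                     Sum.Rf = sum_rf, Avg.Rf = avg_rf)
head(sketch)
```

Multiplying a wet/dry indicator by a gamma draw is one common way to get a rainfall distribution with many zeros and a long right tail.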

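A typical simulated record includes:

  1. station identifier and coordinates (Station, LON, LAT)

  2. time fields (Year, Month, Date), with Date set to the 15th of each month in the output above

  3. mean minimum and maximum temperature (Avg.Tn, Avg.Tx)

  4. rainfall total (Sum.Rf) and mean daily rainfall (Avg.Rf); in the output above, Avg.Rf equals Sum.Rf divided by the number of days in the month (for example, 51.4 / 30 ≈ 1.71 for April)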

Exporting Data to CSV and NetCDF

CDSim includes convenient exporters for CSV and NetCDF, two common formats for storing climate data, so the outputs can be read by packages such as ncdf4, terra, or stars.

write_station_csv(sim, file = "climate_data.csv")
write_station_netcdf(sim, out_nc = "climate_data.nc")
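
The exported CSV can be read back with base R. The sketch below is self-contained: it writes a small stand-in file with write.csv() rather than calling the exporter, and it assumes that write_station_csv() produces an ordinary comma-separated file with a header row (the file layout is not shown in this vignette). The values are copied from the head(sim) output above.

```r
# Stand-in for a file written by write_station_csv(); the layout is assumed.
demo <- data.frame(
  Station = "Station_1",
  Date    = as.Date(c("2019-04-15", "2019-06-15")),
  Avg.Tx  = c(32.6, 29.2),
  Sum.Rf  = c(51.4, 49.5),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)

# read.csv() returns dates as character strings, so convert them explicitly
back <- read.csv(path, stringsAsFactors = FALSE)
back$Date <- as.Date(back$Date)
str(back)
```

Converting the Date column straight after reading keeps later date arithmetic and plotting from silently operating on character strings.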

Quick Visualization

To demonstrate a quick plot, here’s the maximum temperature series of the first station.

plot_station_timeseries(sim, "Station_1", var = "Avg.Tx")
#> `geom_smooth()` using formula = 'y ~ x'
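
For readers who want a comparable plot without the helper, the following base graphics sketch draws the first six Avg.Tx values of Station_1 (copied from the head(sim) output above) and overlays a straight-line fit. The linear fit is a simple stand-in chosen for this sketch; it is not claimed to match the smoother that plot_station_timeseries() uses.

```r
# Station_1 values copied from the head(sim) output shown earlier
d <- data.frame(
  Date   = as.Date(c("2019-01-15", "2019-02-15", "2019-03-15",
                     "2019-04-15", "2019-05-15", "2019-06-15")),
  Avg.Tx = c(31.8, 34.7, 36.0, 32.6, 33.3, 29.2)
)

pdf(NULL)  # null device, so the sketch also runs non-interactively
plot(d$Date, d$Avg.Tx, type = "b",
     xlab = "Date", ylab = "Avg.Tx", main = "Station_1")

# Straight-line fit against time as a simple stand-in for a smoother
fit <- lm(Avg.Tx ~ as.numeric(Date), data = d)
lines(d$Date, fitted(fit), lty = 2)
invisible(dev.off())
```

In an interactive session, drop the pdf(NULL) and dev.off() lines to see the plot in the active graphics device.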

Validation of Simulated Climate Data

To assess the physical realism and statistical consistency of the simulated climate data, CDSim provides a validation framework.

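The function validate_climate_internal() performs a series of diagnostic checks on the simulated dataset, including:

  1. summary statistics for every column

  2. range and plausibility checks on temperature and rainfall, such as maximum temperature exceeding minimum temperature

  3. distribution shape (skewness and standard deviation)

  4. correlation between rainfall and temperature, with a rain-temperature coupling flag

  5. autocorrelation of each variable

  6. linear trend in maximum temperature and rainfall

  7. seasonality (peak and trough months)

  8. an overall valid flag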

validation <- validate_climate_internal(sim)
validation
#> $summary
#>    Station               LON               LAT              Year     
#>  Length:216         Min.   :-2.0621   Min.   : 4.819   Min.   :2019  
#>  Class :character   1st Qu.:-2.0621   1st Qu.: 4.819   1st Qu.:2020  
#>  Mode  :character   Median :-1.4551   Median :10.681   Median :2022  
#>                     Mean   :-1.0252   Mean   : 8.861   Mean   :2022  
#>                     3rd Qu.: 0.4415   3rd Qu.:11.083   3rd Qu.:2023  
#>                     Max.   : 0.4415   Max.   :11.083   Max.   :2024  
#>      Month            Date                Avg.Tn          Avg.Tx     
#>  Min.   : 1.00   Min.   :2019-01-15   Min.   :18.00   Min.   :24.00  
#>  1st Qu.: 3.75   1st Qu.:2020-07-07   1st Qu.:18.60   1st Qu.:26.60  
#>  Median : 6.50   Median :2021-12-30   Median :19.80   Median :30.30  
#>  Mean   : 6.50   Mean   :2021-12-29   Mean   :20.57   Mean   :30.04  
#>  3rd Qu.: 9.25   3rd Qu.:2023-06-22   3rd Qu.:22.23   3rd Qu.:32.90  
#>  Max.   :12.00   Max.   :2024-12-15   Max.   :25.90   Max.   :40.10  
#>      Sum.Rf           Avg.Rf     
#>  Min.   :  0.00   Min.   :0.000  
#>  1st Qu.:  0.00   1st Qu.:0.000  
#>  Median : 26.45   Median :0.850  
#>  Mean   : 39.63   Mean   :1.299  
#>  3rd Qu.: 62.33   3rd Qu.:2.120  
#>  Max.   :201.30   Max.   :6.500  
#> 
#> $checks
#> $checks$Tmin_min
#> [1] 18
#> 
#> $checks$Tmax_max
#> [1] 40.1
#> 
#> $checks$Rain_min
#> [1] 0
#> 
#> $checks$Tmax_gt_Tmin
#> [1] TRUE
#> 
#> $checks$Tmin_plausible
#> [1] TRUE
#> 
#> 
#> $distribution
#> $distribution$rain_skewness
#> [1] 1.281271
#> 
#> $distribution$Tmax_skewness
#> [1] 0.1560093
#> 
#> $distribution$Tmin_skewness
#> [1] 0.6707342
#> 
#> $distribution$sd_Tmax
#> [1] 4.049788
#> 
#> $distribution$sd_Tmin
#> [1] 2.260367
#> 
#> $distribution$sd_Rain
#> [1] 47.04732
#> 
#> 
#> $correlation
#>            Sum.Rf     Avg.Tx     Avg.Tn
#> Sum.Rf  1.0000000 -0.4488468 -0.2087886
#> Avg.Tx -0.4488468  1.0000000  0.7185227
#> Avg.Tn -0.2087886  0.7185227  1.0000000
#> 
#> $rain_temp_coupling
#> [1] TRUE
#> 
#> $autocorrelation
#> $autocorrelation$Tmax
#> [1] 0.513696
#> 
#> $autocorrelation$Tmin
#> [1] 0.7104567
#> 
#> $autocorrelation$Rain
#> [1] 0.109139
#> 
#> 
#> $trend
#> $trend$Tmax_slope
#>   time_index 
#> -0.003346432 
#> 
#> $trend$Rain_slope
#> time_index 
#> 0.08788007 
#> 
#> 
#> $seasonality
#> $seasonality$peak_month
#> 8 
#> 8 
#> 
#> $seasonality$trough_month
#> 5 
#> 5 
#> 
#> 
#> $valid
#> [1] TRUE
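
Several of the diagnostics printed above can be approximated in a few lines of base R. The sketch below runs on a made-up two-year monthly series. The element names loosely mirror the output, but the definitions used here (moment-based skewness, lag-1 autocorrelation, slope against a running time index, peak month taken from mean rainfall) are assumptions and may differ from what validate_climate_internal() computes.

```r
# Toy monthly series (two years); all values are made up for illustration
set.seed(1)
month <- rep(1:12, times = 2)
tmax  <- 30 + 4 * sin(2 * pi * (month - 3) / 12) + rnorm(24, sd = 0.5)
tmin  <- tmax - 9 + rnorm(24, sd = 0.5)
rain  <- pmax(0, 60 * sin(2 * pi * (month - 5) / 12) + rnorm(24, sd = 10))

# Moment-based sample skewness (one common definition)
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

checks <- list(
  Tmax_gt_Tmin  = all(tmax > tmin),
  Rain_min      = min(rain),
  rain_skewness = skewness(rain),
  # Lag-1 autocorrelation: correlate the series with itself shifted one step
  Tmax_lag1     = cor(tmax[-1], tmax[-length(tmax)]),
  # Linear trend: slope of the values against a running time index
  Tmax_slope    = unname(coef(lm(tmax ~ seq_along(tmax)))[2]),
  # Peak month: the calendar month with the largest mean rainfall
  peak_month    = as.integer(names(which.max(tapply(rain, month, mean))))
)
str(checks)
```

Collecting the results in a named list makes it straightforward to combine the individual checks into a single pass/fail flag.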

In addition, CDSim supports external validation against observed datasets using the function validate_climate(). This allows users to compare simulated data with real-world observations for further evaluation of model performance.
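
The signature of validate_climate() is not shown in this vignette, so the sketch below does not call it. Instead it computes three common comparison metrics by hand. The simulated values are the first six Avg.Tx values of Station_1 from the output above; the observed values are hypothetical, and whether validate_climate() reports these particular metrics is an assumption.

```r
# Simulated values from the Station_1 output above
sim_tx <- c(31.8, 34.7, 36.0, 32.6, 33.3, 29.2)
# Hypothetical observations, invented for this sketch
obs_tx <- c(32.4, 35.1, 35.2, 33.0, 32.1, 29.9)

err <- sim_tx - obs_tx
metrics <- c(
  bias = mean(err),          # mean error; positive means the simulation runs warm
  rmse = sqrt(mean(err^2)),  # root mean squared error
  cor  = cor(sim_tx, obs_tx) # linear association between the two series
)
round(metrics, 3)
```

Reporting bias alongside RMSE helps separate a systematic offset from scatter, since RMSE alone cannot distinguish the two.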

The internal validation is particularly useful when observed data are unavailable, because it provides diagnostic assurance that the simulated outputs adhere to known climatological principles.