This document outlines the internal workflow of heatwaveR.
flowchart TD
A(["ts2clm()"]) --> B[/INPUT:<br>date, temperature/]
B --> C{Checks?}
C --> |Yes| D[/OUTPUT:<br>date, temperature/]
C --> |No| E([End])
D --> F(["make_whole_fast()"])
G(["make_whole_fast()"]) --> H[/INPUT:<br>date, temperature/]
H --> I[/OUTPUT:<br>doy, date, temperature/]
I --> J(["clim_spread()"])
K(["clim_spread()"]) --> L[/INPUT:<br>doy, date, temperature/]
L ---> M[spread doy as rows<br>spread year as cols<br>grow doy by windowHalfWidth]
M --> N[/OUTPUT:<br>temperature in<br>doy x year matrix/]
N --> O(["clim_calc_cpp()"])
P(["clim_calc_cpp()"]) --> Q[/INPUT:<br>temperature in<br>doy x year matrix/]
Q -->
ts2clm()
ts2clm()
accepts a dataframe with date
(x = t
) and temperature (y = temp
). Additional
function arguments include:
climatologyPeriod
Required. To this argument should be
passed two values (see example below). The first value should be the
chosen date for the start of the climatology period, and the second
value the end date of said period. This chosen period (preferably 30
years in length) is then used to calculate the seasonal cycle and the
extreme value threshold.maxPadLength
Specifies the maximum length of days over
which to interpolate (pad) missing data (specified as NA
)
in the input temperature time series; i.e., any consecutive blocks of
NA
s with length greater than maxPadLength
will
be left as NA
. The default is FALSE
. Set as an
integer to interpolate. Setting maxPadLength
to
TRUE
will return an error.windowHalfWidth
Width of sliding window about
day-of-year (to one side of the center day-of-year) used for the pooling
of values and calculation of climatology and threshold percentile.
Default is 5
days, which gives a window width of 11 days
centred on the 6th day of the series of 11 days.pctile
Threshold percentile (%) for detection of events
(MHWs). Default is 90
th percentile. Should the intent be to
use these threshold data for MCSs, set pctile = 10
. Or some
other low value.smoothPercentile
Boolean switch selecting whether to
smooth the climatology and threshold percentile time series with a
moving average of smoothPercentileWidth
. Default is
TRUE
.clmOnly
Choose to calculate and return only the
climatologies. The default is FALSE
.var
This argument has been introduced to allow the user
to choose if the variance of the seasonal signal per doy
should be calculated. The default of FALSE
will prevent the
calculation, potentially increasing speed of calculations on gridded
data and reducing the size of the output. The variance was initially
introduced as part of the standard output from Hobday et al. (2016), but
few researchers use it and so it is generally regarded now as
unnecessary.roundClm
This argument allows the user to choose how
many decimal places the seas
and thresh
outputs will be rounded to. Default is 4. To prevent rounding set
roundClm = FALSE
. This argument may only be given numeric
values or FALSE.The function first checks for a climatologyPeriod
consisting of a vectors of two dates
(e.g. c("1982-01-01", 2011-12-31)
) 1. It is advised that it
must be least 30 years, but it can handle shorter durations.
Currently it does not weigh an unequal number of dates per year
in cases when the duration of each year is not exactly 365 (or 366)
days. Weighting for unequal number of days per year in
situations where the climatologyPerdiod
comprises parts of
years must be addressed in an update.
This function supports leap years. Currently this is done by ignoring Feb 29s for the initial calculation of the climatology and threshold. The values for Feb 29 are then linearly interpolated from the values for Feb 28 and Mar 1. In an update I’d suggest using the temperature data for Feb 29 and not interpolating them.
Should the user be concerned about repeated measurements per day,
we suggest that the necessary checks and fixes are implemented prior to
feeding the time series to ts2clm()
.
Much of the interval function depends on
data.table because it is fast. I suggest
removing this dependence in favour of C++ code. Also,
except for output of flat tables as tibble
s, do not
rely on the Tidyverse.
The function will return a tibble
(see the
tidyverse package) with the input time series and the
newly calculated climatology. The climatology contains the daily
climatology and the threshold for calculating MHWs. The software was
designed for creating climatologies of daily temperatures, and the units
specified below reflect that intended purpose. However, various other
kinds of climatologies may be created, and if that is the case, the
appropriate units need to be determined by the user.
Value | Description |
---|---|
doy |
Julian day (day-of-year). For non-leap years it runs 1…59 and 61…366, while leap years run 1…366. |
t |
The date vector in the original time series supplied in
data . If an alternate column was provided to the
x argument, that name will rather be used for this
column. |
temp |
The measurement vector as per the the original data
supplied to the function. If a different column was given to the
y argument that will be shown here. |
seas |
Climatological seasonal cycle \[deg. C\]. |
thresh |
Seasonally varying threshold (e.g., 90th percentile) \[deg. C\]. This is used in
detect_event for the detection/calculation of events
(MHWs). |
var |
Seasonally varying variance (standard deviation) \[deg. C\]. This column is not returned if
var = FALSE (default). |
Should clmOnly
be enabled, only the 365 or 366 day
climatology will be returned.
make_whole_fast()
This function constructs a continuous, uninterrupted time series of
temperatures. It takes a series of dates and temperatures, and if
irregular (but ordered), inserts missing dates and fills corresponding
temperatures with NA
s. It has only one argument and is fed
data in a consistent format by early steps in ts2clm()
:
data
A data frame with columns for date
(ts_x
) and temperature (ts_y
) data. Ordered
daily data are expected, and although missing values (NA) can be
accommodated, the function is only recommended when NA
s
occur infrequently, preferably at no more than three consecutive
days.Date
(e.g. “1982-01-01”).maxPadLength
argument of the ts2clm
function. The longer and more
frequent the gaps become the lower the fidelity of the annual
climatology and threshold that can be calculated, which will not only
have repercussions for the accuracy at which the event metrics can be
determined, but also for the number of events that can be detected.
Currently there is no check for the number of NA
s
in the time series provided to ts2clm()
and this can be
added to future updates such that it fails (or sends a loud warning)
when a threshold of maximum allowable NA
s is
exceeded.doy
)
vector in and insert rows in cases when the original data set has
missing rows for some dates. Should the user be concerned about the
potential for repeated measurements or worry that the time series is
unordered, we suggest that the necessary checks and fixes are
implemented prior to feeding the time series to ts2clim()
via make_whole_fast()
. When using the fast algorithm, we
assume that the user has done all the necessary work to ensure that the
time vector is ordered and without repeated measurements
beforehand.The function will return a data frame with three columns. The column
headed doy
(day-of-year) is the Julian day running from 1
to 366, but modified so that the day-of-year series for non-leap-years
runs 1…59 and then 61…366. For leap years the 60th day is February 29.
The ts_x
column is a series of dates of class
Date
, while y
is the measured variable. This
time series will be uninterrupted and continuous daily values between
the first and last dates of the input data.