Introduction to whattheflux

The whattheflux package provides functions to parse static-chamber greenhouse gas measurement files generated by a variety of instruments; compute flux rates using multi-observation metadata; and generate diagnostic metrics and plots. It’s designed to be easy to integrate into scientific workflows.

Load sample data

library(whattheflux)

# Data from a LI-7810
f <- system.file("extdata/TG10-01087.data", package = "whattheflux")
dat <- wtf_read_LI7810(f)
#> TG10-01087.data: read 507 rows of TG10-01087 data, 2022-10-27 10:35:42 to 2022-10-27 10:44:08 EST

# Note that the whattheflux read functions print some info after reading
# Set "options(whattheflux.quiet = TRUE)" to suppress such messages

# Look at a subset of the data; the full data frame has 500+ rows and 25 columns
dat[1:6, 1:9]
#>   DATAH    SECONDS NANOSECONDS  NDX DIAG REMARK      H2O      CO2      CH4
#> 1  DATA 1666884942   313442945 4509    0     NA 12500.35 458.8612 2068.000
#> 2  DATA 1666884943   313442945 4513    0     NA 12449.87 458.1066 2069.830
#> 3  DATA 1666884944   313442945 4517    0     NA 12418.81 458.7320 2071.540
#> 4  DATA 1666884945   313442945 4521    0     NA 12429.65 458.8037 2071.960
#> 5  DATA 1666884946   313442945 4525    0     NA 12439.89 458.2824 2069.660
#> 6  DATA 1666884947   313442945 4529    0     NA 12429.37 456.9340 2066.544

The data frame returned by wtf_read_LI7810 contains all the data from the raw LI-7810 file, with TIMESTAMP, TZ (time zone of the timestamps), SN (serial number), and MODEL columns added.
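
For example, to look at just those added columns:

dat[1:3, c("TIMESTAMP", "TZ", "SN", "MODEL")]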

The analyzer data is basically a stream of measured greenhouse gas concentrations:

library(ggplot2)
ggplot(dat, aes(TIMESTAMP, CO2)) + geom_point()

Match with metadata

For these data to be useful, we need to associate them with metadata about the measurements: when they were started, how long they lasted, plot/treatment/collar information, etc.

# Accompanying metadata
md <- system.file("extdata/TG10-01087-metadata.csv", package = "whattheflux")
metadat <- read.csv(md)

print(metadat)
#>         Date Start_time Plot Obs_length
#> 1 2022-10-27   10:35:30    A         60
#> 2 2022-10-27   10:37:15    B         60
#> 3 2022-10-27   10:39:00    C         60
#> 4 2022-10-27   10:40:30    D         60
#> 5 2022-10-27   10:42:00    E         60
#> 6 2022-10-27   10:43:30    F         60
#> 7 2022-10-27   11:00:00    G         60

Important note: in this sample metadata, our measurement identifier is labeled Plot, but it could be named, and refer to, anything: bottle, sample, collar, etc. It’s simply an identifier for this measurement, i.e. this row.

The wtf_metadata_match function matches up the data with metadata, using the TIMESTAMP column that wtf_read_LI7810 helpfully created when it read the data file.

dat$metadat_row <- wtf_metadata_match(
  data_timestamps = dat$TIMESTAMP,
  start_dates = metadat$Date,
  start_times = metadat$Start_time,
  obs_lengths = metadat$Obs_length + 10) # 10 is expected dead band length
#> 1 entry had no timestamp matches!

# Note that wtf_metadata_match() warns us that one metadata row didn't match any data

# Based on the row match information, add a "Plot" column to the data
dat$Plot <- metadat$Plot[dat$metadat_row]
metadat$metadat_row <- seq_len(nrow(metadat))  # record each metadata row's index
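
# One metadata row went unmatched: plot G's 11:00:00 start time falls after
# the last data timestamp (10:44:08), so it can never match. We can pull it out:
metadat[!metadat$metadat_row %in% dat$metadat_row, ]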

# ...and plot
p <- ggplot(dat, aes(TIMESTAMP, CO2, color = Plot)) + geom_point()
print(p)

Some of these are clearly not correct; the measurement time appears to be shorter than 60 seconds for the C, D, and E plots:

In real life we’d want to correct the faulty metadata at its source. Here, we’ll just change the values programmatically and re-match:

metadat$Obs_length[3:5] <- c(30, 45, 45)
dat$metadat_row <- wtf_metadata_match(
  data_timestamps = dat$TIMESTAMP,
  start_dates = metadat$Date,
  start_times = metadat$Start_time,
  obs_lengths = metadat$Obs_length + 10)
#> 1 entry had no timestamp matches!
dat$Plot <- metadat$Plot[dat$metadat_row]

p %+% dat  # %+% replaces the data in the saved plot; the updated plot is then printed

That looks better!

Unit conversion

We’d like our final units to be in µmol/m2/s, and so need to do some unit conversion. (This can happen either before or after flux computation, below.) The package provides wtf_ppm_to_umol() and wtf_ppb_to_nmol() functions that perform this conversion using the Ideal Gas Law.

dat$CO2_umol <- wtf_ppm_to_umol(dat$CO2, 
                                volume = 0.1, # m3
                                temp = 24)    # degrees C
#> Assuming atm = 101325 Pa
#> Using R = 8.31446261815324 m3 Pa K-1 mol-1

# See the message: because we didn't provide the 'atm' parameter, 
# wtf_ppm_to_umol assumed standard pressure.

# Also normalize by ground area (0.16 m2 in this example)
dat$CO2_umol_m2 <- dat$CO2_umol / 0.16
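
Under the hood this is just the Ideal Gas Law, n = PV / (RT). A rough back-of-the-envelope version of the conversion, using the pressure and gas constant values printed above, looks like this (a sketch only; the package handles the details):

# Moles of air in the measurement system, n = PV / (RT)
p_pa   <- 101325                    # Pa, standard pressure (as assumed above)
vol_m3 <- 0.1                       # m3, system volume
temp_k <- 24 + 273.15               # K
r_gas  <- 8.31446261815324          # m3 Pa K-1 mol-1
n_air  <- p_pa * vol_m3 / (r_gas * temp_k)   # total mol of air, ~4.1 mol

# ~459 ppm CO2 then corresponds to roughly 459e-6 * n_air mol of CO2,
# i.e. on the order of 1.9e3 µmol for the 0.1 m3 system
459e-6 * n_air * 1e6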

Note that in the example above we’re using a constant system volume and measurement ground area. If that’s not the case, there should be a column in the metadata providing the changing values (e.g. giving volume in m3) for each measurement. Then after calling wtf_metadata_match(), merge the data and metadata and pass the appropriate column to wtf_ppm_to_umol(). Here’s an example:

# Let's say volume varies by measurement; this can happen if the chamber
# height changes depending on the ground vegetation in each plot
metadat$Volume <- c(0.1, 0.2, 0.1, 0.1, 0.3, 0.1, 0.1)

# Merge the data and metadata
dat_changing_vol <- merge(dat, metadat[c("Plot", "Volume")], by = "Plot", all.x = TRUE)

# Unit conversion as above, but using the changing volume information:
dat_changing_vol$CO2_umol <- wtf_ppm_to_umol(dat_changing_vol$CO2, 
                                             volume = dat_changing_vol$Volume,
                                             temp = 24)
#> Assuming atm = 101325 Pa
#> Using R = 8.31446261815324 m3 Pa K-1 mol-1
# We still have constant ground area in this example
dat_changing_vol$CO2_umol_m2 <- dat_changing_vol$CO2_umol / 0.16

# Relative to the previous constant-volume example, our area-normalized
# amounts (µmol/m2) have now increased for plots B and E because
# of their larger chamber volumes:
aggregate(CO2_umol_m2 ~ Plot, data = dat, FUN = mean)
#>   Plot CO2_umol_m2
#> 1    A    11855.81
#> 2    B    11908.15
#> 3    C    11787.88
#> 4    D    11776.09
#> 5    E    11974.30
#> 6    F    11976.25
aggregate(CO2_umol_m2 ~ Plot, data = dat_changing_vol, FUN = mean)
#>   Plot CO2_umol_m2
#> 1    A    11855.81
#> 2    B    23816.30
#> 3    C    11787.88
#> 4    D    11776.09
#> 5    E    35922.89
#> 6    F    11976.25

Compute fluxes

The wtf_compute_fluxes function provides a general-purpose tool for computing fluxes from concentration time series, as well as associated QA/QC information.

fluxes <- wtf_compute_fluxes(dat,
                             group_column = "Plot", 
                             time_column = "TIMESTAMP", 
                             gas_column = "CO2_umol_m2",
                             dead_band = 10)
#> NOTE: flux_HM1981 is non-NA, implying nonlinear data
#> NOTE: flux_HM1981 is non-NA, implying nonlinear data
#> NOTE: flux_HM1981 is non-NA, implying nonlinear data

# By default, wtf_compute_fluxes returns a data.frame with one row per
# grouping variable value (i.e., per measurement). The first column is the
# group label; the second is the average value of the `time_column`;
# and the rest of the columns are fit statistics for a linear fit of
# concentration as a function of time, along with information about polynomial
# and robust-linear fits. See ?wtf_compute_fluxes for more details.

# For clarity, print out only a subset of the columns 
fluxes[c("Plot", "TIMESTAMP", "adj.r.squared", "flux_estimate", "flux_HM1981")]
#>   Plot           TIMESTAMP adj.r.squared flux_estimate flux_HM1981
#> 1    A 2022-10-27 10:36:15     0.9474838      4.862042    4.794940
#> 2    B 2022-10-27 10:37:54     0.9457885      4.024164          NA
#> 3    C 2022-10-27 10:39:24     0.5860188      3.967315    4.483758
#> 4    D 2022-10-27 10:41:02     0.9549869      6.463719          NA
#> 5    E 2022-10-27 10:42:32     0.9821106      6.680752          NA
#> 6    F 2022-10-27 10:43:54     0.9494660      7.231148    6.261085

Note that the fluxes extract printed above has one row per Plot, the grouping variable; the mean TIMESTAMP of the group; model statistics such as adj.r.squared; and the flux (i.e., slope) estimate. The final column, flux_HM1981, gives the flux computed using a nonlinear model derived from diffusion theory, following Hutchinson and Mosier (1981). It is only numeric (i.e., not NA) when the data show evidence of saturating curvature, so in this case we might want to examine the data from plots A, C, and F more carefully.
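
A quick way to pull out those measurements for a closer look is to filter on that column:

# Measurements for which the nonlinear HM1981 flux was computed
subset(fluxes, !is.na(flux_HM1981),
       select = c(Plot, adj.r.squared, flux_estimate, flux_HM1981))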

Plotting our computed fluxes:

ggplot(fluxes, aes(Plot, flux_estimate, color = adj.r.squared)) +
  geom_point() +
  geom_linerange(aes(ymin = flux_estimate - flux_std.error,
                     ymax = flux_estimate + flux_std.error)) +
  ylab("CO2 flux (µmol/m2/s)")

We might want to check whether the robust-linear slope (flux) diverges from the linear fit slope, suggesting influential outliers, or whether the polynomial R2 is much larger, potentially indicating curvature of the observations due to e.g. diffusion limitations.

ggplot(fluxes, aes(flux_estimate, flux_estimate_robust, color = Plot)) +
  geom_point() + geom_abline() + theme(legend.position = "none")
ggplot(fluxes, aes(adj.r.squared, r.squared_poly, color = Plot)) +
  geom_point() + geom_abline() + theme(legend.position = "none")
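
The same comparisons can also be screened numerically, for example by computing the relative slope difference and the gain in R2 from the polynomial fit (using the same fit-statistic columns as the plots above):

# Relative difference between robust and ordinary linear slopes, and the
# improvement in R2 from the polynomial fit
data.frame(Plot = fluxes$Plot,
           slope_rel_diff = with(fluxes, (flux_estimate_robust - flux_estimate) / flux_estimate),
           r2_gain = with(fluxes, r.squared_poly - adj.r.squared))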

The plot C (green) data have more scatter, and thus lower R2 values and higher uncertainty on the computed flux, but there’s no strong evidence of nonlinearity or outlier problems (although see note above about the flux_HM1981 field).

Conclusion

This vignette covered whattheflux basics: loading data and metadata, matching the two, performing unit conversion, computing fluxes, and some basic QA/QC. The test data we worked with above could be fit well by a linear model, but for many reasons this might not always be true; see the vignette on integrating with the gasfluxes package for guidance on using more sophisticated model-fitting routines.