Example: Gray whale body length and body condition
We will use a small example dataset consisting of body length and
body width measurements of Pacific Coast Feeding Group (PCFG) gray
whales imaged along the coast of Newport, Oregon, USA using two
different drones, a DJI Inspire 2 (I2, n = 5 individuals) and a DJI
Phantom 4 Pro (P4P, n = 5 individuals). The P4P contains only a
barometer for estimating altitude, while the I2 contains both a
barometer and a LiDAR (or laser) altimeter (LidarBoX, Bierlich et al., 2024).
In this example, we will use data from the I2 (see Xcertianty_informative_priors
for an example using P4P data).
We will use the length and width measurements to calculate several
body condition metrics (body area index, body volume, surface area, and
standardized widths). We used open-source software MorphoMetriX v2
and CollatriX to
measure the whales and process the measurements, respectively.
Steps:
1. Prepare calibration data and observation (whale) data.
2. Build sampler.
3. Run the sampler.
4. Calculate body condition metrics.
5. View outputs.
We’ll first load the Xcertainty package, as well as other packages we
will use throughout this example.
library(Xcertainty)
library(tidyverse)
library(ggdist)
Calibration Objects
First we’ll load and prepare the calibration data, which is from Bierlich et al., 2024.
Note that “CO” here stands for “Calibration Object” used for training
data, and “CO.L” is the true length of the CO (1 m) and “Lpix” is the
photogrammetric measurement of the CO in pixels. Each UAS has a unique
CO.ID so that the training data and observation (whale) data can be
linked. We will filter to use CO data from the I2 drone.
# load calibration measurement data
data("co_data")
# sample size for both drones
table(co_data$uas)
##
## I2 P4P
## 49 69
# filter for I2 drone
co_data_I2 <- co_data %>% filter(uas == "I2")
Next, well format the data using
parse_observations()
.
calibration_data = parse_observations(
x = co_data_I2,
subject_col = 'CO.ID',
meas_col = 'Lpix',
tlen_col = 'CO.L',
image_col = 'image',
barometer_col = 'Baro_Alt',
laser_col = 'Laser_Alt',
flen_col = 'Focal_Length',
iwidth_col = 'Iw',
swidth_col = 'Sw',
uas_col = 'uas'
)
This creates a list of four elements:
* calibration_data$pixel_counts
.
* calibration_data$training_objects
.
* calibration_data$image_info
. *
calibration_data$prediction_objects
Gray whale measurements
Now we’ll load and prepare the gray whale measurement data. The
column ‘whale_ID’ denotes the unique individual. Note, some individuals
have multiple images – Xcertainty incorporates measurements across
images for an individual to produce a single posterior distribution for
the measurement of that individual. For example, multiple body length
measurements from different images of an individual will produce a
single posterior distribution of body length for that individual.
To estimate body condition for these gray whales, we will use body
widths between 20-70% of the body length. We’ll save the column names of
these widths as their own object.
For this example, we will only use whale measurements collected using
the I2 drone. See the vignette titled “Xcertainty_informative_prios” for
an example using P4P data. Note, that Xcertainty can also incorporate
measurements with missing LiDAR data (NAs).
# load gray whale measurement data
data("gw_data")
# quick look at the data
head(gw_data)
## # A tibble: 6 × 34
## whale_ID image year DOY uas Focal…¹ Focal…² Sw Iw Baro_…³ Launc…⁴ Baro_…⁵ Laser…⁶ CO.ID TL_px TL_w0…⁷ TL_w1…⁸ TL_w1…⁹
## <chr> <chr> <int> <int> <chr> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GW_01 image… 2022 233 I2 25 NA 17.3 3840 41.8 1.72 43.5 44.3 CO_I… 1536. 83.5 117. 154.
## 2 GW_02 image… 2019 249 P4P 8.8 9.2 13.2 3840 25.6 1.72 27.3 NA CO_P… 1185. 68.9 97.0 107.
## 3 GW_02 image… 2019 249 P4P 8.8 9.2 13.2 3840 25.6 1.72 27.3 NA CO_P… 1203. 68.9 91.9 107.
## 4 GW_03 image… 2022 191 I2 25 NA 17.3 3840 46.7 1.72 48.4 47.5 CO_I… 951. 57.4 87.5 107.
## 5 GW_03 image… 2022 191 I2 25 NA 17.3 3840 47 1.72 48.7 46.5 CO_I… 959. 59.8 86.9 109.
## 6 GW_04 image… 2022 209 I2 25 NA 17.3 3840 29.5 2.18 31.7 29.0 CO_I… 1925. 102. 138. 181.
## # … with 16 more variables: TL_w20.00_px <dbl>, TL_w25.00_px <dbl>, TL_w30.00_px <dbl>, TL_w35.00_px <dbl>, TL_w40.00_px <dbl>,
## # TL_w45.00_px <dbl>, TL_w50.00_px <dbl>, TL_w55.00_px <dbl>, TL_w60.00_px <dbl>, TL_w65.00_px <dbl>, TL_w70.00_px <dbl>,
## # TL_w75.00_px <dbl>, TL_w80.00_px <dbl>, TL_w85.00_px <dbl>, TL_w90.00_px <dbl>, TL_w95.00_px <dbl>, and abbreviated variable
## # names ¹Focal_Length, ²Focal_Length_adj, ³Baro_raw, ⁴Launch_Ht, ⁵Baro_Alt, ⁶Laser_Alt, ⁷TL_w05.00_px, ⁸TL_w10.00_px,
## # ⁹TL_w15.00_px
# number of images per whale ID
table(gw_data$whale_ID)
##
## GW_01 GW_02 GW_03 GW_04 GW_05 GW_06 GW_07 GW_08 GW_09 GW_10
## 1 2 2 1 2 1 1 1 1 3
# filter for I2 drone and select specific widths to include for estimating body condition (20-70%)
gw_measurements <- gw_data %>% filter(uas == "I2") %>%
select(!c("TL_w05.00_px", "TL_w10.00_px", "TL_w15.00_px",
"TL_w75.00_px", "TL_w80.00_px", "TL_w85.00_px", "TL_w90.00_px", "TL_w95.00_px"))
# identify the width columns in the dataset
width_names = grep(
pattern = 'TL_w\\_*',
x = colnames(gw_measurements),
value = TRUE
)
Next, we’ll use parse_observations()
to prepare the
whale data. Since Xcertainty
incorporates errors associated
with both a LiDAR altimeter and a barometer into the output measurement,
the input measurements must be in pixels. In our example dataset of gray
whales, measurements are already in pixels. If measurements in a
dataframe are in meters, they can easily be converted into pixels using
alt_conversion_col
to assign which altitude column should
be used to “back calculate” measurements in meters into pixels. For
example, use alt_conversion_col = 'Baro_Alt
if the
measurements used the barometer to convert measurements into meters.
Note that you can also set the specific timepoint to link
measurements of individuals using timepoint_col
. For
example, if you wanted all the total body length measurements of an
individual included to produce a single length measurement over the
course of the season, you may choose
timepoint_col = 'year'
, or you may want body condition at
the daily level, so you could enter timepoint_col = 'date'
.
In our example, measurements are already synced at the daily level, so
we will keep default as is.
Also, note that we assign the measurement column
(meas_col
) for TL and the widths between 20-70% that we
saved as “width_names”.
# parse field study
whale_data = parse_observations(
x = gw_measurements,
subject_col = 'whale_ID',
meas_col = c('TL_px', width_names),
image_col = 'image',
barometer_col = 'Baro_Alt',
laser_col = 'Laser_Alt',
flen_col = 'Focal_Length',
iwidth_col = 'Iw',
swidth_col = 'Sw',
uas_col = 'uas'
#alt_conversion_col = 'altitude'
)
This creates a list of four elements:
* calibration_data$pixel_counts
.
* calibration_data$training_objects
.
* calibration_data$image_info
. *
calibration_data$prediction_objects
Build sampler
Now we will build a sampler. Always start with using non-informative
priors, which should be appropriate for most datasets. In some cases,
using informative priors may be more appropriate, especially for
datasets that have high errors and when the model is overparameterized –
see the vignette “Xcertianty_informative_priors” for an example. We’ll
set the altitudes (image_altitude
) and object length
measurements (object_lengths
) to cover an overly wide range
for our target species.
sampler = independent_length_sampler(
data = combine_observations(calibration_data, whale_data),
priors = list(
image_altitude = c(min = 0.1, max = 130),
altimeter_bias = rbind(
data.frame(altimeter = 'Barometer', mean = 0, sd = 1e2),
data.frame(altimeter = 'Laser', mean = 0, sd = 1e2)
),
altimeter_variance = rbind(
data.frame(altimeter = 'Barometer', shape = .01, rate = .01),
data.frame(altimeter = 'Laser', shape = .01, rate = .01)
),
altimeter_scaling = rbind(
data.frame(altimeter = 'Barometer', mean = 1, sd = 1e1),
data.frame(altimeter = 'Laser', mean = 1, sd = 1e1)
),
pixel_variance = c(shape = .01, rate = .01),
object_lengths = c(min = .01, max = 20)
)
)
## Joining with `by = join_by(altimeter)`
## Joining with `by = join_by(altimeter)`
## Joining with `by = join_by(altimeter)`
## Joining with `by = join_by(UAS, altimeter)`
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions
## Compiling [Note] This may take a minute. [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## ===== Monitors =====
## thin = 1: altimeter_bias, altimeter_scaling, altimeter_variance, image_altitude, object_length, pixel_variance
## ===== Samplers =====
## RW sampler (117)
## - image_altitude[] (57 elements)
## - object_length[] (60 elements)
## conjugate sampler (7)
## - altimeter_bias[] (2 elements)
## - altimeter_scaling[] (2 elements)
## - altimeter_variance[] (2 elements)
## - pixel_variance
## Compiling
## [Note] This may take a minute.
## [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
View Sampler Outputs (TL and widths)
Our saved output
contains all the posterior samples and
summaries of all training data and length and width measurements from
the sampler. Note, that there are many objects stored
in output
, so it is best to view specific selections rather
than viewing all of the objects stored in output
at once,
as this can take a very long time to load and cause R to freeze.
We can view the posterior summaries (mean, sd, etc.) for each
altimeter. Note that the lower
and upper
represent the 95% highest posterior density intervals (HPDI) of the
posterior distribution.
output$summaries$altimeters
## UAS altimeter parameter mean sd lower upper ESS PSS
## 1 I2 Barometer bias -0.7203422 1.57906441 -3.7660942 2.423896 1906.974 50001
## 2 I2 Barometer variance 5.4004467 1.13193409 3.3639491 7.613621 12398.181 50001
## 3 I2 Barometer scaling 1.0032498 0.04093036 0.9236957 1.083940 1463.894 50001
## 4 I2 Laser bias -0.5243021 1.62965603 -3.5940199 2.778073 2475.701 50001
## 5 I2 Laser variance 6.0155576 1.30824124 3.7594362 8.681424 3291.101 50001
## 6 I2 Laser scaling 0.9983718 0.04207249 0.9144538 1.078665 1788.535 50001
And we can view and compare the posterior outputs for each image’s
altitude compared to the observed altitude from the barometer (blue) and
LiDAR (orange) in the training dataset.
output$summaries$images %>% left_join(co_data %>% rename(Image = image), by = "Image") %>%
ggplot() + theme_bw() +
geom_pointrange(aes(x = Baro_Alt, y = mean, ymin = lower, ymax = upper), color = "blue") +
geom_pointrange(aes(x = Laser_Alt, y = mean, ymin = lower, ymax = upper), color = "orange") +
geom_abline(slope = 1, intercept = 0, lty = 2) +
ylab("posterior altitude (m)") + xlab("observed altitude (m)")
## Warning: Removed 8 rows containing missing values (`geom_pointrange()`).
## Removed 8 rows containing missing values (`geom_pointrange()`).
plot of chunk Xc_Fig1_posterior_vs_obs_alt
We can also view the pixel variance from the training data
output$pixel_error$summary
## error parameter mean sd lower upper ESS PSS
## 1 pixel variance 18.74745 3.68237 12.24157 26.13657 19027.86 50001
We can view the posterior summaries (mean, sd, etc.) for all
measurements of each individual whale. As above, the lower
and upper
represent the 95% HPDI of the posterior
distribution for that specific measurement.
head(output$summaries$objects)
## Subject Measurement Timepoint parameter mean sd lower upper ESS PSS
## 1 GW_01 TL_px 1 length 12.524335 0.48432091 11.605177 13.486068 247.0445 50001
## 2 GW_01 TL_w20.00_px 1 length 1.557421 0.06986998 1.421111 1.693672 354.2822 50001
## 3 GW_01 TL_w25.00_px 1 length 1.915799 0.08184221 1.755663 2.074863 312.8519 50001
## 4 GW_01 TL_w30.00_px 1 length 2.059340 0.08710605 1.889557 2.229407 300.9867 50001
## 5 GW_01 TL_w35.00_px 1 length 2.124176 0.08910634 1.953447 2.298639 306.4567 50001
## 6 GW_01 TL_w40.00_px 1 length 2.123675 0.08898825 1.950219 2.296821 302.6361 50001
You can filter to view a specific measurement across all individuals,
such as total body length (TL).
output$summaries$objects %>% filter(Measurement == "TL_px")
## Subject Measurement Timepoint parameter mean sd lower upper ESS PSS
## 1 GW_01 TL_px 1 length 12.524335 0.4843209 11.605177 13.486068 247.04452 50001
## 2 GW_03 TL_px 1 length 8.388758 0.2217488 7.951812 8.820734 661.61972 50001
## 3 GW_04 TL_px 1 length 11.076464 0.5931233 9.846138 12.232099 90.86436 50001
## 4 GW_06 TL_px 1 length 10.083364 0.5566169 8.972447 11.189054 79.22054 50001
## 5 GW_10 TL_px 1 length 9.642295 0.2222675 9.224671 10.086814 580.03644 50001
Or filter directly for all measurements from a specific
individual
output$summaries$objects %>% filter(Subject == "GW_01")
## Subject Measurement Timepoint parameter mean sd lower upper ESS PSS
## 1 GW_01 TL_px 1 length 12.5243348 0.48432091 11.6051773 13.4860681 247.0445 50001
## 2 GW_01 TL_w20.00_px 1 length 1.5574212 0.06986998 1.4211114 1.6936719 354.2822 50001
## 3 GW_01 TL_w25.00_px 1 length 1.9157989 0.08184221 1.7556626 2.0748630 312.8519 50001
## 4 GW_01 TL_w30.00_px 1 length 2.0593397 0.08710605 1.8895567 2.2294066 300.9867 50001
## 5 GW_01 TL_w35.00_px 1 length 2.1241764 0.08910634 1.9534471 2.2986388 306.4567 50001
## 6 GW_01 TL_w40.00_px 1 length 2.1236754 0.08898825 1.9502195 2.2968207 302.6361 50001
## 7 GW_01 TL_w45.00_px 1 length 2.0252817 0.08576307 1.8535119 2.1883678 324.2996 50001
## 8 GW_01 TL_w50.00_px 1 length 1.8840753 0.08067743 1.7292383 2.0439870 316.5492 50001
## 9 GW_01 TL_w55.00_px 1 length 1.6956336 0.07438487 1.5523218 1.8433656 338.3559 50001
## 10 GW_01 TL_w60.00_px 1 length 1.3357929 0.06237047 1.2158972 1.4593491 393.8830 50001
## 11 GW_01 TL_w65.00_px 1 length 1.1042666 0.05542504 0.9977178 1.2149580 456.8998 50001
## 12 GW_01 TL_w70.00_px 1 length 0.9011541 0.04933852 0.8044088 0.9975533 596.1318 50001
Plot total body length with associated uncertainty for each
individual. Here the black dots represent the mean of the posterior
distribution for total body length and the black bars around each dot
represents the uncertainty, as 95% HPDI.
output$summaries$objects %>% filter(Measurement == "TL_px") %>%
ggplot() + theme_bw() +
geom_pointrange(aes(x = Subject, y = mean, ymin =lower, ymax = upper)) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) +
ylab("Total body length (m)")
plot of chunk Xc_Fig2_TL_per_subject
You can also view and plot the posterior samples for an individual’s
measurement. Note, be sure to exclude the first half of the posterior
samples, i.e., if 10000 samples, exclude the first 5000. To demonstrate
this below, we’ll first save the samples for an individual as an object,
then plot the distribution with the first half of the samples
excluded.
ID_samples <- output$objects$`GW_01 TL_px 1`$samples
data.frame(TL = ID_samples[(length(ID_samples)/2):length((ID_samples))]) %>%
ggplot() + stat_halfeye(aes(TL), .width = 0.95) + theme_bw()
plot of chunk Xc_Fig3_TL_dist
Body Condition
We’ll also calculate body condition from the posterior samples using
body_condition()
, which calculates several body condition
metrics using the posterior distributions of the body widths:
Note, for calculating body volume, the default for
height_ratios
is 1, implying that the vertical cross
section of the animal is circular rather than elliptical. To calculate
body volume using an elliptical model i.e., Christiansen et
al., 2019, enter the the height-width ratio for each width using
height_ratios
.
# First, enumerate the width locations along the animal's length
width_increments = as.numeric(
str_extract(
string = width_names,
pattern = '[0-9]+'
)
)
# Compute body condition
body_condition_output = body_condition(
data = whale_data,
output = output,
length_name = 'TL_px',
width_names = width_names,
width_increments = width_increments,
summary.burn = .5,
height_ratios = rep(1, length(width_names)) # assumes circular cross section
)
Note, there are a lot of objects stored in the
body_condition_output
, so it’s best to view selected
outputs rather than all objects at once, as it may take a long time to
load and can freeze R.
You can view the body condition summaries (surface_area
,
body_area_index
, body_volume
, and
standardized_widths
) across individuals using
body_condition_output$summaries
. Summaries include mean,
standard deviation (sd) and the lower and upper 95% HPDI.
View summary of BAI
head(body_condition_output$summaries$body_area_index)
## Subject Timepoint metric mean sd lower upper
## 1 GW_01 1 body_area_index 27.94158 0.1899036 27.56974 28.31794
## 2 GW_03 1 body_area_index 29.64340 0.2194318 29.20917 30.06942
## 3 GW_04 1 body_area_index 25.38457 0.1493190 25.08816 25.67488
## 4 GW_06 1 body_area_index 29.02524 0.1561085 28.71666 29.33080
## 5 GW_10 1 body_area_index 25.69888 0.1715854 25.36228 26.03709
Plot BAI results
body_condition_output$summaries$body_area_index %>%
ggplot() + theme_bw() +
geom_pointrange(aes(x = Subject, y = mean, ymin =lower, ymax = upper)) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) +
ylab("Body Area Index")
plot of chunk Xc_Fig4_bai
View summary of Body Volume
head(body_condition_output$summaries$body_volume)
## Subject Timepoint metric mean sd lower upper
## 1 GW_01 1 body_volume 7.198659 1.1720632 5.0196814 9.562925
## 2 GW_03 1 body_volume 1.133694 0.2024182 0.7568319 1.541614
## 3 GW_04 1 body_volume 2.977343 0.8003176 1.3535517 4.498887
## 4 GW_06 1 body_volume 2.991911 0.8302595 1.4202708 4.590484
## 5 GW_10 1 body_volume 1.389413 0.2091401 0.9990588 1.804546
Plot Body Volume results
body_condition_output$summaries$body_volume %>%
ggplot() + theme_bw() +
geom_pointrange(aes(x = Subject, y = mean, ymin =lower, ymax = upper)) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) +
ylab("Body Volume (m^3)")
plot of chunk Xc_Fig5_body_vol
View the standardized widths of an individual
body_condition_output$summaries$standardized_widths %>% filter(Subject == "GW_01")
## Subject Timepoint metric mean sd lower upper
## 1 GW_01 1 standardized_widths TL_w20.00_px 0.12435204 0.002844696 0.11869407 0.12988727
## 2 GW_01 1 standardized_widths TL_w25.00_px 0.15296737 0.002843284 0.14751147 0.15869552
## 3 GW_01 1 standardized_widths TL_w30.00_px 0.16442783 0.002858026 0.15871670 0.16996211
## 4 GW_01 1 standardized_widths TL_w35.00_px 0.16960595 0.002875781 0.16397768 0.17528082
## 5 GW_01 1 standardized_widths TL_w40.00_px 0.16956575 0.002845112 0.16389580 0.17501113
## 6 GW_01 1 standardized_widths TL_w45.00_px 0.16170918 0.002870934 0.15611762 0.16742107
## 7 GW_01 1 standardized_widths TL_w50.00_px 0.15043494 0.002861558 0.14486040 0.15607492
## 8 GW_01 1 standardized_widths TL_w55.00_px 0.13538822 0.002852321 0.12975228 0.14090144
## 9 GW_01 1 standardized_widths TL_w60.00_px 0.10665678 0.002825367 0.10123983 0.11232551
## 10 GW_01 1 standardized_widths TL_w65.00_px 0.08816989 0.002824945 0.08265540 0.09373741
## 11 GW_01 1 standardized_widths TL_w70.00_px 0.07195395 0.002827889 0.06639327 0.07756519
Plot standardized widths of an individual
body_condition_output$summaries$standardized_widths$metric <- gsub("standardized_widths TL_", "", body_condition_output$summaries$standardized_widths$metric)
body_condition_output$summaries$standardized_widths %>% filter(Subject == "GW_01") %>%
ggplot() + theme_bw() +
geom_pointrange(aes(x = metric, y = mean, ymin = lower, ymax = upper)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
xlab("width%") + ylab("standardized width") + ggtitle("GW_01")
plot of chunk Xc_Fig6_std_widths
Can also view standardized widths across all individuals
body_condition_output$summaries$standardized_widths %>%
ggplot() + theme_bw() +
geom_boxplot(aes(x = metric, y = mean)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
xlab("width%") + ylab("standardized width")
plot of chunk Xc_Fig7_std_widths_all
You can also view individual’s posterior samples for any of the body
condition metrics.
head(body_condition_output$body_area_index$`GW_01`$samples)
## [1] 161.2479 157.5936 138.5102 182.5054 218.5371 169.3383
And from these posterior samples for an individual, we can plot the
distribution. Here we’ll plot BAI as an example, and include the 95%
HPDI. Remember to exclude the first half of the samples, as mentioned
above.
ID_samples <- body_condition_output$body_area_index$`GW_01 1`$samples
data.frame(BAI = ID_samples[(length(ID_samples)/2):length((ID_samples))]) %>%
ggplot() + stat_halfeye(aes(BAI), .width = 0.95) + theme_bw() + ggtitle("GW_01")
plot of chunk Xc_Fig8_bai_dist
We hope this vignette has been helpful for getting started with
organizing your input data and how to view and interpret results. Check
out our other vignettes to view examples of other samplers, including
using “informative priors” and “growth curves”.