```
library(SimplyAgree)
library(cccrm)
library(tidyverse)
library(ggpubr)
data("temps")
df_temps = temps
```

In their study, Ravanelli and Jay
(2020) attempted to estimate the effect of varying the time
of day (AM or PM) on the measurement of thermoregulatory variables
(e.g., rectal and esophageal temperature). In total, participants
completed 6 separate trials wherein these variables were measured. While
this is a robust study of these variables, the analyses focused on ANOVAs
and t-tests to determine whether or not the time-of-day (AM or PM) had an
effect. This poses a problem because 1) they were trying to test for
equivalence and 2) this is a study of *agreement*, not *differences* (see
Lin (1989)). Due to the latter point, the
use of t-tests or ANOVAs (F-tests) is rather inappropriate since they
answer a different, albeit related, question.

Instead, the authors could test their hypotheses by using tools that
estimate the absolute *agreement* between the AM and PM sessions
within each condition. This is rather complicated because we have
multiple measurements within each participant. However, between the tools
included in `SimplyAgree` and `cccrm` (Carrasco and Martinez 2020), I
believe we can get closer to the right answer.

In order to understand the underlying processes of these functions
and procedures, I highly recommend reading the statistical literature
that documents the methods within them. For the
`cccrm` package, please see the work by Carrasco and Jover (2003),
Carrasco, King, and Chinchilli (2009), and Carrasco et al. (2013). The
`loa_mixed` function was inspired by the work of Parker et al. (2016),
which documented how to implement multi-level models and bootstrapping
to estimate the limits of agreement.

An easy approach to measuring agreement between 2 conditions or measurement tools is the concordance correlation coefficient (CCC). The CCC provides a single coefficient (with values between -1 and 1) that estimates how closely one measurement agrees with another. In its simplest form it is a type of intraclass correlation coefficient that takes into account the mean difference between the two measurements. In other words, if we were to plot two measurements (X & Y) against each other and draw a line of identity on the graph, the closer those points are to the line of identity, the higher the CCC (and vice versa).
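To make the "takes into account the mean difference" point concrete, Lin's (1989) CCC for two measurements $X$ and $Y$ can be written as

$$
\rho_c = \frac{2\rho\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}
$$

where $\rho$ is the Pearson correlation between the measurements. The $(\mu_x - \mu_y)^2$ term in the denominator is the penalty for systematic bias: even perfectly correlated measurements get a CCC below 1 if their means differ.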

`qplot(1,1) + geom_abline(intercept = 0, slope = 1)`

In the following sections, let us see how well esophageal and rectal temperature are in agreement after exercising in the heat for 1 hour at differing conditions.

Now, based on the typical thresholds (0.8 can be considered a “good” CCC), neither Trec as a raw value nor as a change score (Trec delta) is within an acceptable degree of agreement. As I will address later, this may not be accurate, and there are cases where the CCC is low but the expected differences between conditions are acceptable (limits of agreement).

```
ccc_rec.post = cccrm::cccUst(dataset = df_temps,
                             ry = "trec_post",
                             rtime = "trial_condition",
                             rmet = "tod")
ccc_rec.post
#> CCC estimated by U-statistics:
#>         CCC   LL CI 95%   UL CI 95%      SE CCC
#> 0.218403578 0.007835121 0.410425602 0.104047391
```

```
ccc_rec.delta = cccrm::cccUst(dataset = df_temps,
                              ry = "trec_delta",
                              rtime = "trial_condition",
                              rmet = "tod")
ccc_rec.delta
#> CCC estimated by U-statistics:
#>        CCC  LL CI 95%  UL CI 95%     SE CCC
#> 0.66232800 0.49409601 0.78275101 0.07316927
```

Finally, we can visualize the concordance between the two types of measurements at the respective times of day and conditions. From the plot we can see there is clear bias in the raw post-exercise values (higher in the PM), but even when “correcting for baseline differences” by calculating difference scores we still see a high degree of disagreement between the two sessions.
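A sketch of one way to build such a concordance plot, assuming the `temps` data contain an `id` column alongside the `tod`, `trial_condition`, and `trec_post` columns used in the `cccUst()` calls above, and that `tod` holds the values "AM" and "PM":

```r
# Concordance plot: AM vs. PM post-exercise rectal temperature, with a line
# of identity. Points near the line indicate high agreement between sessions.
df_temps %>%
  select(id, trial_condition, tod, trec_post) %>%
  pivot_wider(names_from = tod, values_from = trec_post) %>%
  ggplot(aes(x = AM, y = PM, color = trial_condition)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  labs(x = "AM Trec (post)", y = "PM Trec (post)")
```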

We can replicate the same analyses for esophageal temperature. From the data and plots below we can see that the post exercise CCC is much improved compared to rectal temperature. However, there is no further improvement when looking at the delta (difference scores) for pre-to-post exercise.

```
ccc_eso.post = cccrm::cccUst(dataset = df_temps,
                             ry = "teso_post",
                             rtime = "trial_condition",
                             rmet = "tod")
ccc_eso.post
#> CCC estimated by U-statistics:
#>        CCC  LL CI 95%  UL CI 95%     SE CCC
#> 0.67333327 0.48765924 0.80073160 0.07915895
```

```
ccc_eso.delta = cccrm::cccUst(dataset = df_temps,
                              ry = "teso_delta",
                              rtime = "trial_condition",
                              rmet = "tod")
ccc_eso.delta
#> CCC estimated by U-statistics:
#>       CCC LL CI 95% UL CI 95%    SE CCC
#> 0.5654583 0.2663607 0.7652237 0.1276819
```


In addition to the CCC, we can use the `loa_mixed` function
to calculate the “limits of agreement”. Typically the 95%
limits of agreement are calculated, which provide the range of differences
between two measuring systems expected for 95% of future measurement
pairs. In order to do that we will need the data in a “wide” format where
each measurement (in this case AM and PM) is its own column, and then we
can calculate a column that is the difference score. Once we have the data
in this “wide” format, we can then use the `loa_mixed` function to
calculate the average difference (mean bias) and the variance (which
determines the limits of agreement).
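A sketch of this wide-format data prep, assuming the column names used in the calls above (`id`, `trial_condition`, `tod`, `trec_post`), that `tod` holds "AM"/"PM", and that the difference is taken as PM minus AM:

```r
# Reshape to one column per time of day, then compute a difference score.
# This produces the df_rec.post data frame used by loa_mixed() below.
df_rec.post = df_temps %>%
  select(id, trial_condition, tod, trec_post) %>%
  pivot_wider(names_from = tod, values_from = trec_post) %>%
  mutate(diff = PM - AM)
```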

So we will calculate the limits of agreement using the
`loa_mixed` function. We need to identify the columns
with the right information using the `diff`, `condition`, and `id`
arguments. We then select the right data set using the `data` argument.
Lastly, we specify the details of how the limits are calculated. For this
specific analysis I decided to calculate 95% limits of agreement with 95%
confidence intervals, using bias-corrected and accelerated (BCa)
bootstrap confidence intervals.

```
rec.post_loa = SimplyAgree::loa_mixed(diff = "diff",
                                      condition = "trial_condition",
                                      id = "id",
                                      data = df_rec.post,
                                      conf.level = .95,
                                      agree.level = .95,
                                      replicates = 199,
                                      type = "bca")
```

When we create a table of the results we can see that CCC and limits of agreement (LoA), at least for Trec post exercise, are providing the same conclusion (poor agreement).

```
knitr::kable(rec.post_loa$loa,
             caption = "LoA: Trec Post Exercise")
```

|           |   estimate |   lower.ci |   upper.ci |
|-----------|------------|------------|------------|
| Mean Bias |  0.2300000 |  0.1727896 |  0.3391757 |
| Lower LoA | -0.1958766 | -0.2948109 | -0.1089945 |
| Upper LoA |  0.6558766 |  0.5243691 |  0.8450319 |

Furthermore, we can visualize the results with a typical Bland-Altman plot of the LoA.

`plot(rec.post_loa)`

Now, when we look at the delta values for Trec, the LoA show much closer (maybe even acceptable) agreement. However, we cannot conclude that the average difference would be less than 0.25, which may not be acceptable for some researchers.

```
knitr::kable(rec.delta_loa$loa,
             caption = "LoA: Delta Trec")
```

|           |   estimate |   lower.ci |   upper.ci |
|-----------|------------|------------|------------|
| Mean Bias | -0.0210000 | -0.0571838 |  0.0243662 |
| Lower LoA | -0.2552956 | -0.3613519 | -0.1805686 |
| Upper LoA |  0.2132956 |  0.1670232 |  0.2932911 |

`plot(rec.delta_loa)`

We can repeat the process for esophageal temperature. Overall, the results are fairly similar, and while there is better agreement on the delta (change scores), it is still fairly difficult to determine that there is “good” agreement between the AM and PM measurements.

```
eso.post_loa = SimplyAgree::loa_mixed(diff = "diff",
                                      condition = "trial_condition",
                                      id = "id",
                                      data = df_eso.post,
                                      conf.level = .95,
                                      agree.level = .95,
                                      replicates = 199,
                                      type = "bca")
```

```
knitr::kable(eso.post_loa$loa,
             caption = "LoA: Teso Post Exercise")
```

|           |   estimate |   lower.ci |   upper.ci |
|-----------|------------|------------|------------|
| Mean Bias |  0.1793333 |  0.1341438 |  0.2243988 |
| Lower LoA | -0.0795336 | -0.1441305 |  0.0308588 |
| Upper LoA |  0.4382003 |  0.3465289 |  0.5292906 |

`plot(eso.post_loa)`

```
knitr::kable(eso.delta_loa$loa,
             caption = "LoA: Delta Teso")
```

|           |   estimate |   lower.ci |   upper.ci |
|-----------|------------|------------|------------|
| Mean Bias |  0.0026667 | -0.0324062 |  0.0370979 |
| Lower LoA | -0.2165808 | -0.3011728 | -0.1585443 |
| Upper LoA |  0.2219141 |  0.1515147 |  0.3399762 |

`plot(eso.delta_loa)`

Carrasco, Josep L., and Lluís Jover. 2003. “Estimating the
Generalized Concordance Correlation Coefficient Through Variance
Components.” *Biometrics* 59 (4): 849–58. https://doi.org/10.1111/j.0006-341x.2003.00099.x.

Carrasco, Josep L., Tonya S. King, and Vernon M. Chinchilli. 2009.
“The Concordance Correlation Coefficient for Repeated Measures
Estimated by Variance Components.” *Journal of
Biopharmaceutical Statistics* 19 (1): 90–105. https://doi.org/10.1080/10543400802527890.

Carrasco, Josep Lluis, and Josep Puig Martinez. 2020. *Cccrm:
Concordance Correlation Coefficient for Repeated (and Non-Repeated)
Measures*. https://CRAN.R-project.org/package=cccrm.

Carrasco, Josep L., Brenda R. Phillips, Josep Puig-Martinez, Tonya S.
King, and Vernon M. Chinchilli. 2013. “Estimation of the
Concordance Correlation Coefficient for Repeated Measures Using
SAS and R.” *Computer Methods and Programs in
Biomedicine* 109 (3): 293–304. https://doi.org/10.1016/j.cmpb.2012.09.002.

Lin, Lawrence I-Kuei. 1989. “A Concordance Correlation Coefficient
to Evaluate Reproducibility.” *Biometrics* 45 (1): 255. https://doi.org/10.2307/2532051.

Parker, Richard A., Christopher J. Weir, Noah Rubio, Roberto Rabinovich,
Hilary Pinnock, Janet Hanley, Lucy McCloughan, et al. 2016.
“Application of Mixed Effects Limits of Agreement in the Presence
of Multiple Sources of Variability: Exemplar from the Comparison of
Several Devices to Measure Respiratory Rate in COPD
Patients.” Edited by Hong-Long (James) Ji. *PLOS
ONE* 11 (12): e0168321. https://doi.org/10.1371/journal.pone.0168321.

Ravanelli, Nicholas, and Ollie Jay. 2020. “The Change in Core
Temperature and Sweating Response During Exercise Are Unaffected by Time
of Day Within the Wake Period.” *Medicine and Science in
Sports and Exercise*. https://doi.org/10.1249/mss.0000000000002575.