In this vignette we’ll show how to apply requirements based on the data contained in the cohort table itself. For this we’ll use the Eunomia synthetic data.
library(CodelistGenerator)
library(CohortConstructor)
library(CohortCharacteristics)
library(ggplot2)
library(dplyr)
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomia_dir())
cdm <- CDMConnector::cdm_from_con(con, cdm_schema = "main",
                                  write_schema = c(prefix = "my_study_", schema = "main"))
Let’s start by creating a cohort of acetaminophen users. Individuals will have a cohort entry for each of their acetaminophen drug exposure records, with cohort exit based on the drug record end date. Note that when the cohort is created, any overlapping records will be concatenated.
acetaminophen_codes <- getDrugIngredientCodes(cdm,
                                              name = "acetaminophen",
                                              nameStyle = "{concept_name}")
cdm$acetaminophen <- conceptCohort(cdm = cdm,
                                   conceptSet = acetaminophen_codes,
                                   exit = "event_end_date",
                                   name = "acetaminophen")
At this point we have just created our base cohort without having applied any restrictions.
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
We can see that in our starting cohort individuals have multiple entries, one for each use of acetaminophen. However, we could keep only their earliest cohort entry by using requireIsFirstEntry() from CohortConstructor.
cdm$acetaminophen <- cdm$acetaminophen |>
  requireIsFirstEntry()
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
While the number of individuals remains unchanged, records after an individual’s first have been excluded.
If we wanted to keep the latest record per person instead of the earliest, we would use requireIsLastEntry() instead. Or, if we want to keep some range of records per person, we can use the requireIsEntry() function.
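To make the idea of entry positions concrete, here is a minimal base R sketch on a toy data frame. The data and the helper columns `entry` and `n_entries` are made up for illustration. This is not how CohortConstructor implements these functions, which work on cohort tables in the database; it only mirrors the selection logic described above.

```r
# Toy cohort records (hypothetical data): person 1 has three entries, person 2 has two
records <- data.frame(
  subject_id = c(1, 1, 1, 2, 2),
  cohort_start_date = as.Date(c("2010-01-05", "2011-06-01", "2013-09-12",
                                "2012-02-20", "2014-11-03"))
)

# Order by person and date, then number each person's entries 1, 2, 3, ...
records <- records[order(records$subject_id, records$cohort_start_date), ]
records$entry <- ave(seq_along(records$subject_id), records$subject_id,
                     FUN = seq_along)
# Total number of entries each person has
records$n_entries <- ave(records$entry, records$subject_id, FUN = length)

# Earliest entry per person (the idea behind requireIsFirstEntry())
first_entries <- records[records$entry == 1, ]

# Latest entry per person (the idea behind requireIsLastEntry())
last_entries <- records[records$entry == records$n_entries, ]

# First and second entries per person (the idea behind requireIsEntry())
entries_1_to_2 <- records[records$entry >= 1 & records$entry <= 2, ]
```

In the package itself the same choices are made on the cohort table; see the documentation of requireIsEntry() for its exact arguments.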
Individuals may contribute multiple records over extended periods. We can filter out records that fall outside a specified date range using the requireInDateRange() function.
Multiple restrictions can be applied to a cohort. However, it is important to note that the order in which requirements are applied will often matter.
cdm$acetaminophen_1 <- conceptCohort(cdm = cdm,
                                     conceptSet = acetaminophen_codes,
                                     name = "acetaminophen_1") |>
  requireIsFirstEntry() |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01")))
cdm$acetaminophen_2 <- conceptCohort(cdm = cdm,
                                     conceptSet = acetaminophen_codes,
                                     name = "acetaminophen_2") |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01"))) |>
  requireIsFirstEntry()
summary_attrition_1 <- summariseCohortAttrition(cdm$acetaminophen_1)
summary_attrition_2 <- summariseCohortAttrition(cdm$acetaminophen_2)
Here we see the attrition when we apply our entry requirement before our date requirement. In this case we have a cohort of people whose first-ever record of acetaminophen occurs within our study period.
plotCohortAttrition(summary_attrition_1)
And here we see the attrition when we apply our date requirement before our entry requirement. In this case we have a cohort of people with their first record of acetaminophen within the study period, although this will not necessarily be their first record ever.
plotCohortAttrition(summary_attrition_2)
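The effect of ordering can also be seen on a small toy example in base R. The data frame and the two helper functions `keep_first_entry()` and `keep_in_date_range()` are hypothetical stand-ins written for this sketch, not CohortConstructor functions. They assume the date requirement is checked against the cohort start date only, with both ends of the range included.

```r
# Toy cohort records (hypothetical data)
records <- data.frame(
  subject_id = c(1, 1, 2, 3),
  cohort_start_date = as.Date(c("2008-05-01", "2012-03-10",
                                "2011-07-20", "2005-01-15"))
)
study_start <- as.Date("2010-01-01")
study_end <- as.Date("2016-01-01")

# Stand-in for an entry requirement: keep each person's earliest record
keep_first_entry <- function(x) {
  x <- x[order(x$subject_id, x$cohort_start_date), ]
  x[!duplicated(x$subject_id), ]
}

# Stand-in for a date requirement: keep records starting in the study period
keep_in_date_range <- function(x) {
  x[x$cohort_start_date >= study_start & x$cohort_start_date <= study_end, ]
}

# Entry requirement first: only people whose first-ever record is in the period
first_then_range <- keep_in_date_range(keep_first_entry(records))

# Date requirement first: each person's first record within the period
range_then_first <- keep_first_entry(keep_in_date_range(records))

first_then_range$subject_id # person 2 only
range_then_first$subject_id # persons 1 and 2
```

Person 1 is dropped under the first ordering because their first-ever record predates the study period, but is kept under the second ordering because they have a later record inside it.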
Another useful functionality, particularly when working with multiple cohorts or performing a network study, is provided by requireMinCohortCount(). Here we will only keep cohorts with a minimum count, filtering out records from cohorts with fewer than this number.
As an example let’s create a cohort for every drug ingredient we see in Eunomia. We can first get the drug ingredient codes.
medication_codes <- getDrugIngredientCodes(cdm = cdm, nameStyle = "{concept_name}")
medication_codes
#>
#> - acetaminophen (7 codes)
#> - albuterol (2 codes)
#> - alendronate (2 codes)
#> - alfentanil (1 codes)
#> - alteplase (2 codes)
#> - amiodarone (2 codes)
#> along with 85 more codelists
We can see that when we make all these cohorts many have only a small number of individuals.
cdm$medications <- conceptCohort(cdm = cdm,
                                 conceptSet = medication_codes,
                                 name = "medications")
cohortCount(cdm$medications) |>
  filter(number_subjects > 0) |>
  ggplot() +
  geom_histogram(aes(number_subjects),
                 colour = "black",
                 binwidth = 25) +
  xlab("Number of subjects") +
  theme_bw()
If we apply a minimum cohort count of 500, we end up with far fewer cohorts that all have a sufficient number of study participants.
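As a rough illustration of the filtering rule, the base R sketch below applies a minimum count to a toy table of cohort records. The data and the threshold of 2 are made up, and the sketch assumes the count refers to distinct subjects per cohort; it is not the package's implementation.

```r
# Toy cohort table (hypothetical data): three cohorts of differing size
records <- data.frame(
  cohort_definition_id = c(1, 1, 1, 1, 2, 2, 3),
  subject_id = c(1, 2, 3, 3, 1, 1, 4)
)
min_count <- 2 # hypothetical threshold, kept small for the toy data

# Number of distinct subjects in each cohort
subjects_per_cohort <- tapply(
  records$subject_id,
  records$cohort_definition_id,
  function(ids) length(unique(ids))
)

# Cohorts that reach the minimum count
cohorts_to_keep <- as.numeric(
  names(subjects_per_cohort)[subjects_per_cohort >= min_count]
)

# Drop all records belonging to cohorts below the threshold
filtered <- records[records$cohort_definition_id %in% cohorts_to_keep, ]
```

Here only cohort 1, with three distinct subjects, keeps its records; cohorts 2 and 3 each have a single subject and are emptied.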