Getting Started with nemsqar

Introduction

The nemsqar package provides an automated and reproducible framework for calculating EMS quality measures defined by the National EMS Quality Alliance (NEMSQA). These measures are widely used by EMS agencies, trauma systems, quality improvement teams, and researchers to evaluate performance and support evidence‑based improvement activities.

This vignette is written for users who are knowledgeable in EMS, injury epidemiology, or quality improvement, but who may be new to R or new to calculating NEMSQA measures using R. The focus is to guide you through each step of the workflow: loading data, preparing it for use with nemsqar, and running a selected NEMSQA measure.

By the end of this vignette, you will understand how to load EMS data in R, prepare it for use with nemsqar, and run a selected NEMSQA measure.

The sections that follow walk through the entire process, from loading EMS data in R to producing a standardized performance measure aligned with national reporting expectations.

The Measures

This vignette focuses on one NEMSQA measure implemented in nemsqar: Asthma‑01, calculated with the asthma_01() function.

All measures in nemsqar follow the same basic structure. Each one requires a defined set of NEMSIS‑aligned tables and returns results in a standardized format. The following sections introduce the data required to calculate this measure and demonstrate how to run it in R.

The Data

Before calculating any measure, it is important to understand the example datasets included with nemsqar. These datasets are small, synthetic representations of NEMSIS tables. They provide a safe environment for learning the workflow before applying it to production EMS data. Because NEMSQA measure logic relies on the NEMSIS data structure, each measure requires several tables (for example, patient, response, situation, medications, vitals).

Loading Example Data

The nemsqar package includes synthetic datasets that mirror the NEMSIS tables required for NEMSQA measure calculation. You can load them with the data() function. For Asthma‑01, the following tables are required:

data("nemsqar_patient_scene_table")
data("nemsqar_response_table")
data("nemsqar_situation_table")
data("nemsqar_medications_table")

Each dataset loads into your R environment as a standard data frame.

Loading your own data

In practice, you will typically load your own EMS datasets. These may come from CSV files, databases, or data extracts. Below are common patterns for loading local files:

# Store your file path, preferably in .Renviron files
path <- file.path("some_path_to_file")

# Load data from the path using the tidyverse package `readr`
data <- readr::read_csv(path)

# Load data from the path using base R
data <- read.csv(file = path)
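One way to follow the .Renviron suggestion is to read the path from an environment variable and import every column as text, so that nothing is converted silently before you inspect it. The sketch below is illustrative only: the variable name EMS_DATA_PATH and the small CSV it writes are hypothetical stand-ins for your own extract.

```r
# Create a small stand-in CSV so the example runs anywhere (hypothetical data)
demo_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "pcr_number,incident_date,patient_age",
    "A-001,2023-01-10,34",
    "A-002,2023-01-12,07"
  ),
  demo_file
)

# In practice this would live in .Renviron as EMS_DATA_PATH=/path/to/file.csv
# (EMS_DATA_PATH is a hypothetical name chosen for this example)
Sys.setenv(EMS_DATA_PATH = demo_file)

# Read the path back and stop early if it is missing or wrong
path <- Sys.getenv("EMS_DATA_PATH", unset = NA)
stopifnot(!is.na(path), file.exists(path))

# Import every column as character; convert types deliberately afterwards
ems_raw <- read.csv(path, colClasses = "character")
str(ems_raw)
```

Importing as character keeps values such as "07" exactly as they were exported, which makes the type checks described in the following sections easier to reason about.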

Inspecting Table Structure

Because many users are new to R, it is important to verify that each dataset loaded correctly and contains the variables required by the measure functions. A few simple commands help confirm this.

Review the structure of a loaded dataset

# Quick overview of column names and data types
dplyr::glimpse(nemsqar_patient_scene_table)
#> Rows: 10,000
#> Columns: 6
#> $ `Incident Patient Care Report Number - PCR (eRecord.01)` <chr> "NyXFBlJfnm-8…
#> $ `Incident Date`                                          <date> 2023-12-20, …
#> $ `Patient Age (ePatient.15)`                              <dbl> 98, 75, 24, 1…
#> $ `Patient Age Units (ePatient.16)`                        <chr> "Minutes", "D…
#> $ `Patient Date Of Birth (ePatient.17)`                    <date> 2023-12-19, …
#> $ `Patient Gender (ePatient.13)`                           <chr> "Male to Fema…

View the first few rows

# An abbreviated look at the actual data tables
head(nemsqar_patient_scene_table, n = 10)
#> # A tibble: 10 × 6
#>    Incident Patient Care Report Number …¹ `Incident Date` Patient Age (ePatien…²
#>    <chr>                                  <date>                           <dbl>
#>  1 NyXFBlJfnm-8333586176                  2023-12-20                          98
#>  2 XTLCINMLTP-8616021114                  2023-08-30                          75
#>  3 HfYjlIEQSk-9529756610                  2023-03-21                          24
#>  4 MOwVDhriyC-5915613206                  2023-09-13                         115
#>  5 ZCGOtLEPKw-7820135532                  2023-02-21                          54
#>  6 fEMvUCQCRQ-9052388486                  2023-08-23                          88
#>  7 VTLPiFWWGd-6806896482                  2023-12-09                          83
#>  8 YvZbHRTUuK-8780915452                  2023-04-06                          24
#>  9 DkKIjJSFtA-7499641828                  2023-05-07                          95
#> 10 CIQMuVGJgS-9144926148                  2023-11-13                          82
#> # ℹ abbreviated names:
#> #   ¹​`Incident Patient Care Report Number - PCR (eRecord.01)`,
#> #   ²​`Patient Age (ePatient.15)`
#> # ℹ 3 more variables: `Patient Age Units (ePatient.16)` <chr>,
#> #   `Patient Date Of Birth (ePatient.17)` <date>,
#> #   `Patient Gender (ePatient.13)` <chr>

These functions allow you to check column names, data types, and basic record structure. This step is essential because NEMSQA logic depends on specific fields. Incorrect data types (for example, character instead of numeric) will cause measure functions to fail.
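When several tables need the same check, it can help to compare each one against a short list of expected columns and classes. The helper below is a hypothetical sketch, not part of nemsqar, and the toy table and expected classes are assumptions made for illustration.

```r
# Compare a table against the columns and classes you expect it to have.
# `expected` is a named character vector: names are columns, values are classes.
# check_columns() is a hypothetical helper, not a nemsqar function.
check_columns <- function(data, expected) {
  present <- names(expected) %in% names(data)
  actual <- vapply(
    names(expected),
    function(nm) if (nm %in% names(data)) class(data[[nm]])[1] else NA_character_,
    character(1)
  )
  data.frame(
    column = names(expected),
    present = present,
    expected_class = unname(expected),
    actual_class = unname(actual),
    ok = present & !is.na(actual) & unname(actual) == unname(expected),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy table (hypothetical): age was imported as text, date of birth is absent
toy <- data.frame(
  pcr_number = c("A-001", "A-002"),
  incident_date = as.Date(c("2023-01-10", "2023-01-12")),
  patient_age = c("34", "7"),
  stringsAsFactors = FALSE
)

report <- check_columns(
  toy,
  expected = c(
    pcr_number = "character",
    incident_date = "Date",
    patient_age = "numeric",
    patient_dob = "Date"
  )
)
report
```

Rows where ok is FALSE point to columns that are missing or need conversion before a measure is run.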

Data types

The example datasets included in nemsqar already use appropriate data types. When working with real EMS data, you must verify these types manually. This ensures that the data satisfy validation requirements and prevents errors during measure calculation.

In practice, EMS data commonly include issues such as:

  • Dates stored as character strings
  • Numeric values stored as text (for example, "5" instead of 5)
  • Empty strings used in place of NA
  • Mixed formats within the same column

Below are examples of how to identify and correct these issues before running any nemsqar measure.

Example: Converting character dates to proper date formats

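# Example: incident dates stored as character values in mixed formats
example_data <- data.frame(
  Incident_Date = c("2023-01-10", "01/12/2023", "20230114"),
  stringsAsFactors = FALSE
)

# Parse with lubridate (recommended), trying each format listed in `orders`.
# parse_date_time() returns a date-time (POSIXct), so wrap it in as.Date()
# to obtain a Date column
example_data$Incident_Date <- as.Date(lubridate::parse_date_time(
  example_data$Incident_Date,
  orders = c("ymd", "mdy", "Ymd")
))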

Example: Converting numbers stored as character strings

numeric_example <- data.frame(
  # note that 45 here has whitespace surrounding the value
  Patient_Age_raw = c("34", "18", "07", " 45 ")
)

# Trim whitespace and convert to numeric
numeric_example$Patient_Age <- as.numeric(trimws(
  numeric_example$Patient_Age_raw
))
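Conversion turns anything that is not a number into NA, so it is worth checking which values were lost. The following is a minimal sketch using hypothetical raw values; it separates values that were present but not numeric from values that were already blank or missing.

```r
# Hypothetical raw ages, including values that cannot be converted
age_raw <- c("34", " 45 ", "unknown", "", NA, "7.5")

# Convert quietly, then compare against the input to find real failures
age_num <- suppressWarnings(as.numeric(trimws(age_raw)))
failed <- !is.na(age_raw) & trimws(age_raw) != "" & is.na(age_num)

# Values that were present but not numeric
age_raw[failed]

# Count of values that are NA after conversion, for any reason
sum(is.na(age_num))
```

Reviewing the failed values before discarding them often reveals a coding convention (for example, a text placeholder for unknown age) that should be handled explicitly.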

Example: Replacing empty strings with NA

missing_example <- data.frame(
  eSituation_11 = c("", "R41.82", "", "T14.90")
)

# Replace empty strings with NA
missing_example$eSituation_11 <- dplyr::na_if(missing_example$eSituation_11, "")

missing_example
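The same replacement can be applied to every character column at once. The sketch below uses base R only; blank_to_na() is a hypothetical helper and the toy values are made up. Unlike the dplyr::na_if() call above, it also treats whitespace-only strings as missing, which is an assumption you may or may not want for your data.

```r
# Replace empty or whitespace-only strings with NA in all character columns.
# blank_to_na() is a hypothetical helper, not a nemsqar function.
blank_to_na <- function(data) {
  is_chr <- vapply(data, is.character, logical(1))
  data[is_chr] <- lapply(data[is_chr], function(x) {
    x[!is.na(x) & trimws(x) == ""] <- NA_character_
    x
  })
  data
}

# Toy table with blanks in two character columns (hypothetical values)
toy <- data.frame(
  primary_impression = c("", "R41.82", "  ", "T14.90"),
  secondary_impression = c("J45.901", "", NA, ""),
  patient_age = c(34, 18, 7, 45),
  stringsAsFactors = FALSE
)

cleaned <- blank_to_na(toy)

# Number of missing values per column after cleaning
colSums(is.na(cleaned))
```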

Why these steps matter

Date fields are used for patient age computation and for time‑based denominators. If these values are stored as character strings or in inconsistent formats, the measure logic will not execute correctly.

Numeric fields such as age, blood pressure, respiratory rate, or dosage must be numeric to satisfy validation checks and ensure appropriate comparisons.

Empty strings can cause false exclusions during population filtering, especially when nemsqar logic expects missing values to be formally represented as NA.

Ensuring correct data types prior to running any nemsqar function improves reproducibility, reduces debugging time, and allows the measure logic to operate as intended.

Dealing with problematic column names

If your datasets already have clean names, you may skip this step.

EMS registry data often contain column names with spaces, punctuation, or special characters. These can make programming in R more difficult. To avoid these issues, it is helpful to standardize column names before running any measures.

Below is a simple reusable function to clean column names by replacing spaces and special characters with underscores.

# Define a reusable column-cleaning function
clean_cols <- function(data) {
  data |>
    dplyr::rename_with(
      .cols = tidyselect::everything(),
      ~ . |>
        gsub(pattern = "\\.|\\(|-|\\s", replacement = "_") |>
        gsub(pattern = "_+", replacement = "_") |>
        gsub(pattern = "\\)", replacement = "")
    )
}

# Apply cleaning to each table
nemsqar_patient_scene_data <- nemsqar_patient_scene_table |> clean_cols()
nemsqar_response_data <- nemsqar_response_table |> clean_cols()
nemsqar_situation_data <- nemsqar_situation_table |> clean_cols()
nemsqar_medications_data <- nemsqar_medications_table |> clean_cols()

# Inspect the cleaned patient/scene table
dplyr::glimpse(nemsqar_patient_scene_data)
#> Rows: 10,000
#> Columns: 6
#> $ Incident_Patient_Care_Report_Number_PCR_eRecord_01 <chr> "NyXFBlJfnm-8333586…
#> $ Incident_Date                                      <date> 2023-12-20, 2023-0…
#> $ Patient_Age_ePatient_15                            <dbl> 98, 75, 24, 115, 54…
#> $ Patient_Age_Units_ePatient_16                      <chr> "Minutes", "Days", …
#> $ Patient_Date_Of_Birth_ePatient_17                  <date> 2023-12-19, 2023-0…
#> $ Patient_Gender_ePatient_13                         <chr> "Male to Female, Tr…

Special characters and whitespace are now either removed or replaced with _, so the column names are valid R names and can be typed directly without wrapping them in backticks.
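After cleaning, it is worth confirming that every name is a valid R name and that no two columns collapsed into the same name. The sketch below repeats the three substitutions from clean_cols() in base R on a few of the column names shown earlier, then runs both checks.

```r
# A few of the original column names from the patient/scene table
raw_names <- c(
  "Incident Patient Care Report Number - PCR (eRecord.01)",
  "Incident Date",
  "Patient Age (ePatient.15)",
  "Patient Age Units (ePatient.16)"
)

# Same three substitutions as clean_cols(), written with base R only
cleaned <- gsub("\\.|\\(|-|\\s", "_", raw_names)
cleaned <- gsub("_+", "_", cleaned)
cleaned <- gsub("\\)", "", cleaned)
cleaned

# Syntactically valid names are returned unchanged by make.names()
all(cleaned == make.names(cleaned))

# No two columns should end up with the same cleaned name
anyDuplicated(cleaned) == 0
```

If either check returns FALSE for your own data, adjust the patterns or rename the affected columns by hand before mapping them to measure arguments.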

Understanding Required Inputs

Each NEMSQA measure requires a specific set of input tables. Although nemsqar can accept a single combined dataset through the df argument, this approach is not recommended. The preferred workflow is to supply separate tables using the *_table arguments (for example, patient_scene_table, response_table). This aligns with the NEMSIS structure, where elements such as ePatient, eScene, eResponse, and eSituation are stored in distinct tables.

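In practice, your data should follow this multi‑table structure, with a separate table for each group of NEMSIS elements (for example, patient and scene, response, situation, and medications).

Each measure expects a consistent set of these tables. For example, Asthma‑01 uses patient_scene_table, response_table, situation_table, and medications_table.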

The next sections demonstrate how to supply these inputs to nemsqar and how to calculate a measure.
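Because the tables are supplied separately, they need a shared record identifier. The sketch below is a hypothetical pre-flight check on miniature tables; the column name pcr and all values are made up, and in real data the identifier would be the column you map to erecord_01_col.

```r
# Hypothetical miniature tables that share a PCR number column
tables <- list(
  patient_scene_table = data.frame(pcr = c("A-001", "A-002", "A-003")),
  response_table = data.frame(pcr = c("A-001", "A-002", "A-003")),
  situation_table = data.frame(pcr = c("A-001", "A-003")),
  medications_table = data.frame(pcr = c("A-001", "A-001", "A-004"))
)

# 1. Does every table carry the identifier column?
has_key <- vapply(tables, function(tbl) "pcr" %in% names(tbl), logical(1))
has_key

# 2. Which identifiers in each table are absent from the patient/scene table?
reference_ids <- as.character(tables$patient_scene_table$pcr)
orphans <- lapply(
  tables,
  function(tbl) setdiff(as.character(tbl$pcr), reference_ids)
)
orphans
```

Identifiers that appear in one table but not in the patient/scene table are not necessarily errors, but they are worth understanding before interpreting denominator counts.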

Running the NEMSQA Measures Using nemsqar

Once the required tables are loaded, you can calculate your first measure. Each measure in nemsqar is implemented through a dedicated function that accepts NEMSIS‑aligned tables and returns standardized results.

nemsqar workhorse functions

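Each measure is built using two core functions: a wrapper function named for the measure (for example, asthma_01()) and a companion population function (for example, asthma_01_population()).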

The wrapper function performs two main tasks. First, it calls the population function to identify the population of interest. Then it applies the measure logic to estimate performance. Each NEMSQA measure follows this same pattern.

Running the wrapper function for Asthma‑01

The asthma_01() function requires several NEMSIS‑aligned tables and column mappings. All arguments shown below are required. Each column argument identifies the specific NEMSIS field used by the measure logic. Note that most argument names signal the corresponding NEMSIS data element. For example, eresponse_05_col corresponds to eResponse.05 in the NEMSIS data dictionary.

To help you map your own data, the list below shows how several key arguments align with their corresponding NEMSIS elements:

  • erecord_01_col -> eRecord.01 (PCR number)
  • incident_date_col -> eTimes.03 (Unit Notified by Dispatch Date/Time)
  • patient_DOB_col -> ePatient.17 (patient date of birth)
  • epatient_15_col -> ePatient.15 (patient age)
  • epatient_16_col -> ePatient.16 (age units)
  • eresponse_05_col -> eResponse.05 (type of service requested)
  • esituation_11_col -> eSituation.11 (primary impression)
  • esituation_12_col -> eSituation.12 (secondary impression)
  • emedications_03_col -> eMedications.03 (medication administered)

These mappings ensure that each argument references the correct NEMSIS data element when running the measure.

# Run Asthma‑01 without grouping
asthma_01_all <- asthma_01(
  patient_scene_table = nemsqar_patient_scene_data,
  response_table = nemsqar_response_data,
  situation_table = nemsqar_situation_data,
  medications_table = nemsqar_medications_data,
  erecord_01_col = Incident_Patient_Care_Report_Number_PCR_eRecord_01,
  incident_date_col = Incident_Date,
  patient_DOB_col = Patient_Date_Of_Birth_ePatient_17,
  epatient_15_col = Patient_Age_ePatient_15,
  epatient_16_col = Patient_Age_Units_ePatient_16,
  eresponse_05_col = Response_Type_Of_Service_Requested_With_Code_eResponse_05,
  esituation_11_col = Situation_Provider_Primary_Impression_Code_And_Description_eSituation_11,
  esituation_12_col = Situation_Provider_Secondary_Impression_Description_And_Code_List_eSituation_12,
  emedications_03_col = Patient_Medication_Given_or_Administered_Description_And_RXCUI_Codes_List_eMedications_03,
  confidence_interval = TRUE,
  method = "clopper-pearson",
  conf.level = 0.95
)


# print the results
asthma_01_all
#> # A tibble: 3 × 8
#>   measure   pop    numerator denominator  prop prop_label lower_ci upper_ci
#>   <chr>     <chr>      <int>       <int> <dbl> <chr>         <dbl>    <dbl>
#> 1 Asthma-01 Adults         0           4 0     0%           0         0.602
#> 2 Asthma-01 Peds           3          25 0.12  12%          0.0255    0.312
#> 3 Asthma-01 All            3          29 0.103 10.34%       0.0219    0.274

The output contains one row per population (Adults, Peds, and All) with the numerator, denominator, proportion, a formatted proportion label, and the lower and upper confidence limits. This structure is consistent across all NEMSQA measures implemented in nemsqar.
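To see where the proportion and interval columns come from, the counts can be re-used in base R. The sketch below assumes that the "clopper-pearson" method corresponds to the standard exact binomial interval, which stats::binom.test() also reports; it illustrates the arithmetic and does not reproduce nemsqar's internal code.

```r
# Numerators and denominators copied from the printed Asthma-01 results
results <- data.frame(
  pop = c("Adults", "Peds", "All"),
  numerator = c(0, 3, 3),
  denominator = c(4, 25, 29),
  stringsAsFactors = FALSE
)

# Proportion is simply numerator divided by denominator
results$prop <- results$numerator / results$denominator

# Exact (Clopper-Pearson) 95% interval for each row via stats::binom.test()
ci <- t(mapply(
  function(x, n) as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int),
  results$numerator,
  results$denominator
))
results$lower_ci <- ci[, 1]
results$upper_ci <- ci[, 2]

results
```

With only four records in the adult denominator, the interval is very wide, which is a useful reminder to read small-denominator results with caution.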

Running the asthma_01 wrapper function using grouping

nemsqar allows you to calculate a measure for the entire dataset or for specific subgroups. Grouping can be useful when you want to understand performance within meaningful categories, such as age groups, service types, or impressions. Grouping is implemented using the .by argument, which follows the same syntax used in dplyr::summarize().

The example below shows how to run Asthma‑01 grouped by age units. All required tables and column mappings remain the same; the only additional argument is .by.

# Run `asthma_01` for a whole dataset, group by age units.
# All core inputs remain the same. Only the .by argument is added.
asthma_01_age <- asthma_01(
  patient_scene_table = nemsqar_patient_scene_data,
  response_table = nemsqar_response_data,
  situation_table = nemsqar_situation_data,
  medications_table = nemsqar_medications_data,
  erecord_01_col = Incident_Patient_Care_Report_Number_PCR_eRecord_01,
  incident_date_col = Incident_Date,
  patient_DOB_col = Patient_Date_Of_Birth_ePatient_17,
  epatient_15_col = Patient_Age_ePatient_15,
  epatient_16_col = Patient_Age_Units_ePatient_16,
  eresponse_05_col = Response_Type_Of_Service_Requested_With_Code_eResponse_05,
  esituation_11_col = Situation_Provider_Primary_Impression_Code_And_Description_eSituation_11,
  esituation_12_col = Situation_Provider_Secondary_Impression_Description_And_Code_List_eSituation_12,
  emedications_03_col = Patient_Medication_Given_or_Administered_Description_And_RXCUI_Codes_List_eMedications_03,
  confidence_interval = TRUE,
  method = "clopper-pearson",
  conf.level = 0.95,
  # notice here that we use the `.by` argument from `dplyr::summarize` to group
  # our analysis
  .by = Patient_Age_Units_ePatient_16
)

# print the results
asthma_01_age
#> # A tibble: 10 × 9
#>    Patient_Age_Units_ePa…¹ measure pop   numerator denominator   prop prop_label
#>    <chr>                   <chr>   <chr>     <int>       <int>  <dbl> <chr>     
#>  1 Years                   Asthma… Adul…         0           4 0      0%        
#>  2 Months                  Asthma… Peds          1          12 0.0833 8.33%     
#>  3 Minutes                 Asthma… Peds          2           9 0.222  22.22%    
#>  4 Hours                   Asthma… Peds          0           2 0      0%        
#>  5 Days                    Asthma… Peds          0           2 0      0%        
#>  6 Months                  Asthma… All           1          12 0.0833 8.33%     
#>  7 Minutes                 Asthma… All           2           9 0.222  22.22%    
#>  8 Hours                   Asthma… All           0           2 0      0%        
#>  9 Years                   Asthma… All           0           4 0      0%        
#> 10 Days                    Asthma… All           0           2 0      0%        
#> # ℹ abbreviated name: ¹​Patient_Age_Units_ePatient_16
#> # ℹ 2 more variables: lower_ci <dbl>, upper_ci <dbl>

Grouping is optional. It can reveal differences in performance across patient subpopulations, and it can be applied to any NEMSQA measure using the same .by syntax.
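For readers who have not used .by before, the sketch below shows how the argument behaves in dplyr::summarize() itself, on a hypothetical table of record-level flags. It assumes dplyr 1.1.0 or later is installed and illustrates the syntax only, not nemsqar's internal logic.

```r
# Hypothetical record-level data: one row per eligible record,
# with a flag for whether the record met the measure
toy <- data.frame(
  age_units = c("Years", "Years", "Months", "Months", "Months", "Days"),
  treated = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Summarize within each value of age_units; .by groups for this call only
by_units <- dplyr::summarize(
  toy,
  numerator = sum(treated),
  denominator = dplyr::n(),
  prop = numerator / denominator,
  .by = age_units
)
by_units
```

To group by more than one column, supply them together, for example .by = c(age_units, another_column), where another_column is a placeholder for a column in your own data.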

Working with the *_population() functions

Each NEMSQA measure includes a companion *_population() function. These functions identify the population of interest by applying the full set of inclusion and exclusion criteria defined by NEMSQA. They perform all filtering, validation, and intermediate computations needed to determine which records belong in the measure denominator.

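Each population function returns a list containing several tibbles that help you examine the population. For asthma_01_population(), these are filter_process, adults, peds, initial_population, computing_population, and missingness.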

These objects are useful when validating data quality, understanding how records flowed through the NEMSQA criteria, and troubleshooting unexpected measure results. In practice, population functions are most useful when you need to verify which records were included or excluded from the denominator and why. Analysts often use these functions when denominator counts look unexpected, when investigating data quality issues, or when comparing populations across systems or years. They provide a transparent view of how NEMSQA logic was applied to your data.

The example below demonstrates how to use asthma_01_population() to inspect the population identified for Asthma‑01.

Using asthma_01_population() to examine the target population

The asthma_01_population() function identifies the population of interest by applying all NEMSQA inclusion and exclusion criteria. The function uses the same required tables and column mappings as asthma_01(), but it does not calculate performance estimates and does not use confidence interval or grouping arguments.

# Run `asthma_01_population` for a whole dataset
# The code is virtually the same as `asthma_01()`, but we do not use the
# confidence interval arguments, nor the tidy dot `...` arguments for grouping
# or other operations via `dplyr::summarize`
populations_asthma_01 <- asthma_01_population(
  patient_scene_table = nemsqar_patient_scene_data,
  response_table = nemsqar_response_data,
  situation_table = nemsqar_situation_data,
  medications_table = nemsqar_medications_data,
  erecord_01_col = Incident_Patient_Care_Report_Number_PCR_eRecord_01,
  incident_date_col = Incident_Date,
  patient_DOB_col = Patient_Date_Of_Birth_ePatient_17,
  epatient_15_col = Patient_Age_ePatient_15,
  epatient_16_col = Patient_Age_Units_ePatient_16,
  eresponse_05_col = Response_Type_Of_Service_Requested_With_Code_eResponse_05,
  esituation_11_col = Situation_Provider_Primary_Impression_Code_And_Description_eSituation_11,
  esituation_12_col = Situation_Provider_Secondary_Impression_Description_And_Code_List_eSituation_12,
  emedications_03_col = Patient_Medication_Given_or_Administered_Description_And_RXCUI_Codes_List_eMedications_03
)

# print structure of the results using `base::summary()`
populations_asthma_01 |> summary()
#>                      Length Class  Mode
#> filter_process        2     tbl_df list
#> adults               16     tbl_df list
#> peds                 16     tbl_df list
#> initial_population   16     tbl_df list
#> computing_population 16     tbl_df list
#> missingness           6     tbl_df list

This output provides a structured view of how records were filtered through the NEMSQA criteria. It allows you to inspect the initial population, denominator‑eligible records, age‑specific subgroups, and missingness summaries for required fields.
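Because the result is an ordinary named list, base R tools are enough to explore it. The sketch below uses a small stand-in list so that it runs on its own: the element names follow the summary() output above, while the contents and the column names inside each element are hypothetical.

```r
# Stand-in for the list returned by a *_population() function.
# Element names follow the summary() output; contents are hypothetical.
mock_population <- list(
  filter_process = data.frame(
    filter = c("Adults denominator", "Peds denominator"),
    count = c(0L, 2L)
  ),
  adults = data.frame(pcr = character(0)),
  peds = data.frame(pcr = c("A-001", "A-002")),
  missingness = data.frame(variable = "patient_age", n_missing = 0L)
)

# Names of the available pieces
names(mock_population)

# Rows and columns in each piece
sizes <- data.frame(
  element = names(mock_population),
  rows = vapply(mock_population, nrow, integer(1)),
  columns = vapply(mock_population, ncol, integer(1)),
  row.names = NULL
)
sizes

# Pull one piece out by name for closer inspection
head(mock_population$peds)
```

The same calls work on populations_asthma_01, for example populations_asthma_01$missingness.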

Examine a summary of counts for the NEMSQA population

The *_population() functions return several tibbles that summarize how records were filtered into the final population of interest. One of the most useful is the filter_process tibble. It shows the number of records remaining after each inclusion or exclusion step defined by NEMSQA.

Asthma-01 summary of attributes of the target population

# Display counts for each filtering step
populations_asthma_01$filter_process
#> # A tibble: 7 × 2
#>   filter             count
#>   <chr>              <int>
#> 1 911 calls           2400
#> 2 Asthma cases         109
#> 3 Beta agonist cases  1482
#> 4 Adults denominator     4
#> 5 Peds denominator      25
#> 6 Initial population    29
#> 7 Total dataset      10000

The filter_process tibble is typically the first place to look when results seem off.

Using filter_process from the population functions

Given that this vignette uses synthetic data, the counts may not reflect realistic populations. However, the workflow remains the same when working with real EMS data. The values in filter_process represent distinct record counts at each stage (using dplyr::distinct() internally). Reviewing these counts, along with the missingness tibble returned by the population function, can help diagnose data quality issues and better understand the composition of the population being evaluated.
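A couple of quick arithmetic checks on these counts can catch problems early. The sketch below re-enters the counts printed above as a plain data frame so it runs on its own. The expectation that the adult and pediatric denominators sum to the initial population is taken from this particular output and should be treated as an assumption for other measures.

```r
# Counts copied from the filter_process output shown above
filter_process <- data.frame(
  filter = c(
    "911 calls", "Asthma cases", "Beta agonist cases",
    "Adults denominator", "Peds denominator",
    "Initial population", "Total dataset"
  ),
  count = c(2400L, 109L, 1482L, 4L, 25L, 29L, 10000L),
  stringsAsFactors = FALSE
)

# Named vector for convenient lookup by filter label
counts <- setNames(filter_process$count, filter_process$filter)

# In this output, the two denominators add up to the initial population
denominators_match <-
  counts[["Adults denominator"]] + counts[["Peds denominator"]] ==
  counts[["Initial population"]]
denominators_match

# Share of the whole dataset remaining at each step
filter_process$share <- filter_process$count / counts[["Total dataset"]]
filter_process
```

A step whose share is far smaller or larger than expected is a good starting point for reviewing the corresponding column mapping or code list.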

Common Pitfalls for New R Users

Users who are new to R often encounter several predictable issues when preparing data for NEMSQA measure calculation. The sections below highlight the most common problems and how to avoid them. Addressing these issues before running measures improves reproducibility and reduces debugging time.

Incorrect variable types

Many NEMSQA logic components require numeric fields. If these values are imported as character strings, the measure functions will fail. Always verify column types before running a measure and convert them as needed to meet nemsqar validation requirements.

Unintended name changes

Each column argument must point to the column that holds the corresponding NEMSIS field. You may name your columns however you prefer, but the values that originate from eResponse.05 must be supplied to the eresponse_05_col argument. The function relies on the data itself, not the literal column name, so an incorrect mapping will cause errors or misleading results.

Missing required tables

Each measure requires a specific set of input tables. If a required table is not provided, the function will return an error. Ensure that all necessary tables are loaded and cleaned before running the measure.

Duplicated records

NEMSQA measures assume that each patient or encounter appears once in the relevant input tables. Duplicate rows can shift denominator counts, alter inclusion, or create unintended exclusions. Although nemsqar includes safeguards to detect some duplication, it is best practice to review your data for repeated records and to check for unintended Cartesian joins created during data extraction or table merging.
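A simple duplicate review can be done in base R before any measure is run. The sketch below uses a hypothetical table in which one row is an exact copy and one PCR number appears twice with different values; the column names and values are made up.

```r
# Hypothetical table: row 3 is an exact copy of row 2, and A-003 appears
# twice with different impressions
toy <- data.frame(
  pcr = c("A-001", "A-002", "A-002", "A-003", "A-003"),
  primary_impression = c("J45.901", "R41.82", "R41.82", "T14.90", "J45.901"),
  stringsAsFactors = FALSE
)

# Rows that are exact copies of an earlier row
n_exact_copies <- sum(duplicated(toy))
n_exact_copies

# PCR numbers that appear more than once, whether or not the rows are identical
id_counts <- table(toy$pcr)
repeated_ids <- names(id_counts[id_counts > 1])
repeated_ids

# Drop exact copies only; repeated IDs with differing values need manual review
deduplicated <- unique(toy)
nrow(deduplicated)
```

Some tables can legitimately hold several rows per PCR number (for example, when more than one medication is recorded), so repeated identifiers should be reviewed rather than removed automatically.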

Next Steps

This vignette introduced the core workflow for calculating NEMSQA measures using nemsqar. After reviewing these examples, users may wish to expand their analyses by exploring additional measures, integrating their own EMS datasets, or incorporating these workflows into automated reporting pipelines. The package reference documentation provides detailed descriptions of each function, and additional vignettes will demonstrate multi‑measure workflows, validation strategies, and integration with reproducible reporting tools such as Quarto and R Markdown.