---
title: "Getting Started with nemsqar"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with nemsqar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  eval = TRUE,
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, echo=FALSE, message=FALSE, results="hide"}
library(nemsqar)
```

# Introduction

The `nemsqar` package provides an automated and reproducible framework for calculating EMS quality measures defined by the National EMS Quality Alliance (NEMSQA). These measures are widely used by EMS agencies, trauma systems, quality improvement teams, and researchers to evaluate performance and support evidence‑based improvement activities.

This vignette is written for users who are knowledgeable in EMS, injury epidemiology, or quality improvement, but who may be new to R or new to calculating NEMSQA measures using R. The focus is to guide you through each step of the workflow: loading data, preparing it for use with `nemsqar`, and running a selected NEMSQA measure.

By the end of this vignette, you will understand:

- The structure and purpose of the example datasets included in `nemsqar`  
- How to load NEMSQA‑ready tables using synthetic data packaged with `nemsqar`  
- How to prepare EMS data for measure calculation  
- How to run individual NEMSQA measure functions  
- How to interpret results generated by the package  

The sections that follow walk through the entire process, from loading EMS data in R to producing a standardized performance measure aligned with national reporting expectations.

# The Measures

This vignette focuses on one NEMSQA measure implemented in `nemsqar`:

- **Asthma‑01**: Assessment and treatment of patients with suspected asthma  

All measures in `nemsqar` follow the same basic structure. Each one requires a defined set of NEMSIS‑aligned tables and returns results in a standardized format. The following sections introduce the data required to calculate this measure and demonstrate how to run it in R.

# The Data

Before calculating any measure, it is important to understand the example datasets included with `nemsqar`. These datasets are small, synthetic representations of NEMSIS tables. They provide a safe environment for learning the workflow before applying it to production EMS data. Because NEMSQA measure logic relies on the NEMSIS data structure, each measure requires several tables (for example, patient, response, situation, medications, vitals).

## Loading Example Data

The `nemsqar` package includes synthetic datasets that mirror the NEMSIS tables required for NEMSQA measure calculation. You can load them with the `data()` function. For Asthma‑01, the following tables are required:

```{r asthma_01_tables, results="hide"}
data("nemsqar_patient_scene_table")
data("nemsqar_response_table")
data("nemsqar_situation_table")
data("nemsqar_medications_table")
```

Each dataset loads into your R environment as a standard data frame.

### Loading your own data
In practice, you will typically load your own EMS datasets. These may come from CSV files, databases, or data extracts. Below are common patterns for loading local files:

```{r alt_data_load, eval=FALSE, echo=TRUE, results="hide"}
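# Store your file path. To keep paths out of your scripts, you can define them
# in an .Renviron file and read them back with Sys.getenv()
path <- file.path("path", "to", "your_file.csv")
# path <- Sys.getenv("EMS_DATA_PATH") # hypothetical variable name

# Load data from the path using the tidyverse package `readr`
# (named `ems_data` so it does not mask the `data()` function used above)
ems_data <- readr::read_csv(path)

# Load data from the path using base R
ems_data <- read.csv(file = path)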
```
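Many type problems can be prevented at import time. The sketch below writes a small hypothetical extract to a temporary file and reads it back with base R: `na.strings` treats empty cells as `NA`, and a named `colClasses` vector fixes each column's type so identifiers such as PCR numbers keep their leading zeros. The column names and values are illustrative only.

```{r read_csv_types_sketch, eval=FALSE, echo=TRUE}
# Hypothetical extract: PCR numbers with leading zeros and one blank age
tmp_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "pcr_number,patient_age,primary_impression",
    "000123,34,J45.901",
    "000124,,R06.02"
  ),
  tmp_path
)

# na.strings: treat empty cells and "NA" as missing
# colClasses: set column types explicitly instead of letting R guess
ems_extract <- read.csv(
  file = tmp_path,
  na.strings = c("", "NA"),
  colClasses = c(
    pcr_number = "character",
    patient_age = "numeric",
    primary_impression = "character"
  )
)

str(ems_extract)
```

If you use `readr::read_csv()` instead, empty strings are already treated as missing by default (`na = c("", "NA")`), and the `col_types` argument plays the role of `colClasses`.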

## Inspecting Table Structure

Because many users are new to R, it is important to verify that each dataset loaded correctly and contains the variables required by the measure functions. A few simple commands help confirm this.

### Review the structure of a loaded dataset
```{r glimpse_patient_scene_table}
# Quick overview of column names and data types
dplyr::glimpse(nemsqar_patient_scene_table)
```

### View the first few rows
```{r head_patient_scene_table}
# An abbreviated look at the actual data tables
head(nemsqar_patient_scene_table, n = 10)
```

These functions allow you to check column names, data types, and basic record structure. This step is essential because NEMSQA logic depends on specific fields. Incorrect data types (for example, character instead of numeric) will cause measure functions to fail.
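Beyond eyeballing `glimpse()` output, you can check required columns programmatically. The helper below, `check_columns()`, is hypothetical (not part of `nemsqar`): it takes a named vector of expected classes and reports, for each column, whether it exists and has that class. The example table and expected classes are illustrative.

```{r check_columns_sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper: `expected` is a named character vector where names are
# column names and values are classes (e.g., "numeric", "character", "Date").
# Note: integer columns report "integer", not "numeric".
check_columns <- function(data, expected) {
  actual <- vapply(
    names(expected),
    function(col) {
      if (col %in% names(data)) class(data[[col]])[1] else NA_character_
    },
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(
    column = names(expected),
    expected_class = unname(expected),
    actual_class = actual,
    # FALSE when the column is missing or has a different class
    ok = !is.na(actual) & actual == unname(expected),
    stringsAsFactors = FALSE
  )
}

# Illustrative table: age was imported as text and date of birth is absent
example <- data.frame(
  age = c("34", "18"),
  age_units = c("Years", "Years"),
  stringsAsFactors = FALSE
)
check_columns(example, c(age = "numeric", age_units = "character", dob = "Date"))
```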

### Data types

The example datasets included in `nemsqar` already use appropriate data types. When working with real EMS data, you must verify these types manually. This ensures that the data satisfy validation requirements and prevents errors during measure calculation.

In practice, EMS data commonly include issues such as:

- Dates stored as character strings  
- Numeric values stored as text (for example, `"5"` instead of `5`)  
- Empty strings used in place of `NA`  
- Mixed formats within the same column  

Below are examples of how to identify and correct these issues before running any `nemsqar` measure.

#### Example: Converting character dates to proper date formats

```{r fix_dates_examples, eval=FALSE, echo=TRUE, results="hide"}
# Example: incident dates stored as character values
example_data <- data.frame(
  Incident_Date = c("2023-01-10", "01/12/2023", "20230114"),
  stringsAsFactors = FALSE
)

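# Convert using lubridate (recommended). `orders` lists the formats to try;
# "ymd" also covers compact values such as "20230114". Values such as
# "01/12/2023" are read using the orders supplied (here, month/day/year), so
# confirm the convention used by your source system.
# The result is a POSIXct date-time; wrap it in as.Date() if you need a Date.
example_data$Incident_Date <- lubridate::parse_date_time(
  example_data$Incident_Date,
  orders = c("ymd", "mdy")
)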
```

#### Example: Converting numbers stored as character strings

```{r fix_numeric_examples, eval=FALSE, echo=TRUE, results="hide"}
numeric_example <- data.frame(
  # note that 45 here has whitespace surrounding the value
  Patient_Age_raw = c("34", "18", "07", " 45 ")
)

# Trim whitespace and convert to numeric
numeric_example$Patient_Age <- as.numeric(trimws(
  numeric_example$Patient_Age_raw
))
```
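Conversion with `as.numeric()` turns values it cannot parse (for example, `"unknown"`) into `NA` with only a warning, which can silently remove records later. A minimal sketch of a hypothetical helper, `find_non_numeric()`, that lists non-blank values that would fail conversion so you can decide how to handle them:

```{r numeric_failures_sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper: return the distinct non-blank values that cannot be
# converted to numbers
find_non_numeric <- function(x) {
  x_trim <- trimws(x)
  converted <- suppressWarnings(as.numeric(x_trim))
  unique(x_trim[is.na(converted) & !is.na(x_trim) & x_trim != ""])
}

# Illustrative raw ages: blanks and NA are ignored, text values are reported
ages_raw <- c("34", " 45 ", "unknown", "", NA, "6 mo")
find_non_numeric(ages_raw)
```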

#### Example: Replacing empty strings with NA

```{r fix_missing_examples, eval=FALSE, echo=TRUE, results="hide"}
missing_example <- data.frame(
  eSituation_11 = c("", "R41.82", "", "T14.90")
)

# Replace empty strings with NA
missing_example$eSituation_11 <- dplyr::na_if(missing_example$eSituation_11, "")

missing_example
```
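Real tables usually have many character columns, so it can help to apply the same fix to all of them at once. The sketch below uses a hypothetical helper, `blank_to_na()`, written in base R; it also treats whitespace-only strings as empty, which `dplyr::na_if(x, "")` alone would not.

```{r blank_to_na_sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper: replace empty or whitespace-only strings with NA in
# every character column, leaving non-character columns untouched
blank_to_na <- function(data) {
  is_chr <- vapply(data, is.character, logical(1))
  data[is_chr] <- lapply(data[is_chr], function(x) {
    x[!is.na(x) & trimws(x) == ""] <- NA
    x
  })
  data
}

# Illustrative table with blanks in two character columns
table_example <- data.frame(
  eSituation_11 = c("", "R41.82", "  "),
  eSituation_12 = c("J45.901", "", NA),
  age = c(34, 18, 7),
  stringsAsFactors = FALSE
)
blank_to_na(table_example)
```

A `dplyr` equivalent for exact empty strings is `dplyr::mutate(data, dplyr::across(dplyr::where(is.character), ~ dplyr::na_if(.x, "")))`.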

#### Why these steps matter

Date fields are used for patient age computation and for time‑based denominators. If these values are stored as character strings or in inconsistent formats, the measure logic will not execute correctly.

Numeric fields such as age, blood pressure, respiratory rate, or dosage must be numeric to satisfy validation checks and ensure appropriate comparisons.

Empty strings can cause false exclusions during population filtering, since `nemsqar` logic expects missing values to be represented as `NA`.

Ensuring correct data types before running any `nemsqar` function improves reproducibility, reduces debugging time, and allows the measure logic to operate as intended.

### Dealing with problematic column names

If your datasets already have clean names, you may skip this step.

EMS registry data often contain column names with spaces, punctuation, or special characters. These can make programming in R more difficult. To avoid these issues, it is helpful to standardize column names before running any measures.

Below is a simple reusable function to clean column names by replacing spaces and special characters with underscores.

```{r clean_column_names, echo=TRUE, eval=TRUE}
# Define a reusable column-cleaning function
clean_cols <- function(data) {
  data |>
    dplyr::rename_with(
      .cols = tidyselect::everything(),
      ~ . |>
        gsub(pattern = "\\.|\\(|-|\\s", replacement = "_") |>
        gsub(pattern = "_+", replacement = "_") |>
        gsub(pattern = "\\)", replacement = "")
    )
}

# Apply cleaning to each table
nemsqar_patient_scene_data <- nemsqar_patient_scene_table |> clean_cols()
nemsqar_response_data <- nemsqar_response_table |> clean_cols()
nemsqar_situation_data <- nemsqar_situation_table |> clean_cols()
nemsqar_medications_data <- nemsqar_medications_table |> clean_cols()

# Inspect the cleaned patient/scene table
dplyr::glimpse(nemsqar_patient_scene_data)
```

After cleaning, whitespace and special characters are either removed or replaced with `_`, so column names can typically be referenced directly in R code without wrapping them in backticks.
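To preview what the cleaning does before renaming a whole table, the same three substitutions used in `clean_cols()` can be applied to a plain character vector. This base R sketch uses a hypothetical helper, `clean_names_base()`, and illustrative header strings; it also shows why names containing spaces require backticks.

```{r clean_names_base_sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper mirroring the substitutions in clean_cols()
clean_names_base <- function(x) {
  x <- gsub("\\.|\\(|-|\\s", "_", x) # dots, "(", hyphens, whitespace -> "_"
  x <- gsub("_+", "_", x)            # collapse repeated underscores
  gsub("\\)", "", x)                 # drop ")"
}

raw_names <- c("Patient Age (ePatient.15)", "Incident Date")
clean_names_base(raw_names)

# A name containing a space must be wrapped in backticks
dates_df <- data.frame(`Incident Date` = "2023-01-10", check.names = FALSE)
dates_df$`Incident Date`

# After cleaning, the same column can be referenced directly
names(dates_df) <- clean_names_base(names(dates_df))
dates_df$Incident_Date
```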

## Understanding Required Inputs

Each NEMSQA measure requires a specific set of input tables. Although `nemsqar` can accept a single combined dataset through the `df` argument, this approach is not recommended. The preferred workflow is to supply separate tables using the `*_table` arguments (for example, `patient_scene_table`, `response_table`). This aligns with the NEMSIS structure, where elements such as ePatient, eScene, eResponse, and eSituation are stored in distinct tables.

In practice, your data should follow this multi‑table structure:

- Patient and scene data stored together (1:1)
- Response data stored separately
- Situation data stored separately
- Medications and vitals stored in their respective tables
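One reason to keep these tables separate is that some of them are one-to-many. In the hypothetical sketch below, a patient table has one row per PCR while a medications table has one row per medication given; merging them repeats a patient row for every medication, so row counts no longer equal encounter counts.

```{r one_to_many_sketch, eval=FALSE, echo=TRUE}
# Hypothetical patient table: one row per PCR
patients <- data.frame(
  pcr = c("A1", "A2", "A3"),
  age = c(34, 8, 61),
  stringsAsFactors = FALSE
)

# Hypothetical medications table: one row per medication administered
medications <- data.frame(
  pcr = c("A1", "A1", "A2"),
  medication = c("Albuterol", "Ipratropium", "Albuterol"),
  stringsAsFactors = FALSE
)

# A left join keeps every patient but repeats A1 once per medication
combined <- merge(patients, medications, by = "pcr", all.x = TRUE)

nrow(patients)               # 3 encounters
nrow(combined)               # 4 rows after the join
length(unique(combined$pcr)) # still 3 distinct encounters
```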

Each measure expects a consistent set of these tables. For example:

- **Asthma‑01** requires patient, response, situation, and medication tables.  

The next sections demonstrate how to supply these inputs to `nemsqar` and how to calculate a measure.

# Running the NEMSQA Measures Using `nemsqar`

Once the required tables are loaded, you can calculate your first measure. Each measure in `nemsqar` is implemented through a dedicated function that accepts NEMSIS‑aligned tables and returns standardized results.

## `nemsqar` workhorse functions

Each measure is built using two core functions:

- A **wrapper function** named `measure_##()` (for example, `asthma_01()`)
- A corresponding **population function**, such as `asthma_01_population()`

The wrapper function performs two main tasks. First, it calls the population function to identify the population of interest. Then it applies the measure logic to estimate performance. Each NEMSQA measure follows this same pattern.
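The two-step pattern can be illustrated with a deliberately simplified sketch. The functions `toy_population()` and `toy_measure()` below are hypothetical and are not part of `nemsqar`; their criteria (an ICD-10 J45 impression for the denominator, albuterol for the numerator) are simplifications for illustration, not the NEMSQA specification.

```{r wrapper_pattern_sketch, eval=FALSE, echo=TRUE}
# Step 1: a hypothetical "population" function filters records and flags
# which denominator records meet the numerator criteria
toy_population <- function(data) {
  denom <- data[grepl("^J45", data$impression), , drop = FALSE]
  # grepl() returns FALSE for NA, so missing medications do not count
  denom$numerator <- grepl("albuterol", denom$medication, ignore.case = TRUE)
  list(
    denominator = denom,
    counts = c(total = nrow(data), denominator = nrow(denom))
  )
}

# Step 2: a hypothetical "wrapper" calls the population function, then
# summarizes the denominator into a single performance estimate
toy_measure <- function(data) {
  d <- toy_population(data)$denominator
  data.frame(
    numerator = sum(d$numerator),
    denominator = nrow(d),
    prop = sum(d$numerator) / nrow(d)
  )
}

records <- data.frame(
  impression = c("J45.901", "J45.909", "R06.02", "J45.41"),
  medication = c("Albuterol", NA, "Albuterol", "Ipratropium"),
  stringsAsFactors = FALSE
)
toy_measure(records)
```

Keeping the filtering in its own function is what makes the population step inspectable on its own, which is the role the `*_population()` functions play in `nemsqar`.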

### Running the wrapper function for Asthma‑01

The `asthma_01()` function requires several NEMSIS‑aligned tables and column mappings. All table and column arguments shown below are required; the confidence interval arguments (`confidence_interval`, `method`, and `conf.level`) are optional. Each column argument identifies the specific NEMSIS field used by the measure logic. Most argument names signal the corresponding NEMSIS data element; for example, `eresponse_05_col` corresponds to eResponse.05 in the NEMSIS data dictionary.

To help you map your own data, the list below shows how several key arguments align with their corresponding NEMSIS elements:

- `erecord_01_col` --> eRecord.01 (PCR number)  
- `incident_date_col` --> eTimes.03 (Unit Notified by Dispatch Date/Time)
- `patient_DOB_col` --> ePatient.17 (patient date of birth)  
- `epatient_15_col` --> ePatient.15 (patient age)  
- `epatient_16_col` --> ePatient.16 (age units)  
- `eresponse_05_col` --> eResponse.05 (type of service requested)  
- `esituation_11_col` --> eSituation.11 (primary impression)  
- `esituation_12_col` --> eSituation.12 (secondary impression)  
- `emedications_03_col` --> eMedications.03 (medication administered)

These mappings ensure that each argument references the correct NEMSIS data element when running the measure.
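When your tables have long, descriptive column names, a small search can help you find the column for each element. The sketch below uses a hypothetical helper, `find_nemsis_col()`, that assumes names were cleaned so they end with the element identifier (for example, `ePatient_15`); the example table and its values are illustrative.

```{r find_nemsis_col_sketch, eval=FALSE, echo=TRUE}
# Hypothetical helper: find columns whose names end with a NEMSIS element
# identifier; the "$" anchor keeps "ePatient_1" from matching "ePatient_15"
find_nemsis_col <- function(data, element) {
  grep(paste0(element, "$"), names(data), value = TRUE, ignore.case = TRUE)
}

# Illustrative table with cleaned column names
example_cols <- data.frame(
  Patient_Age_ePatient_15 = 34,
  Patient_Age_Units_ePatient_16 = "Years",
  Response_Type_Of_Service_Requested_With_Code_eResponse_05 = "illustrative value",
  stringsAsFactors = FALSE
)
find_nemsis_col(example_cols, "ePatient_15")
```

The returned name is then passed unquoted to the matching argument, for example `epatient_15_col = Patient_Age_ePatient_15`.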

```{r run_asthma_01, results="show"}
# Run Asthma‑01 without grouping
asthma_01_all <- asthma_01(
  patient_scene_table = nemsqar_patient_scene_data,
  response_table = nemsqar_response_data,
  situation_table = nemsqar_situation_data,
  medications_table = nemsqar_medications_data,
  erecord_01_col = Incident_Patient_Care_Report_Number_PCR_eRecord_01,
  incident_date_col = Incident_Date,
  patient_DOB_col = Patient_Date_Of_Birth_ePatient_17,
  epatient_15_col = Patient_Age_ePatient_15,
  epatient_16_col = Patient_Age_Units_ePatient_16,
  eresponse_05_col = Response_Type_Of_Service_Requested_With_Code_eResponse_05,
  esituation_11_col = Situation_Provider_Primary_Impression_Code_And_Description_eSituation_11,
  esituation_12_col = Situation_Provider_Secondary_Impression_Description_And_Code_List_eSituation_12,
  emedications_03_col = Patient_Medication_Given_or_Administered_Description_And_RXCUI_Codes_List_eMedications_03,
  confidence_interval = TRUE,
  method = "clopper-pearson",
  conf.level = 0.95
)


# print the results
asthma_01_all
```

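The output is a summary table rather than a record-level dataset: for each population the measure defines (for example, adult, pediatric, and all patients), it reports the numerator, denominator, and resulting proportion, along with confidence interval bounds when `confidence_interval = TRUE`. This structure is consistent across all NEMSQA measures implemented in `nemsqar`.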
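With `method = "clopper-pearson"`, the interval requested is the exact binomial (Clopper-Pearson) interval. As an independent sanity check, not a description of `nemsqar` internals, the sketch below computes that interval for hypothetical counts using base R's `binom.test()` and the equivalent beta-quantile formula.

```{r clopper_pearson_sketch, eval=FALSE, echo=TRUE}
# Hypothetical counts: 45 of 60 denominator records met the numerator criteria
numerator <- 45
denominator <- 60
alpha <- 1 - 0.95

# binom.test() reports the exact (Clopper-Pearson) confidence interval
exact_ci <- binom.test(numerator, denominator, conf.level = 0.95)$conf.int

# The same bounds written with beta-distribution quantiles
# (valid for 0 < numerator < denominator; the edge cases use 0 or 1)
lower <- qbeta(alpha / 2, numerator, denominator - numerator + 1)
upper <- qbeta(1 - alpha / 2, numerator + 1, denominator - numerator)

c(prop = numerator / denominator, lower = lower, upper = upper)
exact_ci
```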

### Running the `asthma_01` wrapper function using grouping

`nemsqar` allows you to calculate a measure for the entire dataset or for specific subgroups. Grouping can be useful when you want to understand performance within meaningful categories, such as age groups, service types, or impressions. Grouping is implemented using the `.by` argument, which follows the same syntax used in `dplyr::summarize()`.

The example below shows how to run **Asthma‑01** grouped by age units. All required tables and column mappings remain the same; the only additional argument is `.by`.

```{r run_asthma_01_age, results="show"}
# Run `asthma_01` for a whole dataset, group by age units.
# All core inputs remain the same. Only the .by argument is added.
asthma_01_age <- asthma_01(
  patient_scene_table = nemsqar_patient_scene_data,
  response_table = nemsqar_response_data,
  situation_table = nemsqar_situation_data,
  medications_table = nemsqar_medications_data,
  erecord_01_col = Incident_Patient_Care_Report_Number_PCR_eRecord_01,
  incident_date_col = Incident_Date,
  patient_DOB_col = Patient_Date_Of_Birth_ePatient_17,
  epatient_15_col = Patient_Age_ePatient_15,
  epatient_16_col = Patient_Age_Units_ePatient_16,
  eresponse_05_col = Response_Type_Of_Service_Requested_With_Code_eResponse_05,
  esituation_11_col = Situation_Provider_Primary_Impression_Code_And_Description_eSituation_11,
  esituation_12_col = Situation_Provider_Secondary_Impression_Description_And_Code_List_eSituation_12,
  emedications_03_col = Patient_Medication_Given_or_Administered_Description_And_RXCUI_Codes_List_eMedications_03,
  confidence_interval = TRUE,
  method = "clopper-pearson",
  conf.level = 0.95,
  # notice here that we use the `.by` argument from `dplyr::summarize` to group
  # our analysis
  .by = Patient_Age_Units_ePatient_16
)

# print the results
asthma_01_age
```

Grouping is optional. It can reveal differences in performance across patient subpopulations, and the same `.by` syntax can be applied to any NEMSQA measure.
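Conceptually, grouping computes the numerator, denominator, and proportion separately within each level of the grouping column. A minimal base R sketch of that idea, using hypothetical record-level data rather than `nemsqar` output:

```{r grouped_prop_sketch, eval=FALSE, echo=TRUE}
# Hypothetical denominator records: a grouping column and a logical flag for
# whether each record met the numerator criteria
grouped_records <- data.frame(
  age_units = c("Years", "Years", "Years", "Months", "Months"),
  numerator = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Split the flags by group, then summarize each group
by_group <- lapply(
  split(grouped_records$numerator, grouped_records$age_units),
  function(flag) {
    c(numerator = sum(flag), denominator = length(flag), prop = mean(flag))
  }
)
do.call(rbind, by_group)
```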

## Working with the `*_population()` functions

Each NEMSQA measure includes a companion `*_population()` function. These functions identify the population of interest by applying the full set of inclusion and exclusion criteria defined by NEMSQA. They perform all filtering, validation, and intermediate computations needed to determine which records belong in the measure denominator.

Each population function returns a `list` containing several tibbles that help you examine the population:

- A tibble with counts for each filtering step  
- Tibbles for specific populations (for example, adult, pediatric, or all patients)  
- A tibble showing the initial population before any filtering  
- A tibble with the full dataset and computed fields  
- A tibble summarizing missingness for required columns across all tables  

These objects are useful for validating data quality, tracing how records flowed through the NEMSQA criteria, and troubleshooting unexpected measure results. In particular, population functions show which records were included in or excluded from the denominator and why. Analysts often turn to them when denominator counts look unexpected, when investigating data quality issues, or when comparing populations across systems or years, because they provide a transparent view of how NEMSQA logic was applied to your data.

The example below demonstrates how to use `asthma_01_population()` to inspect the population identified for Asthma‑01.

### Using `asthma_01_population()` to examine the target population

The `asthma_01_population()` function identifies the population of interest by applying all NEMSQA inclusion and exclusion criteria. The function uses the same required tables and column mappings as `asthma_01()`, but it does not calculate performance estimates and does not use the confidence interval or grouping arguments.

```{r population_asthma_01, results="show"}
# Run `asthma_01_population` for a whole dataset
# The code is virtually the same as `asthma_01()`, but we do not use the
# confidence interval arguments, nor the tidy dot `...` arguments for grouping
# or other operations via `dplyr::summarize`
populations_asthma_01 <- asthma_01_population(
  patient_scene_table = nemsqar_patient_scene_data,
  response_table = nemsqar_response_data,
  situation_table = nemsqar_situation_data,
  medications_table = nemsqar_medications_data,
  erecord_01_col = Incident_Patient_Care_Report_Number_PCR_eRecord_01,
  incident_date_col = Incident_Date,
  patient_DOB_col = Patient_Date_Of_Birth_ePatient_17,
  epatient_15_col = Patient_Age_ePatient_15,
  epatient_16_col = Patient_Age_Units_ePatient_16,
  eresponse_05_col = Response_Type_Of_Service_Requested_With_Code_eResponse_05,
  esituation_11_col = Situation_Provider_Primary_Impression_Code_And_Description_eSituation_11,
  esituation_12_col = Situation_Provider_Secondary_Impression_Description_And_Code_List_eSituation_12,
  emedications_03_col = Patient_Medication_Given_or_Administered_Description_And_RXCUI_Codes_List_eMedications_03
)

# print structure of the results using `base::summary()`
populations_asthma_01 |> summary()
```

The summary lists each component of the returned list along with its class. Together, these components provide a structured view of how records were filtered through the NEMSQA criteria, letting you inspect the initial population, denominator‑eligible records, age‑specific subgroups, and missingness summaries for required fields.
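Because the result is an ordinary R list, standard list tools apply. The sketch below builds a small mock list so it runs on its own; apart from `filter_process`, the element names, filter labels, and counts are hypothetical.

```{r navigate_population_list_sketch, eval=FALSE, echo=TRUE}
# Hypothetical list shaped like a population function result
mock_population <- list(
  filter_process = data.frame(
    filter = c("Total records", "Asthma impression", "911 response"),
    count = c(100, 12, 10),
    stringsAsFactors = FALSE
  ),
  all_patients = data.frame(pcr = paste0("A", 1:10), stringsAsFactors = FALSE)
)

names(mock_population)                    # names of the components
mock_population$filter_process            # extract a component with $
mock_population[["all_patients"]]         # or with [[ ]]
vapply(mock_population, nrow, integer(1)) # row count of each component
```

The same calls work on the real result, for example `names(populations_asthma_01)`.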

### Examine a summary of counts for the NEMSQA population

The `*_population()` functions return several tibbles that summarize how records were filtered into the final population of interest. One of the most useful is the `filter_process` tibble. It shows the number of records remaining after each inclusion or exclusion step defined by NEMSQA.

#### Asthma-01 counts at each filtering step
```{r examine_filter_process}
# Display counts for each filtering step
populations_asthma_01$filter_process
```

`filter_process` is usually the first place to look when measure results seem off.

### Using `filter_process` from the population functions

Because this vignette uses synthetic data, the counts may not reflect realistic populations. However, the workflow is the same when working with real EMS data. The values in `filter_process` represent distinct record counts at each stage (computed with `dplyr::distinct()` internally). Reviewing these counts alongside the missingness tibble returned by the population function can help diagnose data quality issues and clarify the composition of the population being evaluated.
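When successive steps narrow the same population, the size of each drop can point to the criterion responsible for an unexpected denominator. The sketch below assumes sequential steps and uses hypothetical column names `filter` and `count` with made-up values; check `names(populations_asthma_01$filter_process)` and the step definitions in your own output before adapting it, since not every row is necessarily a subset of the one before.

```{r filter_dropoff_sketch, eval=FALSE, echo=TRUE}
# Hypothetical sequential filter counts (column names are assumptions)
filter_counts <- data.frame(
  filter = c("Total records", "Step A", "Step B", "Step C"),
  count = c(1000, 140, 120, 118),
  stringsAsFactors = FALSE
)

# Records removed by each step, and the percentage of the previous step kept
filter_counts$removed <- c(NA, -diff(filter_counts$count))
filter_counts$pct_retained <- round(
  100 * filter_counts$count / c(NA, head(filter_counts$count, -1)),
  1
)
filter_counts
```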


# Common Pitfalls for New R Users

Users who are new to R often encounter several predictable issues when preparing data for NEMSQA measure calculation. The sections below highlight the most common problems and how to avoid them. Addressing these issues before running measures improves reproducibility and reduces debugging time.

## Incorrect variable types

Many NEMSQA logic components require numeric fields. If these values are imported as character strings, the measure functions will fail. Always verify column types before running a measure and convert them as needed to meet `nemsqar` validation requirements.

## Unintended name changes

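Column names do not need to match NEMSIS element names, but each function argument must point to the column that holds the corresponding NEMSIS element. You may name your columns however you prefer, as long as, for example, the values that originate from **eResponse.05** are supplied to the `eresponse_05_col` argument. The function relies on the data itself, not the literal column name, so an incorrect mapping can cause errors or misleading results. If you rename or clean columns, update your argument mappings to use the new names.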

## Missing required tables

Each measure requires a specific set of input tables. If a required table is not provided, the function will return an error. Ensure that all necessary tables are loaded and cleaned before running the measure.

## Duplicated records

NEMSQA measures assume that each patient or encounter appears once in the relevant input tables. Duplicate rows can shift denominator counts, alter inclusion, or create unintended exclusions. Although `nemsqar` includes safeguards to detect some duplication, it is best practice to review your data for repeated records and to check for unintended Cartesian joins created during data extraction or table merging.
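A quick duplicate check can be run before any measure. The sketch below uses a hypothetical patient/scene extract that should have one row per PCR; apply the key-level check only to tables expected to be 1:1, since a medications table legitimately has several rows per PCR.

```{r duplicate_check_sketch, eval=FALSE, echo=TRUE}
# Hypothetical patient/scene extract in which PCR "A2" appears twice
patient_scene <- data.frame(
  pcr = c("A1", "A2", "A2", "A3"),
  age = c(34, 8, 8, 61),
  stringsAsFactors = FALSE
)

# Number of rows that exactly repeat an earlier row (all columns match)
sum(duplicated(patient_scene))

# PCR numbers that occur more than once in a table expected to be 1:1
pcr_counts <- table(patient_scene$pcr)
names(pcr_counts)[pcr_counts > 1]

# Keep the first occurrence of each fully duplicated row
patient_scene_dedup <- patient_scene[!duplicated(patient_scene), ]
```

Rows that share a PCR number but differ in other columns are not removed by `duplicated()` on the whole table; those need a decision about which record is correct rather than automatic deletion.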

# Next Steps

This vignette introduced the core workflow for calculating NEMSQA measures using `nemsqar`. After reviewing these examples, users may wish to expand their analyses by exploring additional measures, integrating their own EMS datasets, or incorporating these workflows into automated reporting pipelines. The package reference documentation provides detailed descriptions of each function, and additional vignettes will demonstrate multi‑measure workflows, validation strategies, and integration with reproducible reporting tools such as Quarto and R Markdown.