| Type: | Package | 
| Title: | Tools for Population Health Management Analytics | 
| Version: | 1.0.2 | 
| Maintainer: | Asif Laldin <laldin.asif@gmail.com> | 
| Description: | Created for population health analytics and monitoring. The functions in this package work best when working with patient level Master Patient Index-like datasets . Built to be used by NHS bodies and other health service providers. | 
| License: | AGPL (≥ 3) | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.2 | 
| Imports: | ggplot2, dplyr, scales, janitor, readr, utils, ggthemes, magrittr, readxl, ggtext, DBI, odbc, rlang, tibble | 
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0), kableExtra | 
| Config/testthat/edition: | 3 | 
| VignetteBuilder: | knitr | 
| Depends: | R (≥ 2.10) | 
| Language: | en-gb | 
| NeedsCompilation: | no | 
| Packaged: | 2022-02-20 12:52:44 UTC; ald04 | 
| Author: | Asif Laldin [aut, cre],
  Gary Hutson | 
| Repository: | CRAN | 
| Date/Publication: | 2022-02-20 13:10:02 UTC | 
Pipe operator
See magrittr::%>% for details.
Description
Pipe operator
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
| lhs | A value or the magrittr placeholder. | 
| rhs | A function call using the magrittr semantics. | 
Value
The result of calling 'rhs(lhs)'.
PopHealthData - Population health data for testing functions
Description
Population Health NHS data to use with the package and allows the calculation of the various metrics.
Usage
PopHealthData
Format
A small dataset with 1000 observations (rows) and 8 columns, as described hereunder:
- Sex
- The identifiable sex of the patient 
- Smoker
- Indicates if the patient is a smoker 
- Diabetes
- Flag to indicate if patient has a type of diabetes 
- AgeBand
- The age of the patient when they came into contact with the service 
- IMD_Decile
- The decile of indices of multiple deprivation: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019 
- Ethnicity
- The identifiable ethnicity of the patient 
- Locality
- The region where the patient lives - sampled from Gloucestershire Clinical Commissioning Group 
- PrimaryCareNetwork
- The primary care network of the patient 
Age Band Creation: Create a new column of 5 year Age Bands from an integer column
Description
Age Band Creation: Create a new column of 5 year Age Bands from an integer column
Usage
age_bandizer(df, Age_col)
Arguments
| df | a tidy dataframe in standard Master Patient Index format ie SangerTools::PopHealthData | 
| Age_col | a integer column within @param df NAs must be removed or imputed prior to running this function | 
Value
A dataframe with width ncol(df)+1, new column will be named Ageband and will be a factor with levels defined
Examples
library(SangerTools)
library(dplyr)
health_data <- SangerTools::PopHealthData
Create age bands from a numerical column
Description
An alternative age banding function that allows users greater flexibility for defining band size. This function utilises Base R standard evaluation. The function currently supports band size of 2, 5, 10 & 20. The input,column, Age_col should be numeric and must not contain NAs; if either of these conditions is violated the function will terminate.
Usage
age_bandizer_2(df, Age_col, Age_band_size = 5)
Arguments
| df | A dataframe with a numerical column denoting Age. | 
| Age_col | A numerical column within 'df'; passed with quotation marks. | 
| Age_band_size | The size of the Age band to use. Defaults to 5; will take values 2,5,10,20. | 
Value
A dataframe containing a new column 'Ageband' which has factor levels defined.
Examples
## Not run: 
library(SangerTools)
df <- data.frame(Age = sample(x = 0:120, size = 100, replace = TRUE))
df_agebanded <- age_bandizer_2(
  df = df,
  Age_col = "Age",
  Age_band = 5
)
print(df_agebanded)
## End(Not run)
Plot Counts of Categorical Variables
Description
Create a ggplot2 column chart of categorical variables with labels, in ascending order.
The plot will be customised using the provided theme theme_sanger, y-axis labels will have a comma for every third integer value.
If the column provided to 'grouping_var' has more than approximately 5 values, you may need to consider
rotating x axis labels using theme
A comprehensive explanation of ggplot2 customisation is available here
Usage
categorical_col_chart(df, grouping_var)
Arguments
| df | A dataframe with categorical variables | 
| grouping_var | a categorical variable by which to group the count by | 
Value
a ggplot2 object
Examples
library(SangerTools)
library(dplyr)
library(ggplot2)
# Group by Age Band
health_data <- SangerTools::PopHealthData
health_data %>%
  dplyr::filter(Smoker == 1) %>%
  SangerTools::categorical_col_chart(AgeBand) +
  labs(
    title = "Smoking Population by Age Band",
    subtitle = "Majority of Smokers are Working Aged ",
    x = NULL,
    y = "Patient Number"
  )
Patient Cohort Re-Identification Processing
Description
Population Health Management commonly leads practitioners to identify a cohort that will have an intervention applied. As a rule of thumb most analysts will work with pseudonymised data sets. For targeted interventions patients require re-identification; this process is generally carried out by a third party organisation. As third party organisations work with many health care providers they have a strict set of requirements. This has been based around SW CSU's required formatting.
Usage
cohort_processing(
  df,
  Split_by,
  path,
  prefix = "DSCRO",
  com_code = "11M",
  date_format = "%Y%m%d",
  suffix = "_REID_V01"
)
Arguments
| df | a tidy dataframe in standard Master Patient Index format ie SangerTools::PopHealthData. | 
| Split_by | A column within df that will be used to split the patients and will also appear in the file name. Ideally should be a health organisation code such as GP Practice Code or Hospital Trust Code. Should only have alpha-numeric values | 
| path | A file path to which the CSV files will be written | 
| prefix | File name prefix, default is "DSCRO" See more here: NHS DSCRO | 
| com_code | Commissioner Code, default is "11M"; Gloucestershire. | 
| date_format | A date format passed internally to 'format(Sys.Date())'; will form part of file name to denote date of generation. You can read more about date formatting in R from R lang | 
| suffix | A file name suffix, default is "_REID_V01", To be left as blank use "", without spaces. | 
Value
n number of CSV files written to the location specified by path argument.
Crude Prevalence Calculator
Description
Calculate the crude prevalence of a health condition from a Master Patient Index like dataset
Usage
crude_rates(df, Condition, ...)
Arguments
| df | a tidy dataframe in standard Master Patient Index format ie SangerTools::PopHealthData | 
| Condition | A Health condition flag denoted by 1 & 0; where 1 denotes the patient being positive for the health condition | 
| ... | Variables used to standardise by; Must always have Ageband, additional variables are optional | 
Value
a tibble with Crude Prevalence Rates(Rate per 1,000) for each value included in ...
Examples
library(SangerTools)
library(dplyr)
health_data <- SangerTools::PopHealthData
glimpse(health_data)
# Generate crude prevalene rate stats
crude_prevalence <- SangerTools::crude_rates(health_data, Diabetes, Locality)
print(crude_prevalence)
Dataframe to SQL
Description
DataFrame to SQL; Write your DataFrame or Tibble directly to SQL from R This wrapper function allows for the easy movement of your computed results in R to a SQL Database for saving. The function uses a ODBC driver to establish a connection. You will need to select a Database that your user has write-access to. The user credentials are the same as your OS login details; as such this function will most likely only work from you work computer.
Usage
df_to_sql(df, driver, server, database, sql_table_name, overwrite = FALSE, ...)
Arguments
| df | A 'dataFrame' or 'tibble' ie PopHealthData. | 
| driver | A driver for database ie "SQL Server"; must be passed in quotation. | 
| server | The unique name of your database server; must be passed in quotation. | 
| database | The name of the database to which you will write 'df'; must be passed in quotation. | 
| sql_table_name | The name that 'df' will be referred to in SQL database; must be passed in quotation. | 
| overwrite | If there is a SQL table with the same name whether it will be overwritten; defaults to FALSE. | 
| ... | Function forwarding for additional functionality. | 
Value
A message confirming that a new table has been created in a SQL 'database'.
Examples
## Not run: 
library(odbc)
library(DBI)
health_data <- SangerTools::PopHealthData
df_to_sql(
  df = health_data,
  driver = "SQL SERVER",
  database = "DATABASE",
  sql_table_name = "New Table Name",
  overwrite = FALSE
)
## End(Not run)
Dataframe or Tibble to Clipboard
Description
This function copies a data frame or tibble to your clipboard in a format that allows for a simple paste into excel whilst maintaining column and row structure. By default row_names has been set to FALSE.
Usage
excel_clip(df, row_names = FALSE, col_names = TRUE, ...)
Arguments
| df | A dataframe or tibble | 
| row_names | Set to FALSE for row.names not to be included | 
| col_names | Set to TRUE for col.names to be included | 
| ... | function forwarding for additional write.table functionality | 
Value
a data frame copied to your clipboard
Master Patient Index
Description
A fabricated Master Patient Index (MPI) inspired by Gloucestershire's population to be used with functions included in SangerTools
Usage
master_patient_index
Format
A tibble with 10,000 rows and 11 variables:
- PseudoNHSNumber
- A Pseudonymised NHS Patient Identifier 
- Sex
- The identifiable sex of the patient 
- Smoker
- Health Condition Flag: 1 denotes if the patient is a smoker 
- Diabetes
- Health Condition Flag: 1 denotes if the patient has diabetes 
- Dementia
- Health Condition Flag: 1 denotes if the patient has dementia 
- Obesity
- Health Condition Flag: 1 denotes if the patient is Obese 
- Age
- Age of the patient 
- IMD_Decile
- The decile of indices of multiple deprivation: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019 
- Ethnicity
- The identifiable ethnicity of the patient 
- Locality
- The region where the patient lives - sampled from Gloucestershire Clinical Commissioning Group 
- PrimaryCareNetwork
- The network of General Practioners that the patient is registerd with - sampled from Gloucestershire Clinical Commissioning Group 
Source
Generated by Asif Laldin a.laldin@nhs.net, Feb-2022
Examples
library(dplyr)
data(master_patient_index)
# Convert diabetes data to factor'
master_patient_index %>%
 glimpse()
Read Multiple CSV files into R
Description
This function reads multiple CSVs in a directory must be same structure. This function reads multiple excel files into R after which all files are aggregated into a single data frame.
There are assumptions about they underlying files:
- All files must have column names for each column (The function will fail without this; later versions will amend this) 
- All files have the same number of columns 
- All files have the same column names 
- All files should have data starting from the same row number 
- All relevant data is stored in the same sheet in each of the files 
Usage
multiple_csv_reader(file_path, sheet = 1, rows_to_skip = 0, col_names = TRUE)
Arguments
| file_path | The Directory in which the files are located | 
| sheet | Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Defaults to the first sheet | 
| rows_to_skip | The number of rows from the top to be excluded | 
| col_names | If columns are named; defaults to TRUE | 
Value
a data frame object full of file paths
Examples
library(SangerTools)
file_path <- "my_file_path_where_csvs_are_stored"
if (length(SangerTools::multiple_csv_reader(file_path)) == 0) {
  message("This won't work without changing the variable input to a local file path with CSVs in")
}
Read Multiple Excel files into R
Description
This function reads multiple excel files into R after which all files are aggregated into a single data frame.
There are assumptions about they underlying files:
- All files must have column names for each column (The function will fail without this; later versions will amend this) 
- All files have the same number of columns 
- All files have the same column names 
- All files should have data starting from the same row number 
- All relevant data is stored in the same sheet in each of the files 
To understand more about the underlying function that 'multiple_excel_reader' wraps around Click Here
Usage
multiple_excel_reader(
  file_path,
  pattern = "*.xlsx",
  sheet = 1,
  rows_to_skip = 0,
  col_names = TRUE
)
Arguments
| file_path | The Directory in which the files are located | 
| pattern | The file extension of the files of which you are going to read. Defaults to "*.xlsx" | 
| sheet | Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Defaults to the first sheet | 
| rows_to_skip | The number of rows from the top to be excluded | 
| col_names | A boolean value to determine if column headers name are present in files. Currently only accepts TRUE | 
Value
a data frame object full of file paths
Examples
## Not run: 
combined_excel_files <- multiple_excel_reader("Inputs/", 1, TRUE)
## End(Not run)
Branded discrete colour scale
Description
This anonymous function allows you to apply the Sanger Theme colours to your ggplot2 plot
Usage
scale_fill_sanger()
Value
A custom colour filled ggplot2 plot
Examples
library(SangerTools)
library(dplyr)
library(ggplot2)
# Group by Age Band
health_data <- SangerTools::PopHealthData
health_data %>%
  dplyr::filter(Smoker == 1) %>%
  SangerTools::categorical_col_chart(AgeBand) +
  labs(
    title = "Smoking Population by Age Band",
    subtitle = "Majority of Smokers are Working Aged ",
    x = NULL,
    y = "Patient Number"
  )+
  scale_fill_sanger()
Brand Colour Palette
Description
Displays a brand colour palette for showing the hex codes associated with brand
Usage
show_brand_palette()
Value
a Base R plot object
Examples
library(scales)
library(SangerTools)
show_brand_palette()
Extended Brand Colour Palette
Description
Displays extended brand colour palette for charting
Usage
show_extended_palette()
Value
a Base R plot object
Examples
library(scales)
library(SangerTools)
show_extended_palette()
Split & Save
Description
A simpler alternative to  cohort_processing. Will split a data frame
and save as a csv
Usage
split_and_save(df, Split_by, path, prefix = NULL)
Arguments
| df | A 'dataFrame' or 'tibble' ie PopHealthData. | 
| Split_by | A column within df that will be used to split the patients and will also appear in the file name. Ideally should be a health organisation code such as GP Practice Code or Hospital Trust Code. Should only have alpha-numeric values | 
| path | A file path to which the CSV files will be written | 
| prefix | File name prefix | 
Value
n number of CSV files written to the location specified by path argument.
Examples
 ## Not run: 
split_and_save(
 df = pseudo_data,
 Split_by = "Locality",
 file_path = "Inputs/",
 prefix = NULL
)
## End(Not run)
Standardised Prevalence Rates.
Description
Standardisation will be  performed
for all unique values in the column passed to 'split_by'. If input data frame does not contain age bands
or age bands are not of class factor, it is recommended to use age_bandizer or age_bandizer_2.
After the function has run, the output can be copied using excel_clip  or  written to a database using df_to_sql.
Alternatively, if you are interested in seeing the effects of age confounding; consider joining the outputs of this function with the output from crude_rates
using a left_join
Usage
standardised_rates_df(
  df,
  Split_by,
  Condition,
  Population_Standard,
  Granular = FALSE,
  ...
)
Arguments
| df | a tidy data frame in standard Master Patient Index format ie SangerTools::PopHealthData. | 
| Split_by | A column name within df for which the standardised rates will be calculated for. | 
| Condition | A Health condition flag denoted by 1 & 0; where 1 denotes the patient being positive for the health condition. | 
| Population_Standard | Population Standard Weight used for Standardising; default set to NULL; which denotes use of Age Structure of df. | 
| Granular | Takes a boolean value. If set to TRUE will output a tibble with Standardised Rates using values provided in 'Split_col' and '...'By default is set to FALSE. | 
| ... | Variables used to standardise by; Must always have Age band for age standardisation, additional variables are optional and should be passed separated by commas. | 
Value
A tibble containing standardised Prevalence Rates by specified group.
Examples
library(SangerTools)
health_data <- SangerTools::age_bandizer(df = SangerTools::master_patient_index,
                                         Age_col=Age)
df_rates <- standardised_rates_df(
  df = health_data,
  Split_by = Locality,
  Condition = Diabetes,
  Population_Standard = NULL,
  Granular = TRUE,
  Ageband
)
print(df_rates)
Customised ggplot2 Theme
Description
A customised ggplot2 theme for the SangerTools package
Usage
theme_sanger()
Value
A customised ggplot2 plot
Examples
library(SangerTools)
library(ggthemes)
library(ggplot2)
library(ggtext)
categorical_col_chart(SangerTools::PopHealthData, Locality) +
  theme_sanger()+
  labs(title = "Categorical Column Chart",
  x = "Locality",
  y = "Number of Patients")+
  scale_fill_sanger()
Data set of 2018 UK Population
Description
Data is taken from ONS and is split into 5 year age band
Usage
uk_pop_standard
Format
A tibble with 29 rows and 2 variables:
- UK_Population
- dbl Year price was recorded 
- Ageband
- 5 Year age band for population 
Source
https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates