| Title: | Analyse WHO STEPS Survey Data |
| Version: | 0.1.0 |
| Description: | Provides a complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS) as described in Riley et al. (2016) <doi:10.2105/AJPH.2015.302962>. Imports raw survey data ('CSV', 'Excel', 'Stata', 'SPSS'), applies WHO-standard cleaning and recoding, sets up complex survey designs, computes all standard NCD indicators (tobacco, alcohol, diet, physical activity, anthropometry, blood pressure, biochemical), and generates publication-ready tables, visualisations, and 'Word'/'HTML' reports (fact sheet, data book, country report). |
| License: | MIT + file LICENSE |
| URL: | https://github.com/drpakhare/stepssurvey |
| BugReports: | https://github.com/drpakhare/stepssurvey/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | bslib, dplyr, DT, flextable, ggplot2, glue, haven, janitor, patchwork, purrr, readr, readxl, rmarkdown, shiny, survey, tools |
| Suggests: | knitr, remotes, testthat (≥ 3.0.0), withr |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-06 08:20:10 UTC; drpakhare |
| Author: | Abhijit Pakhare [aut, cre], Ankur Joshi [aut], Lena Charlette [aut], WHO STEPS R Pipeline Contributors [ctb] |
| Maintainer: | Abhijit Pakhare <abhijit.cfm@aiimsbhopal.edu.in> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-11 19:10:03 UTC |
stepssurvey: Analyse WHO STEPS Survey Data
Description
A complete analysis pipeline for the WHO STEPwise Approach to NCD Risk Factor Surveillance (STEPS).
Author(s)
Maintainer: Abhijit Pakhare <abhijit.cfm@aiimsbhopal.edu.in>
Authors:
Ankur Joshi
Lena Charlette
Other contributors:
WHO STEPS R Pipeline Contributors [contributor]
See Also
Useful links:
Report bugs at https://github.com/drpakhare/stepssurvey/issues
Optionally subset a design by a denominator variable
Description
Optionally subset a design by a denominator variable
Usage
.apply_denominator(design, denominator, data)
Build a cascade table (also used for combined risk factors)
Description
Build a cascade table (also used for combined risk factors)
Usage
.build_cascade_table(result)
Build a category table
Description
Build a category table
Usage
.build_category_table(result)
Build a mean table
Description
Build a mean table
Usage
.build_mean_table(result)
Build a proportion table
Description
Build a proportion table
Usage
.build_proportion_table(result)
Get the SPSS/haven label for a column (if any)
Description
Get the SPSS/haven label for a column (if any)
Usage
.col_label(col)
Arguments
col |
A vector (typically from a data frame column). |
Value
Character string of the label, or "" if none.
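A minimal sketch of how such a lookup could work, assuming the variable label is stored in the vector's "label" attribute (which is where haven places it on import). The function name col_label_sketch is hypothetical and is not the package's internal code.

```r
# Hypothetical stand-in for the internal helper; not the package's code.
col_label_sketch <- function(col) {
  lbl <- attr(col, "label", exact = TRUE)
  # Fall back to an empty string when no usable label is attached.
  if (is.null(lbl) || length(lbl) == 0L || is.na(lbl[1])) {
    return("")
  }
  as.character(lbl[1])
}

# A plain numeric vector carrying a variable label, as haven attaches on import.
sex <- c(1, 2, 2, 1)
attr(sex, "label") <- "Sex of respondent"

labelled_result <- col_label_sketch(sex)
unlabelled_result <- col_label_sketch(c(1, 2, 3))
```

Returning "" rather than NULL keeps the result safe to pass straight into string functions such as grepl().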
Compute a cascade table (type = "cascade")
Description
A cascade is a series of proportions, each possibly with a different denominator (nested). Returns named list, each with total, by_sex, by_age.
Usage
.compute_cascade(entry, design, data)
Compute a category table (type = "category")
Description
The variable is a factor; compute the proportion in each level. Returns a named list (one element per level), each with total, by_sex, by_age.
Usage
.compute_category(entry, design, data)
Compute combined risk factors (type = "combined")
Description
Counts how many of 5 key risk factors each respondent has, then computes the proportion with 0, 1-2, and 3-5 risk factors.
Usage
.compute_combined(entry, design, data)
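The counting and binning described above can be sketched in base R. This is an unweighted illustration only: the five column names are made up for the example, and the package itself works on a survey design rather than on raw row counts.

```r
# Toy respondent-level data; the five column names are illustrative only.
risk <- data.frame(
  smoker          = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  low_fruit_veg   = c(TRUE, FALSE, TRUE,  TRUE, FALSE, TRUE),
  insufficient_pa = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  overweight      = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  raised_bp       = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Number of risk factors present for each respondent (0 to 5).
n_risk <- rowSums(risk)

# Bin the counts into the three reporting groups: 0, 1-2, 3-5.
risk_group <- cut(
  n_risk,
  breaks = c(-Inf, 0, 2, 5),
  labels = c("0", "1-2", "3-5")
)

# Unweighted percentage of respondents in each group.
pct <- 100 * prop.table(table(risk_group))
```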
Compute a mean table (type = "mean")
Description
For single variable: total, by_sex, by_age. For multiple variables: a named list, each with total, by_sex, by_age.
Usage
.compute_mean(entry, design, data)
Compute a proportion table (type = "proportion")
Description
Produces total, by_sex, by_age data frames with estimate, lower, upper. Values are in percentage (0-100).
Usage
.compute_proportion(entry, design, data)
Label-aware column detection for ambiguous STEPS codes
Description
Some STEPS column codes (e.g., A1, A5) mean different things across instrument versions. This function first tries unambiguous candidate names, then checks ambiguous codes only if their column label matches expected keywords.
Usage
.detect_col_with_label(data, safe_candidates, ambiguous = list(), label = NULL)
Arguments
data |
A data frame. |
safe_candidates |
Character vector of unambiguous column names. |
ambiguous |
Named list: each element is a character vector of
keywords that must appear in the column label for the code to match.
Names are the ambiguous column codes.
Example: |
label |
Optional label for progress messages. |
Value
The matched column name or NULL.
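The two-stage lookup can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function name is hypothetical, labels are read from the "label" attribute, matching is case-insensitive, and an ambiguous code is accepted only when every keyword appears in its label.

```r
# Hypothetical sketch of label-aware detection; not the package's code.
detect_col_with_label_sketch <- function(data, safe_candidates, ambiguous = list()) {
  cols <- names(data)

  # Stage 1: unambiguous names win outright, in the order given.
  hit <- match(tolower(safe_candidates), tolower(cols))
  hit <- hit[!is.na(hit)]
  if (length(hit) > 0L) {
    return(cols[hit[1]])
  }

  # Stage 2: an ambiguous code counts only if its label contains all keywords.
  for (code in names(ambiguous)) {
    idx <- match(tolower(code), tolower(cols))
    if (is.na(idx)) next
    lbl <- attr(data[[idx]], "label", exact = TRUE)
    if (is.null(lbl)) next
    keywords <- tolower(ambiguous[[code]])
    found <- vapply(keywords, grepl, logical(1), x = tolower(lbl[1]), fixed = TRUE)
    if (all(found)) {
      return(cols[idx])
    }
  }
  NULL
}

# Toy data: a1 carries an illustrative label, a5 carries none.
df <- data.frame(a1 = c(1, 2, 1), a5 = c(0, 3, 1))
attr(df$a1, "label") <- "Ever consumed alcohol"

alcohol_col <- detect_col_with_label_sketch(
  df, safe_candidates = "ever_alcohol", ambiguous = list(a1 = "alcohol")
)
no_match <- detect_col_with_label_sketch(
  df, safe_candidates = "ever_alcohol", ambiguous = list(a5 = "alcohol")
)
```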
Format a single estimate with CI
Description
Format a single estimate with CI
Usage
.fmt_ci(est, lower, upper, digits = 1, is_pct = TRUE)
Arguments
est |
Numeric estimate. |
lower |
Lower CI bound. |
upper |
Upper CI bound. |
digits |
Decimal places. |
is_pct |
If TRUE, values are already in percent; add "%" suffix. |
Value
Formatted string.
Create a single survey design for a given weight column
Description
Internal helper that builds one survey::svydesign object.
Usage
.make_design(data, wt_col, label = "")
Arguments
data |
A data frame (typically from |
wt_col |
Character name of the weight column to use. |
label |
Label for messages (e.g. "Step 1"). |
Value
A survey::svydesign object.
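For orientation, a design of this kind can be built with survey::svydesign() as sketched below. The function name make_design_sketch is hypothetical, the column names psu and stratum follow the simulated data described in this manual, and nest = TRUE is an assumption rather than a documented setting of the package.

```r
library(survey)

# Hypothetical stand-in for the internal helper; not the package's code.
make_design_sketch <- function(data, wt_col) {
  svydesign(
    ids     = ~psu,               # primary sampling units
    strata  = ~stratum,           # sampling strata
    weights = reformulate(wt_col), # one-sided formula, e.g. ~wt_final
    data    = data,
    nest    = TRUE                # assumption: PSU ids are nested in strata
  )
}

# Toy data: 2 strata, 4 PSUs per stratum, 5 respondents per PSU.
set.seed(1)
toy <- data.frame(
  stratum   = rep(c("S1", "S2"), each = 20),
  psu       = rep(paste0("PSU", 1:8), each = 5),
  wt_final  = runif(40, 0.5, 2),
  raised_bp = rbinom(40, 1, 0.3)
)

design <- make_design_sketch(toy, "wt_final")

# Weighted prevalence in percent with a 95% confidence interval.
est <- svymean(~raised_bp, design)
ci <- confint(est)
pct <- 100 * c(
  estimate = unname(coef(est)),
  lower    = unname(ci[1, 1]),
  upper    = unname(ci[1, 2])
)
```

Passing the weight column as a string and turning it into a formula makes it straightforward to build one design per STEPS step when each step has its own weight.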
Merge by_sex and by_age results into a single wide data frame
Description
Produces the standard 3-panel layout: Age Group | Men | Women | Both Sexes.
Usage
.merge_panels(
total_df,
by_sex_df,
by_age_df,
is_pct = TRUE,
by_sex_age_df = NULL
)
Arguments
total_df |
Total estimate (1-row data frame). |
by_sex_df |
By-sex estimates (with |
by_age_df |
By-age estimates (with |
is_pct |
Whether values are percentages. |
Value
A data frame with columns: Age Group, Men, Women, Both Sexes.
Strip haven-labelled classes from all columns in a data frame
Description
Converts haven_labelled vectors to their base R types.
This prevents downstream errors (e.g., dim<-.haven_labelled() not supported) when passing the data to DT, dplyr, or survey.
Usage
.strip_haven_labels(data)
Arguments
data |
A data frame (typically from |
Value
The same data frame with all haven labels removed.
Apply WHO STEPS styling to a flextable
Description
Apply WHO STEPS styling to a flextable
Usage
.style_steps_table(ft, title, description = NULL)
Arguments
ft |
A flextable object. |
title |
Table title. |
description |
Optional description text. |
Value
Styled flextable.
Check if variable(s) exist in data
Description
Check if variable(s) exist in data
Usage
.vars_available(vars, data)
Main Shiny Server
Description
Wires up all module servers and passes reactive outputs between them.
Usage
app_server(input, output, session)
Arguments
input, output, session |
Standard Shiny server arguments. |
Main Shiny UI
Description
Builds the top-level UI using bslib sidebar page.
Usage
app_ui()
Value
A Shiny UI definition.
Build all tables from computed results
Description
Build all tables from computed results
Usage
build_all_tables(results)
Arguments
results |
A named list of results from |
Value
A named list of flextable objects. NULL entries excluded.
Build forest plot of key indicators with 95% CIs
Description
Creates a horizontal point-and-CI plot (forest plot style) for all key indicators, grouped by domain.
Usage
build_forest_plot(key_indicators, country_name, survey_year)
Arguments
key_indicators |
A data frame with domain, indicator, estimate, lower, upper. |
country_name |
Country name for title. |
survey_year |
Survey year for title. |
Value
A ggplot2 object.
Build radar / spider chart of NCD risk factor profile
Description
Creates a radar-style chart showing prevalence of key risk factors on a polar coordinate system for quick visual comparison.
Usage
build_radar_plot(key_indicators, country_name, survey_year)
Arguments
key_indicators |
A data frame with domain, indicator, estimate. |
country_name |
Country name for title. |
survey_year |
Survey year for title. |
Value
A ggplot2 object.
Build publication-ready STEPS visualizations
Description
Generates a list of ggplot2 plots showing key NCD risk factor prevalence with 95% confidence intervals, stratified by sex and age group.
Usage
build_steps_plots(indicators, key_indicators, country_name, survey_year)
Arguments
indicators |
A list of indicator results from |
key_indicators |
A data frame with key indicators (domain, indicator, estimate, lower, upper). |
country_name |
Country name for plot titles. |
survey_year |
Survey year for plot titles. |
Details
All plots use the WHO STEPS colour scheme and professional styling. Error bars represent 95% confidence intervals. Prevalence values are displayed on bars/points with light background text.
Value
A named list of ggplot2 objects:
- overview: Horizontal bar chart of key indicators
- tobacco_by_sex: Sex-stratified tobacco use
- bp_by_sex: Sex-stratified blood pressure
- obesity_by_sex: Sex-stratified overweight/obesity
- glucose_by_sex: Sex-stratified blood glucose
- bp_by_age: Age-stratified blood pressure with ribbon CI
- obesity_by_age: Age-stratified overweight/obesity with ribbon CI
- sex_dashboard: Combined 2x2 dashboard of sex-stratified charts (if >=2 sex plots available)
NULL entries are preserved in the list.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
all_ind <- compute_all_indicators(design)
plots <- build_steps_plots(all_ind$results, all_ind$key_indicators, "Test", 2023)
names(plots)
Build survey-weighted tables for STEPS indicators
Description
Generates formatted flextable objects for all available STEPS indicators, with rows for age groups and columns for both sexes combined, males, and females. Tables include 95% confidence intervals.
Usage
build_steps_tables(indicators)
Arguments
indicators |
A list of indicator results from |
Details
Each table has age groups as rows and prevalence (with 95% CI) as a column. The last row shows the total (age-standardised) estimate. Column header styling uses WHO STEPS branding (dark blue background).
Value
A named list of flextable objects, one per indicator.
Names correspond to indicators (e.g., current_tobacco, raised_bp).
NULL entries are excluded. Prints count of tables generated.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
all_ind <- compute_all_indicators(design)
tables <- build_steps_tables(all_ind$results)
names(tables)
Build a formatted table from a computed result
Description
Dispatches to the appropriate formatting method based on table type.
Usage
build_table(result)
Arguments
result |
A result list from |
Value
A flextable object, or NULL if the table is not available.
Build a table if indicator is available
Description
Wrapper that safely builds a table only if the required indicator elements exist. Catches and reports errors gracefully.
Usage
build_table_if_available(
ind_list,
total_key,
by_sex_key,
by_age_key,
label,
pct = TRUE
)
Arguments
ind_list |
Indicator list (e.g., |
total_key |
Name of total estimate element (e.g., "current_tobacco_total"). |
by_sex_key |
Name of by-sex element (e.g., "current_tobacco_by_sex"). |
by_age_key |
Name of by-age element (e.g., "current_tobacco_by_age"). |
label |
Table caption. |
pct |
Logical; format as percent? (default TRUE). |
Value
A flextable object or NULL if not available.
Clean and recode WHO STEPS data
Description
Processes raw STEPS survey data: renames columns, coerces types, derives standard indicators, handles missing values, and applies plausibility checks.
Usage
clean_steps_data(
data,
cols,
age_min = 18,
age_max = 69,
bp_sbp_threshold = 140,
bp_dbp_threshold = 90,
bmi_overweight = 25,
bmi_obese = 30,
glucose_threshold = 7,
glucose_impaired_threshold = 6.1,
chol_threshold = 5
)
Arguments
data |
A data frame (typically from |
cols |
A named list of column names, as returned by |
age_min |
Minimum age for inclusion (default 18). |
age_max |
Maximum age for inclusion (default 69). |
bp_sbp_threshold |
SBP threshold for raised BP (default 140; Mongolia uses 130). |
bp_dbp_threshold |
DBP threshold for raised BP (default 90; Mongolia uses 80). |
bmi_overweight |
BMI threshold for overweight (default 25.0). |
bmi_obese |
BMI threshold for obesity (default 30.0). |
glucose_threshold |
Fasting glucose threshold for raised glucose / diabetes in mmol/L (default 7.0). |
glucose_impaired_threshold |
Fasting glucose threshold for impaired fasting glucose in mmol/L (default 6.1). |
chol_threshold |
Total cholesterol threshold for raised cholesterol in mmol/L (default 5.0). |
Details
The function performs the following transformations:
Renames columns to standard names (age, sex, wt_final, etc.)
Converts numeric strings to appropriate types
Restricts age to [age_min, age_max]
Creates WHO standard age groups (18-24, 25-34, etc.)
Harmonises sex coding to Male/Female
Derives body mass index (BMI) and categories
Averages blood pressure readings (last 2 of 3)
Recodes yes/no variables to logical
Creates derived risk indicators (raised BP, diabetes, etc.)
Applies plausibility checks to measurements
Drops records with missing age or sex
Value
A data frame with standardised and derived variables, ready for survey design setup.
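Several of the derivations listed in Details can be sketched in base R. This is a simplified illustration, not the package's code: the input column names are made up, the age bands beyond 25-34 are assumed ten-year groups, thresholds are applied with >=, and blood pressure medication is ignored when flagging raised BP.

```r
# Toy input; column names are illustrative only.
raw <- data.frame(
  age       = c(22, 37, 58, 69),
  height_cm = c(170, 160, 175, 150),
  weight_kg = c(65, 80, 95, 50),
  sbp1 = c(150, 118, 142, 128),
  sbp2 = c(122, 120, 150, 130),
  sbp3 = c(118, 124, 146, 134),
  dbp1 = c(95, 79, 94, 80),
  dbp2 = c(78, 80, 92, 84),
  dbp3 = c(76, 82, 88, 86)
)

d <- raw

# Age groups; bands after 25-34 are an assumption for this sketch.
d$age_group <- cut(
  d$age,
  breaks = c(18, 25, 35, 45, 55, 65, 70),
  labels = c("18-24", "25-34", "35-44", "45-54", "55-64", "65-69"),
  right  = FALSE
)

# BMI in kg/m^2 and categories at the default thresholds (25 and 30).
d$bmi <- d$weight_kg / (d$height_cm / 100)^2
d$bmi_cat <- cut(
  d$bmi,
  breaks = c(-Inf, 25, 30, Inf),
  labels = c("Below 25", "Overweight", "Obese"),
  right  = FALSE
)

# Average the last two of three readings, then apply the default BP thresholds.
d$mean_sbp <- (d$sbp2 + d$sbp3) / 2
d$mean_dbp <- (d$dbp2 + d$dbp3) / 2
d$raised_bp <- d$mean_sbp >= 140 | d$mean_dbp >= 90
```

The first respondent shows why the first reading is dropped: a first SBP of 150 would flag raised BP, while the mean of the last two readings (120) does not.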
Compute Alcohol Use Indicators
Description
Calculates prevalence of alcohol use from a survey design object. Computes proportions of current alcohol use and heavy episodic drinking, stratified by sex and age group where available.
Usage
compute_alcohol_indicators(design)
Arguments
design |
A survey design object from |
Value
A named list of survey estimates. Each element contains proportion estimates (as tibble with columns: estimate, lower, upper, etc.) for:
- current_alcohol_total: current alcohol use, overall
- current_alcohol_by_sex: current alcohol use, by sex
- current_alcohol_by_age: current alcohol use, by age group
- heavy_episodic_total: heavy episodic drinking, overall
- heavy_episodic_by_sex: heavy episodic drinking, by sex
- heavy_episodic_by_age: heavy episodic drinking, by age group
Each element is returned only if the corresponding variables exist in design.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
alcohol_results <- compute_alcohol_indicators(design)
Compute All STEPS Indicators
Description
Runs all indicator modules (tobacco, alcohol, diet & physical activity, anthropometry, blood pressure, and biochemical), using the appropriate step-specific survey design for each domain per WHO STEPS methodology:
Step 1 (behavioural): tobacco, alcohol, diet & physical activity
Step 2 (physical): anthropometry, blood pressure
Step 3 (biochemical): biochemical measures
Usage
compute_all_indicators(design)
Arguments
design |
A |
Value
A list with two elements:
- results: a named list containing indicator results grouped by domain (tobacco, alcohol, diet_pa, anthropometry, blood_pressure, biochemical)
- key_indicators: a tibble with columns domain, indicator, estimate, lower, and upper, summarising headline estimates across all domains
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
all_indicators <- compute_all_indicators(design)
names(all_indicators$results)
Compute all tables from the registry
Description
Iterates through the full steps_table_registry() and computes every
table that has available data. Returns a named list of results.
Usage
compute_all_tables(designs, data = NULL)
Arguments
designs |
A list of survey designs, with elements |
data |
The cleaned data frame. |
Value
A named list of table results (from compute_table()).
Only entries with available == TRUE are included.
Compute Anthropometry Indicators
Description
Calculates prevalence of overweight, obesity, and central obesity, plus mean BMI and waist circumference, from a survey design object.
Usage
compute_anthropometry_indicators(design)
Arguments
design |
A survey design object from |
Value
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
- overweight_obese_total: overweight or obese (BMI >=25), overall
- overweight_obese_by_sex: overweight or obese, by sex
- overweight_obese_by_age: overweight or obese, by age group
- obese_total: obese (BMI >=30), overall
- obese_by_sex: obese, by sex
- obese_by_age: obese, by age group
- central_obesity_total: central obesity, overall
- central_obesity_by_sex: central obesity, by sex
- central_obesity_by_age: central obesity, by age group
- bmi_mean_total: mean BMI, overall
- bmi_mean_by_sex: mean BMI, by sex
- waist_cm_mean_total: mean waist circumference, overall
- waist_cm_mean_by_sex: mean waist circumference, by sex
Each element is returned only if the corresponding variables exist in design.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
anthropometry_results <- compute_anthropometry_indicators(design)
Compute Biochemical Indicators
Description
Calculates prevalence of raised glucose, diabetes, impaired fasting glucose, and raised cholesterol, plus mean fasting glucose and mean total cholesterol, from a survey design object.
Usage
compute_biochemical_indicators(design)
Arguments
design |
A survey design object from |
Value
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
- raised_glucose_total: raised fasting glucose, overall
- raised_glucose_by_sex: raised fasting glucose, by sex
- raised_glucose_by_age: raised fasting glucose, by age group
- diabetes_total: diabetes, overall
- diabetes_by_sex: diabetes, by sex
- diabetes_by_age: diabetes, by age group
- impaired_glucose_total: impaired fasting glucose, overall
- impaired_glucose_by_sex: impaired fasting glucose, by sex
- impaired_glucose_by_age: impaired fasting glucose, by age group
- raised_chol_total: raised total cholesterol, overall
- raised_chol_by_sex: raised total cholesterol, by sex
- raised_chol_by_age: raised total cholesterol, by age group
- fasting_glucose_mean_total: mean fasting glucose, overall
- fasting_glucose_mean_by_sex: mean fasting glucose, by sex
- total_chol_mean_total: mean total cholesterol, overall
- total_chol_mean_by_sex: mean total cholesterol, by sex
Each element is returned only if the corresponding variables exist in design.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
biochemical_results <- compute_biochemical_indicators(design)
Compute Blood Pressure Indicators
Description
Calculates prevalence of raised blood pressure and mean systolic and diastolic blood pressure from a survey design object.
Usage
compute_bp_indicators(design)
Arguments
design |
A survey design object from |
Value
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
- raised_bp_total: raised blood pressure, overall
- raised_bp_by_sex: raised blood pressure, by sex
- raised_bp_by_age: raised blood pressure, by age group
- mean_sbp_mean_total: mean systolic BP, overall
- mean_sbp_mean_by_sex: mean systolic BP, by sex
- mean_sbp_mean_by_age: mean systolic BP, by age group
- mean_dbp_mean_total: mean diastolic BP, overall
- mean_dbp_mean_by_sex: mean diastolic BP, by sex
- mean_dbp_mean_by_age: mean diastolic BP, by age group
Each element is returned only if the corresponding variables exist in design.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
bp_results <- compute_bp_indicators(design)
Compute Diet and Physical Activity Indicators
Description
Calculates prevalence of insufficient physical activity and low fruit & vegetable intake, plus mean metabolic equivalent (MET) values, from a survey design object.
Usage
compute_diet_pa_indicators(design)
Arguments
design |
A survey design object from |
Value
A named list of survey estimates. Each element contains estimates (as tibble with columns: estimate, lower, upper, etc.) for:
- insufficient_pa_total: insufficient physical activity, overall
- insufficient_pa_by_sex: insufficient physical activity, by sex
- insufficient_pa_by_age: insufficient physical activity, by age group
- low_fruit_veg_total: low fruit & vegetable intake, overall
- low_fruit_veg_by_sex: low fruit & vegetable intake, by sex
- low_fruit_veg_by_age: low fruit & vegetable intake, by age group
- met_mean_total: mean MET (if available)
- met_mean_by_sex: mean MET by sex (if available)
Each element is returned only if the corresponding variables exist in design.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
diet_pa_results <- compute_diet_pa_indicators(design)
Generic Compute Engine for WHO STEPS Tables
Description
Takes a table specification from steps_table_registry() and a survey
design object, and produces the survey-weighted estimates needed to fill
the standard WHO STEPS data book table.
Compute a single table from a registry entry
Description
This is the main workhorse: given one registry entry and a survey design,
it dispatches to the appropriate method based on entry$type and returns
a standardised result list.
Usage
compute_table(entry, design, data = NULL)
Arguments
entry |
A single list element from |
design |
A survey design object (from |
data |
The cleaned data frame (used for variable availability checks). |
Value
A list with:
- id: Table identifier.
- title: Table title.
- type: Table type.
- available: Logical: TRUE if the required variable(s) exist.
- results: A list of data frames, depending on type:
  - proportion: total, by_sex, by_age (each with estimate, lower, upper).
  - mean: total, by_sex, by_age (each with estimate, lower, upper).
  - category: total, by_sex, by_age (each with level, estimate, lower, upper).
  - cascade: named list of proportion results.
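The dispatch-on-type pattern and the shape of the returned list can be sketched as below. This is a deliberately simplified illustration: the function name and the entry field vars are hypothetical, only two types are handled, and the estimates are unweighted placeholders rather than survey-weighted results.

```r
# Hypothetical sketch of type dispatch; not the package's code.
compute_table_sketch <- function(entry, data) {
  # A table is available only if every required variable is present.
  available <- all(entry$vars %in% names(data))
  results <- NULL
  if (available) {
    results <- switch(
      entry$type,
      proportion = list(total = 100 * mean(data[[entry$vars]], na.rm = TRUE)),
      mean       = list(total = mean(data[[entry$vars]], na.rm = TRUE)),
      stop("Unsupported table type: ", entry$type)
    )
  }
  list(
    id = entry$id, title = entry$title, type = entry$type,
    available = available, results = results
  )
}

toy <- data.frame(raised_bp = c(1, 0, 0, 1), bmi = c(22, 31, 27, 24))

bp <- compute_table_sketch(
  list(id = "bp1", title = "Raised blood pressure",
       type = "proportion", vars = "raised_bp"),
  toy
)
chol <- compute_table_sketch(
  list(id = "chol1", title = "Mean total cholesterol",
       type = "mean", vars = "total_chol"),
  toy
)
```

Returning available = FALSE instead of raising an error lets a caller loop over a whole registry and simply skip tables whose variables were not collected.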
Compute Tobacco Use Indicators
Description
Calculates prevalence of tobacco use from a survey design object. Computes proportions of current and daily tobacco use, stratified by sex and age group where available.
Usage
compute_tobacco_indicators(design)
Arguments
design |
A survey design object from |
Details
When both smoking and smokeless tobacco variables are present,
current_tobacco_any (either smoking or smokeless) is preferred
as the headline tobacco indicator. The function also reports
current_smoker and current_smokeless separately if available.
Value
A named list of survey estimates. Each element contains proportion estimates (as tibble with columns: estimate, lower, upper, etc.) for:
- current_tobacco_any_total/by_sex/by_age: any current tobacco use (smoking or smokeless) – preferred headline variable
- current_tobacco_total/by_sex/by_age: current tobacco smoking
- current_smoker_total/by_sex/by_age: current smoker
- current_smokeless_total/by_sex/by_age: current smokeless tobacco
- daily_tobacco_total/by_sex/by_age: daily tobacco use
Only elements for variables present in design are returned.
Examples
test_data <- generate_test_data(n = 500, seed = 42)
cols <- detect_steps_columns(test_data)
clean <- clean_steps_data(test_data, cols)
design <- setup_survey_design(clean)
tobacco_results <- compute_tobacco_indicators(design)
Detect a STEPS column by alias
Description
Tries to find a column in the data matching one of several candidate names (case-insensitive).
Usage
detect_col(data, candidates, label = NULL)
Arguments
data |
A data frame. |
candidates |
Character vector of possible column names. |
label |
Optional label for progress messages. |
Value
The matched column name (character) or NULL.
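A case-insensitive alias lookup of this kind can be sketched in a few lines of base R. The function name is hypothetical, and the sketch assumes that candidates are tried in the order given, so the first one that matches wins.

```r
# Hypothetical sketch of alias matching; not the package's code.
detect_col_sketch <- function(data, candidates) {
  # Compare lower-cased names so "AGE", "Age" and "age" all match.
  hit <- match(tolower(candidates), tolower(names(data)))
  hit <- hit[!is.na(hit)]
  if (length(hit) == 0L) {
    return(NULL)
  }
  # Return the column name exactly as it is spelled in the data.
  names(data)[hit[1]]
}

raw <- data.frame(PSU = 1:3, Sex = c(1, 2, 1), AGE = c(30, 41, 52))

age_col <- detect_col_sketch(raw, c("c3", "age"))
height_col <- detect_col_sketch(raw, c("height", "m11"))
```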
Auto-detect all standard STEPS columns
Description
Scans a data frame for standard WHO STEPS variable names across versions 3.1 and 3.2. Aliases are listed in priority order: the first match wins, so put the most specific / unambiguous name first.
Usage
detect_steps_columns(data)
Arguments
data |
A data frame (typically from |
Details
WHO STEPS reorganised variable codes between v3.1 and v3.2:
v3.1 / Epi Info codes (still common in many country datasets): B1-B6 = blood-pressure readings, B7 = BP meds, C1 = fasting glucose, C5 = DM meds, C6 = total cholesterol, C10 = chol meds, M1 = height, M2 = weight, M3 = waist.
v3.2 instrument codes: M4a/M5a/M6a = SBP readings, M4b/M5b/M6b = DBP readings, M7 = BP meds, M11 = height, M12 = weight, M14 = waist, M15 = hip, B5 = fasting glucose, B6 = DM meds, B8 = total cholesterol, B9 = chol meds, B16 = triglycerides, B17 = HDL cholesterol, C1 = sex, C3 = age.
The function includes aliases for both versions so datasets from either instrument version are detected automatically.
Value
A named list of detected column names (or NULL for missing).
Extract Key Indicator for Fact Sheet
Description
Internal helper function to extract a single key indicator from an indicator list and format it as a tibble row with domain and label.
Usage
extract_key(ind_list, domain, key, label)
Arguments
ind_list |
A list of indicator results (e.g., from compute_tobacco_indicators) |
domain |
The domain name (e.g., "Tobacco", "Alcohol") |
key |
The result name to extract (e.g., "current_tobacco_total") |
label |
The display label for the indicator |
Value
A tibble row with columns: domain, indicator, estimate, lower, upper, or NULL if the key does not exist in ind_list
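The extraction step can be sketched as follows. The function name is hypothetical, a base data.frame stands in for the tibble, and the numbers in the toy input are made up purely for illustration.

```r
# Hypothetical sketch of key-indicator extraction; not the package's code.
extract_key_sketch <- function(ind_list, domain, key, label) {
  res <- ind_list[[key]]
  # A missing key yields NULL so that callers can drop it silently.
  if (is.null(res)) {
    return(NULL)
  }
  data.frame(
    domain    = domain,
    indicator = label,
    estimate  = res$estimate[1],
    lower     = res$lower[1],
    upper     = res$upper[1]
  )
}

# Toy indicator list; the values are illustrative, not real estimates.
tobacco <- list(
  current_tobacco_total = data.frame(estimate = 21.4, lower = 18.9, upper = 23.9)
)

key_row <- extract_key_sketch(
  tobacco, "Tobacco", "current_tobacco_total", "Current tobacco use"
)
missing_row <- extract_key_sketch(
  tobacco, "Tobacco", "daily_tobacco_total", "Daily tobacco use"
)
```

Because NULL elements vanish when rows are bound together, for example with do.call(rbind, list_of_rows), indicators that were not computed simply do not appear in the summary.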
Format estimate with confidence interval
Description
Format estimate with confidence interval
Usage
fmt_est(est, lower, upper, digits = 1, pct = TRUE)
Arguments
est |
Estimated value. |
lower |
Lower confidence bound. |
upper |
Upper confidence bound. |
digits |
Number of decimal places (default 1). |
pct |
Logical; add percent sign? (default TRUE). |
Value
Formatted string like "42.1% (39.2–45.0)".
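A formatter producing strings of this shape can be sketched with formatC(). The function name is hypothetical; the sketch assumes fixed decimal places and an en dash between the interval bounds, as in the example string above.

```r
# Hypothetical sketch of estimate formatting; not the package's code.
fmt_est_sketch <- function(est, lower, upper, digits = 1, pct = TRUE) {
  # Fixed-point formatting keeps trailing zeros, e.g. 45 becomes "45.0".
  num <- function(x) formatC(x, format = "f", digits = digits)
  suffix <- if (pct) "%" else ""
  # "\u2013" is an en dash, used between the interval bounds.
  paste0(num(est), suffix, " (", num(lower), "\u2013", num(upper), ")")
}

prevalence_label <- fmt_est_sketch(42.1, 39.2, 45.0)
mean_label <- fmt_est_sketch(27.34, 26.9, 27.8, pct = FALSE)
```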
Generate simulated STEPS test data
Description
Creates a realistic simulated dataset matching WHO STEPS survey structure. Includes sampling design variables, demographics, and measures from all three steps (behavioural, physical, biochemical).
Usage
generate_test_data(n = 3000, seed = 42)
Arguments
n |
Number of observations (default 3000). |
seed |
Random seed for reproducibility (default 42). |
Details
Simulation parameters are realistic for low-middle income settings:
Tobacco prevalence: 32% males, 8% females
Alcohol current use: 55% males, 28% females
Heavy episodic drinking: 35% of drinkers
Physical activity: MET-minutes/week, mean 1800, SD 1200
Diet: fruit/vegetable days per week (0-7) and servings per day (1-5)
BP increases with age; medication prevalence 12%
Glucose: mean 5.2 mmol/L, increases with age
Total cholesterol: mean 4.8 mmol/L
Use this function for:
Testing the STEPS pipeline
Developing reports before real data arrives
Training analysts on the analysis system
Value
A data frame with n rows and the following columns:
- stratum: Strata identifier (S1-S5)
- psu: Primary sampling unit (PSU1-PSU40)
- wt_final: Final analysis weight
- sex: Sex (1=Male, 2=Female)
- age: Age in years (18-69)
- Step 1 (behavioural): t1, t2 (tobacco), a1, a5 (alcohol), met_total (physical activity), d1-d4 (diet)
- Step 2 (physical): m1 (height), m2 (weight), m3 (waist), b1-b6 (blood pressure), b7 (BP medication)
- Step 3 (biochemical): c1_mmol (glucose), c5 (DM meds), c6 (cholesterol), c10 (cholesterol meds)
Examples
# Generate smaller dataset for quick testing
test_data <- generate_test_data(n = 500, seed = 123)
head(test_data)
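To give a feel for how parameters like those in Details translate into simulated values, here is a small base R sketch. It is not the package's generator: only the sex-specific tobacco prevalences, the MET mean and SD, and the glucose mean come from the list above, while the age slope and the glucose SD are made-up assumptions.

```r
set.seed(42)
n <- 2000

sim <- data.frame(
  sex = sample(c(1, 2), n, replace = TRUE),  # 1 = Male, 2 = Female
  age = sample(18:69, n, replace = TRUE)
)

# Tobacco use with sex-specific prevalence: 32% males, 8% females.
sim$tobacco <- rbinom(n, 1, ifelse(sim$sex == 1, 0.32, 0.08))

# MET-minutes per week: mean 1800, SD 1200, truncated at zero.
sim$met_total <- pmax(0, rnorm(n, mean = 1800, sd = 1200))

# Fasting glucose rising with age; slope and SD are assumptions.
sim$glucose <- rnorm(n, mean = 5.2 + 0.01 * (sim$age - 18), sd = 0.8)

# Observed prevalence by sex, for a quick sanity check.
male_prev <- mean(sim$tobacco[sim$sex == 1])
female_prev <- mean(sim$tobacco[sim$sex == 2])
```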
Get table registry entries by section
Description
Get table registry entries by section
Usage
get_registry_by_section(section = NULL)
Arguments
section |
Section name (e.g., "Tobacco Use", "Blood Pressure"). If NULL, returns all entries. |
Value
A filtered list of registry entries.
Get table registry entries by step
Description
Get table registry entries by step
Usage
get_registry_by_step(step)
Arguments
step |
STEPS step number (1, 2, or 3). |
Value
A filtered list of registry entries.
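Filtering a registry by step or section amounts to filtering a list of entries, which can be sketched with base R's Filter(). The toy registry, its field names (id, section, step), and the helper names below are assumptions for illustration; the real registry comes from steps_table_registry().

```r
# Toy registry; field names and the third section name are assumptions.
registry_sketch <- list(
  list(id = "t1",  section = "Tobacco Use",    step = 1),
  list(id = "bp1", section = "Blood Pressure", step = 2),
  list(id = "g1",  section = "Biochemical Measurements", step = 3)
)

# Keep entries belonging to one STEPS step.
by_step_sketch <- function(registry, step) {
  Filter(function(e) e$step == step, registry)
}

# Keep entries in one section; NULL returns everything.
by_section_sketch <- function(registry, section = NULL) {
  if (is.null(section)) {
    return(registry)
  }
  Filter(function(e) e$section == section, registry)
}

step2 <- by_step_sketch(registry_sketch, 2)
all_entries <- by_section_sketch(registry_sketch)
sections <- unique(vapply(registry_sketch, function(e) e$section, character(1)))
```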
Import raw STEPS survey data
Description
Reads a raw STEPS data file (CSV, Excel, Stata, or SPSS) and standardises column names to lowercase with underscores.
Usage
import_steps_data(path)
Arguments
path |
Character. Path to the data file. |
Value
A data frame with cleaned column names.
Examples
## Not run:
raw <- import_steps_data("data/raw/steps_data.csv")
## End(Not run)
List all available sections in the registry
Description
List all available sections in the registry
Usage
list_registry_sections()
Value
Character vector of unique section names.
Create age-stratified trend chart
Description
Create age-stratified trend chart
Usage
make_age_chart(by_age_df, title, steps_colors, theme_steps)
Arguments
by_age_df |
Data frame with |
title |
Plot title. |
steps_colors |
Colour list from |
theme_steps |
Theme from |
Value
A ggplot2 object or NULL if input is NULL.
Create sex-stratified bar chart
Description
Create sex-stratified bar chart
Usage
make_sex_chart(by_sex_df, title, var_name, steps_colors, theme_steps)
Arguments
by_sex_df |
Data frame with |
title |
Plot title. |
var_name |
Variable name (for internal use). |
steps_colors |
Colour list from |
theme_steps |
Theme from |
Value
A ggplot2 object or NULL if input is NULL.
Build a single STEPS table
Description
Constructs a flextable from age-stratified and total estimates.
Usage
make_steps_table(total_df, by_sex_df, by_age_df, label, pct = TRUE)
Arguments
total_df |
Data frame with total estimate, lower, upper. |
by_sex_df |
Data frame with sex-stratified estimates (not used, kept for compatibility). |
by_age_df |
Data frame with age-stratified estimates and |
label |
Table caption. |
pct |
Logical; format as percent? (default TRUE). |
Value
A flextable object.
About Page Module Server
Description
About Page Module Server
Usage
mod_about_server(id)
Arguments
id |
Module namespace ID. |
About Page Module UI
Description
Displays information about the stepssurvey package, authors, and institutional affiliations.
Usage
mod_about_ui(id)
Arguments
id |
Module namespace ID. |
Value
A Shiny UI element.
Clean Module — Server
Description
Clean Module — Server
Usage
mod_clean_server(id, upload_out)
Arguments
id |
Module namespace id. |
upload_out |
Reactive list returned by |
Value
A shiny::reactive returning the cleaned data frame.
Clean Module — UI
Description
Clean Module — UI
Usage
mod_clean_ui(id)
Arguments
id |
Module namespace id. |
Design Module — Server
Description
Design Module — Server
Usage
mod_design_server(id, clean_out)
Arguments
id |
Module namespace id. |
clean_out |
Reactive returning the cleaned data frame. |
Value
A shiny::reactive returning the survey design object.
Design Module — UI
Description
Design Module — UI
Usage
mod_design_ui(id)
Arguments
id |
Module namespace id. |
Indicators Module — Server
Description
Indicators Module — Server
Usage
mod_indicators_server(id, design_out)
Arguments
id |
Module namespace id. |
design_out |
Reactive returning a survey::svydesign object. |
Value
A list with two reactive elements:
- results: reactive list of indicator results by domain
- key_indicators: reactive tibble of headline key indicators
Indicators Module — UI
Description
Indicators Module — UI
Usage
mod_indicators_ui(id)
Arguments
id |
Module namespace id. |
Value
UI definition for indicators computation and display.
Data Quality Module — Server
Description
Data Quality Module — Server
Usage
mod_quality_server(id, upload_out, clean_out)
Arguments
id |
Module namespace id. |
upload_out |
Reactive list returned by |
clean_out |
Reactive returning the cleaned data frame from
|
Value
A reactive returning the steps_quality object (or NULL).
Data Quality Module — UI
Description
Data Quality Module — UI
Usage
mod_quality_ui(id)
Arguments
id |
Module namespace id. |
Reports Module — Server
Description
Reports Module — Server
Usage
mod_reports_server(id, upload_out)
Arguments
id |
Module namespace id. |
upload_out |
Reactive list returned by |
Details
The full pipeline is executed internally:
Cleans data using clean_steps_data()
Sets up survey design with setup_survey_design()
Computes all indicators with compute_all_indicators()
Builds tables with build_steps_tables()
Builds plots with build_steps_plots()
Saves all outputs to temp directory
Renders report using render_country_report()
Progress is shown via status text and progress bar during generation. Errors are caught and displayed to the user via notifications.
Value
A list with reactive elements:
- report_path: Path to generated Word document
- output_dir: Directory containing all outputs
- generation_status: Current status string
Reports Module — UI
Description
Reports Module — UI
Usage
mod_reports_ui(id)
Arguments
id |
Module namespace id. |
Upload Module — Server
Description
Upload Module — Server
Usage
mod_upload_server(id)
Arguments
id |
Module namespace id. |
Value
A shiny::reactiveValues with elements raw, cols, config.
Upload Module — UI
Description
Upload Module — UI
Usage
mod_upload_ui(id)
Arguments
id |
Module namespace id. |
Visualise Module — Server
Description
Server logic for the visualisation module. Generates plots on button click, renders them in tabs, and provides ZIP download functionality.
Usage
mod_visualise_server(id, results_out, upload_out)
Arguments
id |
Module namespace id. |
results_out |
Reactive list returned by |
upload_out |
Reactive list returned by |
Details
The module calls build_steps_plots() with the reactive inputs to generate
a named list of ggplot2 objects. These are then rendered in individual tabs.
The download handler creates a ZIP archive of all plots as PNG files.
Value
NULL (invisibly). This is a terminal module for visualisation.
Visualise Module — UI
Description
Creates the user interface for the visualisation module, including a "Generate plots" action button and a tabbed layout for displaying STEPS survey visualisations.
Usage
mod_visualise_ui(id)
Arguments
id |
Module namespace id. |
Value
A Shiny UI definition.
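The Usage signatures of the server modules documented above suggest how their outputs feed one another. The following is a minimal sketch of that wiring, not the app's actual server code: the module IDs are hypothetical, and it assumes the module functions are exported from the package. The visualisation module is left out because the source of its results_out argument is not stated here.

```r
# Sketch only: chains the module servers by their documented arguments.
# The IDs ("upload", "clean", ...) are hypothetical placeholders.
app_server <- function(input, output, session) {
  upload_out <- stepssurvey::mod_upload_server("upload")
  clean_out  <- stepssurvey::mod_clean_server("clean", upload_out)
  design_out <- stepssurvey::mod_design_server("design", clean_out)
  ind_out    <- stepssurvey::mod_indicators_server("indicators", design_out)

  # Modules that branch off the main chain
  stepssurvey::mod_quality_server("quality", upload_out, clean_out)
  stepssurvey::mod_reports_server("reports", upload_out)
}
```

Each matching mod_*_ui() call in the UI would need to use the same ID as its server counterpart.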
Plot completeness heatmap across STEPS domains
Description
Creates a tile heatmap showing missingness percentage by variable, grouped by STEPS domain.
Usage
plot_completeness(dq)
Arguments
dq |
A |
Value
A ggplot object.
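The missingness percentages shown in such a heatmap can be approximated in base R. This is a minimal sketch, assuming a hypothetical data frame and a hypothetical variable-to-domain lookup; it is not the package's internal code.

```r
# Hypothetical cleaned data with some missing values
df <- data.frame(
  smoker  = c(TRUE, NA, FALSE, TRUE),
  sbp1    = c(120, 135, NA, NA),
  glucose = c(5.4, 6.0, 7.2, 5.1)
)

# Hypothetical lookup from variable name to STEPS domain
domain <- c(smoker = "Step 1", sbp1 = "Step 2", glucose = "Step 3")

# Count and percentage of missing values per variable
completeness <- data.frame(
  variable    = names(df),
  domain      = unname(domain[names(df)]),
  n_missing   = colSums(is.na(df)),
  pct_missing = 100 * colMeans(is.na(df)),
  row.names   = NULL
)
completeness
```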
Plot digit preference histogram for a physical measurement
Description
Creates a bar chart of terminal-digit frequencies with the expected uniform line at 10%.
Usage
plot_digit_preference(dq, measure)
Arguments
dq |
A |
measure |
Character: one of "SBP", "DBP", "Height", "Weight", "Waist". |
Value
A ggplot object.
Plot sampling weight distribution
Description
Creates a histogram of sampling weights with summary statistics.
Usage
plot_weights(dq, step = "weight_step1")
Arguments
dq |
A |
step |
Character: which weight to plot ("weight_step1", "weight_step2", or "weight_step3"). Defaults to "weight_step1". |
Value
A ggplot object.
Read a column mapping file
Description
Reads a filled-in column mapping template (Excel or CSV) and returns a
named list suitable for passing to clean_steps_data(). The mapping file
should have at least two columns: one with the standard variable name
(column A) and one with the user's column name (column C in the template,
or the third column).
Usage
read_column_mapping(path, data = NULL)
Arguments
path |
Path to the filled mapping file (.xlsx or .csv). |
data |
Optional data frame. If provided, the function validates that every mapped column actually exists in the data. |
Details
This function is the manual alternative to detect_steps_columns().
Use it when your dataset has non-standard variable names that
auto-detection cannot resolve.
A blank template can be obtained from
system.file("templates", "column_mapping_template.xlsx",
package = "stepssurvey")
or downloaded from the Shiny app.
The function ignores domain-header rows (rows where column A is all-caps with no entry in column C) and skips any row where the user's column name is blank.
Value
A named list where names are standard variable identifiers
(e.g. "age", "sbp1") and values are the corresponding
column names in the user's dataset.
Unmapped variables are set to NULL.
Examples
## Not run:
cols <- read_column_mapping("my_mapping.xlsx")
raw <- import_steps_data("survey.dta")
clean <- clean_steps_data(raw, cols)
## End(Not run)
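To make the mapping rules concrete, the sketch below turns a small in-memory mapping table into a named list in base R. The table contents, its column names (standard, label, user_col) and the raw data are all hypothetical, and the logic only approximates the behaviour described above rather than reproducing read_column_mapping().

```r
# Hypothetical contents of a filled-in mapping file (three columns)
mapping <- data.frame(
  standard = c("DEMOGRAPHICS", "age", "sex", "PHYSICAL", "sbp1", "dbp1"),
  label    = c("", "Age in years", "Sex", "", "Systolic BP 1", "Diastolic BP 1"),
  user_col = c("", "C3", "C1", "", "M4a", ""),
  stringsAsFactors = FALSE
)

# Keep rows where the user's column name is filled in; this also drops the
# all-caps domain headers, which have no entry in the third column
keep   <- !is.na(mapping$user_col) & nzchar(trimws(mapping$user_col))
filled <- mapping[keep, ]

# Names are standard identifiers, values are the user's column names
cols <- as.list(setNames(filled$user_col, filled$standard))

# Optional validation against a (hypothetical) raw data frame
raw <- data.frame(C1 = 1:2, C3 = c(30, 45), M4a = c(120, 131))
missing_cols <- setdiff(unlist(cols), names(raw))
if (length(missing_cols) > 0) {
  stop("Mapped columns not found: ", paste(missing_cols, collapse = ", "))
}
```

Here cols$dbp1 is NULL because the row was left blank, which matches the "unmapped variables are NULL" convention.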
Recode a yes/no variable to logical
Description
Converts various representations of yes/no (numeric, text, case-insensitive) to logical TRUE/FALSE values.
Usage
recode_yn(x)
Arguments
x |
A vector of yes/no values (numeric or character). |
Value
A logical vector, with NA for unrecognized values.
Examples
recode_yn(c(1, 2, "yes", "no", NA))
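For readers who want to see what such a recoding involves, here is a minimal base R sketch. The function name recode_yn_sketch is hypothetical, and the accepted codes (1/yes/y/true for TRUE, 2/no/n/false for FALSE) are an assumption; the package's own recode_yn() may recognise a different set.

```r
# Hypothetical re-implementation for illustration only
recode_yn_sketch <- function(x) {
  key <- tolower(trimws(as.character(x)))   # normalise case and whitespace
  out <- rep(NA, length(key))               # logical NA by default
  out[key %in% c("1", "yes", "y", "true")]  <- TRUE
  out[key %in% c("2", "no", "n", "false")]  <- FALSE
  out
}

recode_yn_sketch(c(1, 2, "Yes", "no", NA, "maybe"))
```

Values that match neither set, such as "maybe" above, stay NA.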
Render STEPS Country Report
Description
Generates a comprehensive Word document with executive summary, indicator-by-indicator analysis, and recommendations for public health action.
Usage
render_country_report(config, output_dir = tempdir())
Arguments
config |
A list from |
output_dir |
Directory for output reports (default |
Details
Sections include:
Executive summary with key findings
Tobacco use
Physical activity
Overweight and obesity
Blood pressure
Blood glucose and cholesterol
Recommendations for public health action
Methodology
Requires pre-computed indicators, tables, and plots in data/processed/.
Value
Path to generated Word document (invisibly). Prints message with output location.
Render STEPS Data Book report
Description
Generates a Word document with detailed age-stratified prevalence tables for all available indicators, organized by STEPS step.
Usage
render_data_book(config, output_dir = tempdir())
Arguments
config |
A list from |
output_dir |
Directory for output reports (default |
Details
Sections correspond to STEPS steps:
Step 1: Behavioural Risk Factors (tobacco, alcohol, diet, physical activity)
Step 2: Physical Measurements (overweight/obesity, blood pressure)
Step 3: Biochemical (glucose, cholesterol)
Requires pre-computed tables and plots in data/processed/.
Value
Path to generated Word document (invisibly). Prints message with output location.
Render STEPS Fact Sheet report
Description
Generates an HTML or Word document (see format) with an overview of key NCD risk factor prevalence, including a summary table and sex-stratified charts.
Usage
render_fact_sheet(config, output_dir = tempdir(), format = c("html", "word"))
Arguments
config |
A list from |
output_dir |
Directory for output reports (default |
format |
Output format: |
Details
The fact sheet template uses pre-computed indicators, key_indicators, and plots (via .rds files in data/processed/). Requires the rmarkdown, flextable, ggplot2, glue, and patchwork packages.
Value
Path to generated output file (invisibly). Prints message with output location.
Launch the stepssurvey Shiny Application
Description
Starts the interactive STEPS survey analysis app in the user's browser. The app provides a guided workflow: upload data, clean, set survey design, compute indicators, visualise results, and generate Word reports.
Usage
run_app(...)
Arguments
... |
Additional arguments passed to |
Value
A Shiny app object (invisibly). Called for its side effect of launching the application.
Examples
## Not run:
run_app()
## End(Not run)
Run the complete STEPS analysis pipeline
Description
Imports raw data, cleans it, sets up the survey design, computes all indicators, generates publication-ready tables and plots, and optionally renders Word reports.
Usage
run_steps_pipeline(
data_path,
country_name = "Country Name",
survey_year = 2024,
age_min = 18,
age_max = 69,
output_dir = tempdir(),
render_reports = TRUE,
mapping_file = NULL
)
Arguments
data_path |
Path to raw STEPS data file (CSV, Excel, Stata, or SPSS). |
country_name |
Country name for reports (default "Country Name"). |
survey_year |
Survey year (default 2024). |
age_min |
Minimum age in years (default 18). |
age_max |
Maximum age in years (default 69). |
output_dir |
Directory for all outputs (default |
render_reports |
Logical; render Word documents? (default TRUE). |
mapping_file |
Optional path to a filled column mapping template
(Excel or CSV). If provided, uses |
Details
This is the main entry point for end-to-end STEPS analysis.
Value
A list with elements:
- raw_data
Original imported data frame
- clean_data
Cleaned and recoded data
- cols
Detected column mapping from detect_steps_columns()
- design
survey::svydesign object
- indicators
List of all computed indicator results by domain
- key_indicators
Summary tibble of headline estimates
- tables
List of flextable::flextable objects
- plots
List of ggplot2::ggplot objects
- config
Configuration list from steps_config()
Examples
## Not run:
# Auto-detect columns
result <- run_steps_pipeline("data/raw/steps_data.csv",
country_name = "Senegal",
survey_year = 2023)
result$key_indicators
result$plots$overview
# Use a custom column mapping
result <- run_steps_pipeline("data/raw/steps_data.csv",
country_name = "Senegal",
survey_year = 2023,
mapping_file = "my_mapping.xlsx")
## End(Not run)
Save STEPS plots to PNG files
Description
Exports all plots in a list to PNG files in the specified directory.
Usage
save_steps_plots(plots, output_dir = tempdir())
Arguments
plots |
A named list of ggplot2 objects (from |
output_dir |
Output directory path (default |
Details
Files are named:
- 01_overview_indicators.png (12x8 in)
- 02_by_sex_dashboard.png (12x8 in)
- 03_bp_by_age.png (10x6 in)
- 04_obesity_by_age.png (10x6 in)
All saved at 150 dpi with white background.
Value
NULL (invisibly). Prints messages about saved files.
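The export settings listed above (size in inches, 150 dpi, white background) correspond to standard ggplot2::ggsave() arguments. The sketch below saves one stand-in plot that way; the plot itself is a placeholder, and this is not the body of save_steps_plots().

```r
library(ggplot2)

# Placeholder plot standing in for one entry of the plot list
plots <- list(overview = ggplot(mtcars, aes(wt, mpg)) + geom_point())

out_file <- file.path(tempdir(), "01_overview_indicators.png")

# 12x8 inches at 150 dpi with a white background, as documented above
ggsave(out_file, plot = plots$overview,
       width = 12, height = 8, units = "in", dpi = 150, bg = "white")

file.exists(out_file)
```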
Set up survey designs for STEPS data (one per Step)
Description
Creates up to three survey design objects — one per WHO STEPS Step —
each using the appropriate step-specific weight column
(wt_step1, wt_step2, wt_step3).
Usage
setup_survey_design(data)
Arguments
data |
A data frame (typically from |
Details
The returned object is a list of class "steps_designs" with elements
$step1, $step2, $step3. For backward compatibility it can also
be used directly as a single design (it delegates to $step1).
The function handles five design cases per step:
Full complex design: weights + strata + clusters
Weights + clusters, no strata
Weights + strata, no clusters
Weights only
Unweighted (simple random sampling)
Weights are used as-is without trimming, consistent with the WHO official STEPS analysis scripts.
Value
A list of class "steps_designs" with up to three
survey::svydesign objects (step1, step2, step3).
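The five design cases can be expressed with survey::svydesign() by switching the ids, strata and weights arguments on or off. The helper below, make_design(), is a hypothetical sketch of that idea for a single step; the data and column names are invented, and the package's own logic may differ.

```r
library(survey)

# Hypothetical helper: build a design from whichever variables are available
make_design <- function(data, weight = NULL, strata = NULL, cluster = NULL) {
  has <- function(v) !is.null(v) && v %in% names(data)
  ids_f     <- if (has(cluster)) reformulate(cluster) else ~1
  strata_f  <- if (has(strata))  reformulate(strata)  else NULL
  weights_f <- if (has(weight))  reformulate(weight)  else NULL
  svydesign(ids = ids_f, strata = strata_f, weights = weights_f,
            data = data, nest = TRUE)
}

# Invented data: 2 strata, 2 PSUs per stratum, 10 respondents per PSU
set.seed(1)
d <- data.frame(
  stratum  = rep(c("A", "B"), each = 20),
  psu      = rep(1:4, each = 10),
  wt_step1 = runif(40, 0.5, 2),
  sbp      = rnorm(40, 125, 15)
)

full <- make_design(d, "wt_step1", "stratum", "psu")  # full complex design
wt   <- make_design(d, weight = "wt_step1")           # weights only
# svydesign() warns when no weights are supplied; silenced for the SRS case
srs  <- suppressWarnings(make_design(d))              # unweighted

svymean(~sbp, full)
```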
WHO STEPS colour palette
Description
A named list of colours used in WHO STEPS reports and visualisations.
Usage
steps_colors()
Value
A named list of hex colour codes.
Examples
steps_colors()$blue
Create STEPS analysis configuration
Description
Builds a configuration list that specifies data paths, design variables, and report parameters for the STEPS pipeline.
Usage
steps_config(
data_path,
country_name = "Country Name",
survey_year = 2024,
age_min = 18,
age_max = 69,
weight_var = "wt_final",
strata_var = "stratum",
cluster_var = "psu",
bp_sbp_threshold = 140,
bp_dbp_threshold = 90,
bmi_overweight = 25,
bmi_obese = 30,
glucose_threshold = 7,
glucose_impaired_threshold = 6.1,
chol_threshold = 5
)
Arguments
data_path |
Path to raw STEPS data file (CSV or Excel). |
country_name |
Country name for reports (default "Country Name"). |
survey_year |
Survey year (default 2024). |
age_min |
Minimum age (default 18). |
age_max |
Maximum age (default 69). |
weight_var |
Weight variable name (default "wt_final", set NULL if none). |
strata_var |
Strata variable name (default "stratum", set NULL if none). |
cluster_var |
Cluster variable name (default "psu", set NULL if none). |
bp_sbp_threshold |
SBP threshold for raised BP (default 140). |
bp_dbp_threshold |
DBP threshold for raised BP (default 90). |
bmi_overweight |
BMI threshold for overweight (default 25.0). |
bmi_obese |
BMI threshold for obesity (default 30.0). |
glucose_threshold |
Fasting glucose threshold in mmol/L (default 7.0). |
glucose_impaired_threshold |
Impaired fasting glucose threshold in mmol/L (default 6.1). |
chol_threshold |
Total cholesterol threshold in mmol/L (default 5.0). |
Value
A list with elements:
- data_path: Input file path
- country_name: Country name
- survey_year: Survey year
- age_min, age_max: Age range
- weight_var, strata_var, cluster_var: Design variable names
- Threshold parameters for BP, BMI, glucose, cholesterol
Examples
## Not run:
cfg <- steps_config("data/steps_2023.csv", "Senegal", 2023)
cfg <- steps_config("data/steps.csv", "Mongolia", 2019,
bp_sbp_threshold = 130, bp_dbp_threshold = 80)
## End(Not run)
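To illustrate how the threshold parameters might be applied, the sketch below classifies a few invented measurements. The cfg list simply mirrors the documented defaults and is not produced by steps_config(). Treating the thresholds as inclusive (>=) and ignoring medication status are assumptions of this example.

```r
# Plain list mirroring the documented default thresholds
cfg <- list(bp_sbp_threshold = 140, bp_dbp_threshold = 90,
            bmi_overweight = 25, bmi_obese = 30)

# Invented measurements
people <- data.frame(
  sbp = c(118, 142, 135),
  dbp = c(76, 85, 92),
  bmi = c(22.4, 27.1, 31.5)
)

# Assumption: raised BP means SBP or DBP at or above its threshold
people$raised_bp <- people$sbp >= cfg$bp_sbp_threshold |
  people$dbp >= cfg$bp_dbp_threshold

# Assumption: left-closed intervals, so a BMI of exactly 25 counts as overweight
people$bmi_class <- cut(
  people$bmi,
  breaks = c(-Inf, cfg$bmi_overweight, cfg$bmi_obese, Inf),
  labels = c("Not overweight", "Overweight", "Obese"),
  right  = FALSE
)
people
```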
Data Quality Diagnostics for WHO STEPS Data
Description
Produces a comprehensive data quality report covering digit preference, completeness, plausibility, and sampling weight diagnostics.
Usage
steps_data_quality(raw, cleaned, cols)
Arguments
raw |
The raw (pre-cleaning) data frame, typically from
|
cleaned |
The cleaned data frame from |
cols |
Column mapping list from |
Details
Digit preference / heaping is assessed using a Whipple-style heaping index: the ratio of the observed frequency of a terminal digit (0 or 5) to the frequency expected under a uniform distribution of terminal digits. An index of 1.0 indicates no preference, values >1.5 indicate moderate heaping, and values >2.0 indicate severe heaping.
Completeness reports missing values for key STEPS variables grouped by Step (behavioural, physical, biochemical).
Plausibility counts values outside WHO-recommended ranges (e.g. height 100–250 cm, weight 20–300 kg, SBP 60–300 mmHg).
Weight diagnostics summarise the distribution of sampling weights and flag potential issues (high CV, zero/NA weights).
Value
A list of class "steps_quality" with elements:
- digit_preference
Terminal-digit tables and heaping indices for physical measurements (SBP, DBP, height, weight, waist).
- completeness
Per-variable missingness counts and percentages, grouped by STEPS domain.
- plausibility
Summary of values outside plausible ranges.
- weights
Sampling weight distribution statistics.
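The heaping index described in Details can be computed in a few lines of base R. This is a minimal sketch with invented readings; the function name heaping_index is hypothetical, it assumes readings are recorded as whole numbers, and the label for indices at or below 1.5 is this example's own wording.

```r
# Hypothetical helper: observed count at a terminal digit divided by the
# count expected if all ten terminal digits were equally likely
heaping_index <- function(x, digit = 0) {
  x <- x[!is.na(x)]
  terminal <- round(x) %% 10          # last digit of the whole-number reading
  observed <- sum(terminal == digit)
  expected <- length(x) / 10
  observed / expected
}

# Invented systolic readings, six of ten ending in 0
sbp <- c(120, 130, 140, 120, 118, 122, 135, 150, 110, 127)
idx <- heaping_index(sbp, digit = 0)

# Apply the documented cut-offs: >1.5 moderate, >2.0 severe
severity <- cut(idx, breaks = c(-Inf, 1.5, 2, Inf),
                labels = c("below moderate", "moderate", "severe"))
idx
severity
```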
WHO STEPS Data Book Table Registry
Description
Defines all standard tables from the WHO STEPS Epi Info report template. Each entry specifies the table metadata; generic compute and formatting functions use this registry to produce the full data book automatically.
Usage
steps_table_registry()
Value
A list of table specification lists.
Table types
- proportion
Single binary indicator: % (95% CI) by age × sex. Most common type. Example: "Current smokers among all respondents."
- mean
Continuous variable: mean (95% CI) by age × sex. Example: "Mean BMI (kg/m²)."
- category
Multi-level factor: % per level (95% CI) by age × sex. Example: "BMI classifications (Underweight / Normal / Overweight / Obese)."
- cascade
Diagnosis → treatment → control chain: multiple proportions with nested denominators. Example: "Raised BP diagnosis, treatment and control."
- combined
Summary of combined risk factors: 0, 1-2, 3-5 risk factors.
Registry fields
- id
Unique short identifier (e.g., "T_smoking_current").
- section
Data book section (e.g., "Tobacco Use", "Blood Pressure").
- step
STEPS step number (1, 2, or 3).
- title
Table title as shown in the data book.
- description
One-line description from the WHO template.
- type
One of: "proportion", "mean", "category", "cascade", "combined".
- variable
Column name(s) in the cleaned data frame to analyse. For proportion: single logical variable. For mean: single numeric variable. For category: single factor variable. For cascade: named list of logical variables.
- denominator
NULL (= all respondents) or column name for subsetting (e.g., "current_alcohol" to restrict to drinkers).
- levels
For category type: named character vector of level labels.
- epi_info
Epi Info program name(s) for reference.
- unit
Display unit (e.g., "%", "mmHg", "cm", "kg/m²", "mmol/L").
- questions
STEPS instrument question codes used.
- sex_panels
Logical. TRUE = 3 panels (Men/Women/Both); FALSE = 2 panels (Men/Women only, e.g., height/weight means). Default TRUE.
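As an illustration of the registry fields, one entry for a proportion table might look like the list below. Only the id, section and type values are taken from the examples above; every other value (including the column name current_smoker) is a hypothetical placeholder, not an actual registry entry.

```r
# Hypothetical registry entry; values other than id, section and type
# are illustrative placeholders
entry <- list(
  id          = "T_smoking_current",
  section     = "Tobacco Use",
  step        = 1,
  title       = "Current smokers among all respondents",
  description = "Percentage of respondents who currently smoke",
  type        = "proportion",
  variable    = "current_smoker",   # hypothetical column name
  denominator = NULL,               # NULL = all respondents
  levels      = NULL,               # only used for type "category"
  epi_info    = NA_character_,      # left unspecified in this sketch
  unit        = "%",
  questions   = NA_character_,      # left unspecified in this sketch
  sex_panels  = TRUE
)

# A simple sanity check on the type field
stopifnot(entry$type %in% c("proportion", "mean", "category", "cascade", "combined"))
```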
Weighted mean estimation with 95% CI
Description
Calculates weighted means with 95% confidence intervals for a continuous variable, optionally stratified by a grouping variable.
Usage
svymn(formula, design, by = NULL, na.rm = TRUE)
Arguments
formula |
A formula (e.g., |
design |
A survey design object (from |
by |
Optional formula for stratification (e.g., |
na.rm |
Logical; if TRUE (default), omit NA values. |
Value
A data frame with columns:
- estimate: estimated mean
- lower: 95% CI lower bound
- upper: 95% CI upper bound
- se: standard error
If by is specified: grouping column(s) prepended
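A data frame of this shape can be assembled directly from the survey package. The sketch below uses the api example data shipped with survey and normal-approximation intervals from confint(); whether svymn() uses the same interval method is not stated here, so treat this as an approximation of its output rather than its implementation.

```r
library(survey)

# Example data bundled with the survey package
data(api)
des <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Overall weighted mean with a 95% CI
est <- svymean(~api00, des, na.rm = TRUE)
ci  <- confint(est)
overall <- data.frame(
  estimate = coef(est)[[1]],
  lower    = ci[1, 1],
  upper    = ci[1, 2],
  se       = SE(est)[[1]]
)

# Same estimate stratified by school type, grouping column first
by_est <- svyby(~api00, ~stype, des, svymean, na.rm = TRUE)
by_ci  <- confint(by_est)
grouped <- data.frame(
  stype    = by_est$stype,
  estimate = as.numeric(coef(by_est)),
  lower    = by_ci[, 1],
  upper    = by_ci[, 2],
  se       = as.numeric(SE(by_est)),
  row.names = NULL
)
overall
grouped
```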
Weighted proportion estimation with 95% CI
Description
Calculates weighted proportions (as percentages) with 95% confidence intervals for a yes/no variable, optionally stratified by a grouping variable.
Usage
svyprop(formula, design, by = NULL, na.rm = TRUE)
Arguments
formula |
A formula (e.g., |
design |
A survey design object (from |
by |
Optional formula for stratification (e.g., |
na.rm |
Logical; if TRUE (default), omit NA values. |
Value
A data frame with columns:
- estimate: estimated proportion (%)
- lower: 95% CI lower bound (%)
- upper: 95% CI upper bound (%)
- se: standard error (%)
If by is specified: grouping column(s) prepended
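For proportions, one common choice in the survey package is survey::svyciprop(), which gives intervals that stay within 0 and 100%. The sketch below uses the logit method on the bundled api data and scales the result to percentages. The choice of method is an assumption of this example; svyprop() may compute its intervals differently.

```r
library(survey)

# Example data bundled with the survey package
data(api)
des <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Weighted proportion of schools with awards == "Yes", logit-based 95% CI
p  <- svyciprop(~I(awards == "Yes"), des, method = "logit")
ci <- unname(attr(p, "ci"))                 # lower and upper bounds
se <- sqrt(as.numeric(attr(p, "var")))      # standard error on the 0-1 scale

# Express everything as percentages
prop_pct <- data.frame(
  estimate = 100 * as.numeric(p),
  lower    = 100 * ci[1],
  upper    = 100 * ci[2],
  se       = 100 * se
)
prop_pct
```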
Generic Table Builder for WHO STEPS Data Book
Description
Takes computed results from compute_table() or compute_all_tables()
and produces formatted flextable objects in the standard WHO STEPS
3-panel format (Men / Women / Both Sexes).
Format indicator results tibble for display
Description
Internal helper to convert a domain's indicator results list into a formatted data frame for DT table display.
Usage
tbl_domain(domain_name, domain_results)
Arguments
domain_name |
Character name of the domain for display. |
domain_results |
List of indicator results for the domain. |
Value
A formatted data frame ready for DT::datatable, or NULL if empty.
WHO STEPS ggplot2 theme
Description
A clean, minimal ggplot2 theme styled with WHO STEPS colours.
Usage
theme_steps(base_size = 11)
Arguments
base_size |
Base font size (default 11). |
Value
A ggplot2::theme object.
Examples
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_steps()