library(shellgame)
library(geoDeltaAudit)
library(dplyr)
library(stringr)
library(janitor)
# vignette-only dependency; keep in Suggests
if (!requireNamespace("readr", quietly = TRUE)) {
stop("Package 'readr' is required to run this vignette. Install it with install.packages('readr').")
}This vignette demonstrates a complete transformation audit using Hennepin County, Minnesota as an example. We’ll track total population through the transformation chain:
ZCTA → ZIP → COUNTY
And reveal the shell game: same column name (“population”), different underlying quantity (observed → imputed).
For this example, we’ll use the data you would typically prepare:
acs_path <- system.file("extdata", "toy_acs_zcta_hennepin.csv", package = "geoDeltaAudit")
hud_path <- system.file("extdata", "toy_zip_county_hud_hennepin.csv", package = "geoDeltaAudit")
stopifnot(nchar(acs_path) > 0, nchar(hud_path) > 0)
acs <- readr::read_csv(acs_path, show_col_types = FALSE) |>
janitor::clean_names() |>
dplyr::mutate(zcta = stringr::str_pad(as.character(.data$zcta), 5, pad = "0"))
hud <- readr::read_csv(hud_path, show_col_types = FALSE) |>
janitor::clean_names()
# Toy assoc: 1:1 ZCTA -> ZIP so the example always runs
assoc <- acs |>
dplyr::distinct(.data$zcta) |>
dplyr::transmute(zcta = .data$zcta, zip = .data$zcta) |>
dplyr::distinct()
list(
acs_rows = nrow(acs),
assoc_rows = nrow(assoc),
hud_rows = nrow(hud)
)Note: The following graphics are pre-rendered from the configured Hennepin County example dataset to illustrate the spatial relationships being audited.
=== The Shell Game: Transformation Audit ===
Variable: population
Target County: 27053
--- Baseline (Observed Data) ---
Units: 74 ZCTAs
Total: 1,391,557
--- After Transformation (Imputed Data) ---
Intermediate: 98 ZIPs
Recovered: 1,216,874
--- The Shell Game Result ---
Perturbation: -174,683 (-12.6%)
Same column name.
Different underlying quantity.
That's the shell game.
--- Pre-Allocation Expansion ---
74 ZCTAs → 98 ZIPs (+32.4%)
This happens BEFORE any allocation or weighting.
The analytical surface has already shifted.
--- Top Counties Receiving Perturbed Population ---
27003: 30,535
27139: 25,268
27123: 21,835
27171: 14,391
27059: 9,526
The analysis begins with 74 ZCTAs that have a relationship-based membership with Hennepin County. These are the ZCTAs used by the Census Bureau in ACS tabulations.
Total population: 1,391,557 (directly observed from ACS)
## The First
Hop: ZCTA → ZIP
When we associate these 74 ZCTAs with ZIP codes:
Result: 74 ZCTAs become 98 ZIPs (+32.4%)
This happens before any allocation. The analytical surface has already shifted.
Using HUD’s TOT_RATIO, we allocate ZIP-level population to counties.
Result: Population recovered for Hennepin County: 1,216,874
If we used geometric intersection instead of relationship-based membership, we would have 94 ZCTAs, not 74.
This is Decision #1: How do we define membership?
The 20 extra ZCTAs (shown in grey) intersect the county boundary geometrically but are not included in the relationship-based membership used by ACS. # Visualizing the Difference
The baseline: 74 ZCTAs with relationship-based membership.
The difference: Grey areas show ZCTAs that appear only under geometric intersection.
# Normalize expected fields from geoDeltaAudit::audit_transform()
baseline_total <- as.numeric(audit_result$baseline_total)
final_total <- as.numeric(audit_result$final_total)
# delta is already provided; compute if missing
delta <- if (!is.null(audit_result$delta)) {
as.numeric(audit_result$delta)
} else {
final_total - baseline_total
}
absolute_perturbation <- abs(delta)Same column name: “population”
Different underlying quantity: observed → imputed
That’s the shell game.
This error is agnostic to:
** Transformation is the cause, not the tool or variable.**
See vignette("data-preparation") for how to prepare your
own data. See vignette("conceptual_framework-shell-game")
for the conceptual explanation.