The standardized (mean) difference is a measure of distance between two group means in terms of one or more variables. In practice it is often used as a balance measure of individual covariates before and after propensity score matching. As it is standardized, comparison across variables on different scales is possible. For definitions see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s11title .
Standardized mean differences can be easily calculated with tableone. All standardized mean differences in this package are absolute values, thus, there is no directionality.
## tableone package itself
library(tableone)
## PS matching
library(Matching)
## Weighted analysis
library(survey)
## Reorganizing data
library(reshape2)
## plotting
library(ggplot2)
The right heart catheterization dataset is available at http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets . This dataset was originally used in Connors et al. JAMA 1996;276:889-897, and has been made publicly available.
## Right heart cath dataset
rhc <- read.csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/rhc.csv")
Out of the 50 covariates, 32 have standardized mean differences of greater than 0.1, which is often considered the sign of important covariate imbalance (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s11title ).
## Covariates
vars <- c("age","sex","race","edu","income","ninsclas","cat1","das2d3pc","dnr1",
"ca","surv2md1","aps1","scoma1","wtkilo1","temp1","meanbp1","resp1",
"hrt1","pafi1","paco21","ph1","wblc1","hema1","sod1","pot1","crea1",
"bili1","alb1","resp","card","neuro","gastr","renal","meta","hema",
"seps","trauma","ortho","cardiohx","chfhx","dementhx","psychhx",
"chrpulhx","renalhx","liverhx","gibledhx","malighx","immunhx",
"transhx","amihx")
## Construct a table
tabUnmatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhc, test = FALSE)
## Show table with SMD
print(tabUnmatched, smd = TRUE)
Stratified by swang1
No RHC RHC SMD
n 3551 2184
age (mean (SD)) 61.76 (17.29) 60.75 (15.63) 0.061
sex = Male (%) 1914 (53.9) 1278 (58.5) 0.093
race (%) 0.036
black 585 (16.5) 335 (15.3)
other 213 ( 6.0) 142 ( 6.5)
white 2753 (77.5) 1707 (78.2)
edu (mean (SD)) 11.57 (3.13) 11.86 (3.16) 0.091
income (%) 0.142
$11-$25k 713 (20.1) 452 (20.7)
$25-$50k 500 (14.1) 393 (18.0)
> $50k 257 ( 7.2) 194 ( 8.9)
Under $11k 2081 (58.6) 1145 (52.4)
ninsclas (%) 0.194
Medicaid 454 (12.8) 193 ( 8.8)
Medicare 947 (26.7) 511 (23.4)
Medicare & Medicaid 251 ( 7.1) 123 ( 5.6)
No insurance 186 ( 5.2) 136 ( 6.2)
Private 967 (27.2) 731 (33.5)
Private & Medicare 746 (21.0) 490 (22.4)
cat1 (%) 0.583
ARF 1581 (44.5) 909 (41.6)
CHF 247 ( 7.0) 209 ( 9.6)
COPD 399 (11.2) 58 ( 2.7)
Cirrhosis 175 ( 4.9) 49 ( 2.2)
Colon Cancer 6 ( 0.2) 1 ( 0.0)
Coma 341 ( 9.6) 95 ( 4.3)
Lung Cancer 34 ( 1.0) 5 ( 0.2)
MOSF w/Malignancy 241 ( 6.8) 158 ( 7.2)
MOSF w/Sepsis 527 (14.8) 700 (32.1)
das2d3pc (mean (SD)) 20.37 (5.48) 20.70 (5.03) 0.063
dnr1 = Yes (%) 499 (14.1) 155 ( 7.1) 0.228
ca (%) 0.107
Metastatic 261 ( 7.4) 123 ( 5.6)
No 2652 (74.7) 1727 (79.1)
Yes 638 (18.0) 334 (15.3)
surv2md1 (mean (SD)) 0.61 (0.19) 0.57 (0.20) 0.198
aps1 (mean (SD)) 50.93 (18.81) 60.74 (20.27) 0.501
scoma1 (mean (SD)) 22.25 (31.37) 18.97 (28.26) 0.110
wtkilo1 (mean (SD)) 65.04 (29.50) 72.36 (27.73) 0.256
temp1 (mean (SD)) 37.63 (1.74) 37.59 (1.83) 0.021
meanbp1 (mean (SD)) 84.87 (38.87) 68.20 (34.24) 0.455
resp1 (mean (SD)) 28.98 (13.95) 26.65 (14.17) 0.165
hrt1 (mean (SD)) 112.87 (40.94) 118.93 (41.47) 0.147
pafi1 (mean (SD)) 240.63 (116.66) 192.43 (105.54) 0.433
paco21 (mean (SD)) 39.95 (14.24) 36.79 (10.97) 0.249
ph1 (mean (SD)) 7.39 (0.11) 7.38 (0.11) 0.120
wblc1 (mean (SD)) 15.26 (11.41) 16.27 (12.55) 0.084
hema1 (mean (SD)) 32.70 (8.79) 30.51 (7.42) 0.269
sod1 (mean (SD)) 137.04 (7.68) 136.33 (7.60) 0.092
pot1 (mean (SD)) 4.08 (1.04) 4.05 (1.01) 0.027
crea1 (mean (SD)) 1.92 (2.03) 2.47 (2.05) 0.270
bili1 (mean (SD)) 2.00 (4.43) 2.71 (5.33) 0.145
alb1 (mean (SD)) 3.16 (0.67) 2.98 (0.93) 0.230
resp = Yes (%) 1481 (41.7) 632 (28.9) 0.270
card = Yes (%) 1007 (28.4) 924 (42.3) 0.295
neuro = Yes (%) 575 (16.2) 118 ( 5.4) 0.353
gastr = Yes (%) 522 (14.7) 420 (19.2) 0.121
renal = Yes (%) 147 ( 4.1) 148 ( 6.8) 0.116
meta = Yes (%) 172 ( 4.8) 93 ( 4.3) 0.028
hema = Yes (%) 239 ( 6.7) 115 ( 5.3) 0.062
seps = Yes (%) 515 (14.5) 516 (23.6) 0.234
trauma = Yes (%) 18 ( 0.5) 34 ( 1.6) 0.104
ortho = Yes (%) 3 ( 0.1) 4 ( 0.2) 0.027
cardiohx (mean (SD)) 0.16 (0.37) 0.20 (0.40) 0.116
chfhx (mean (SD)) 0.17 (0.37) 0.19 (0.40) 0.069
dementhx (mean (SD)) 0.12 (0.32) 0.07 (0.25) 0.163
psychhx (mean (SD)) 0.08 (0.27) 0.05 (0.21) 0.143
chrpulhx (mean (SD)) 0.22 (0.41) 0.14 (0.35) 0.192
renalhx (mean (SD)) 0.04 (0.20) 0.05 (0.21) 0.032
liverhx (mean (SD)) 0.07 (0.26) 0.06 (0.24) 0.049
gibledhx (mean (SD)) 0.04 (0.19) 0.02 (0.16) 0.070
malighx (mean (SD)) 0.25 (0.43) 0.20 (0.40) 0.101
immunhx (mean (SD)) 0.26 (0.44) 0.29 (0.45) 0.080
transhx (mean (SD)) 0.09 (0.29) 0.15 (0.36) 0.170
amihx (mean (SD)) 0.03 (0.17) 0.04 (0.20) 0.074
## Count covariates with important imbalance
addmargins(table(ExtractSmd(tabUnmatched) > 0.1))
FALSE TRUE Sum
18 32 50
Usually a logistic regression model is used to estimate individual propensity scores. The model here is taken from “How To Use Propensity Score Analysis” (http://www.mc.vanderbilt.edu/crc/workshop_files/2008-04-11.pdf ). Predicted probabilities of being assigned to right heart catherterization, being assigned no right heart catherterization, being assigned to the true assignment, as well as the smaller of the probabilities of being assigned to right heart catherterization or no right heart catherterization are calculated for later use in propensity score matching and weighting.
## Fit model
psModel <- glm(formula = swang1 ~ age + sex + race + edu + income + ninsclas +
cat1 + das2d3pc + dnr1 + ca + surv2md1 + aps1 + scoma1 +
wtkilo1 + temp1 + meanbp1 + resp1 + hrt1 + pafi1 +
paco21 + ph1 + wblc1 + hema1 + sod1 + pot1 + crea1 +
bili1 + alb1 + resp + card + neuro + gastr + renal +
meta + hema + seps + trauma + ortho + cardiohx + chfhx +
dementhx + psychhx + chrpulhx + renalhx + liverhx + gibledhx +
malighx + immunhx + transhx + amihx,
family = binomial(link = "logit"),
data = rhc)
Error in eval(family$initialize): y values must be 0 <= y <= 1
## Predicted probability of being assigned to RHC
rhc$pRhc <- predict(psModel, type = "response")
Error in predict(psModel, type = "response"): object 'psModel' not found
## Predicted probability of being assigned to no RHC
rhc$pNoRhc <- 1 - rhc$pRhc
Error in `$<-.data.frame`(`*tmp*`, pNoRhc, value = numeric(0)): replacement has 0 rows, data has 5735
## Predicted probability of being assigned to the
## treatment actually assigned (either RHC or no RHC)
rhc$pAssign <- NA
rhc$pAssign[rhc$swang1 == "RHC"] <- rhc$pRhc[rhc$swang1 == "RHC"]
Error in rhc$pAssign[rhc$swang1 == "RHC"] <- rhc$pRhc[rhc$swang1 == "RHC"]: replacement has length zero
rhc$pAssign[rhc$swang1 == "No RHC"] <- rhc$pNoRhc[rhc$swang1 == "No RHC"]
Error in rhc$pAssign[rhc$swang1 == "No RHC"] <- rhc$pNoRhc[rhc$swang1 == : replacement has length zero
## Smaller of pRhc vs pNoRhc for matching weight
rhc$pMin <- pmin(rhc$pRhc, rhc$pNoRhc)
Error in `$<-.data.frame`(`*tmp*`, pMin, value = integer(0)): replacement has 0 rows, data has 5735
The Matching package can be used for propensity score matching. The logit of propensity score is often used as the matching scale, and the matchign caliper is often 0.2 \(\times\) SD(logit(PS)). See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s5title for suggestions. After matching, all the standardized mean differences are below 0.1.
listMatch <- Match(Tr = (rhc$swang1 == "RHC"), # Need to be in 0,1
## logit of PS,i.e., log(PS/(1-PS)) as matching scale
X = log(rhc$pRhc / rhc$pNoRhc),
## 1:1 matching
M = 1,
## caliper = 0.2 * SD(logit(PS))
caliper = 0.2,
replace = FALSE,
ties = TRUE,
version = "fast")
Error in Match(Tr = (rhc$swang1 == "RHC"), X = log(rhc$pRhc/rhc$pNoRhc), : length(Tr) != nrow(X)
## Extract matched data
rhcMatched <- rhc[unlist(listMatch[c("index.treated","index.control")]), ]
Error in unlist(listMatch[c("index.treated", "index.control")]): object 'listMatch' not found
## Construct a table
tabMatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhcMatched, test = FALSE)
Error in is.data.frame(data): object 'rhcMatched' not found
## Show table with SMD
print(tabMatched, smd = TRUE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'tabMatched' not found
## Count covariates with important imbalance
addmargins(table(ExtractSmd(tabMatched) > 0.1))
Error in class(x)[1] %in% c("TableOne", "svyTableOne"): object 'tabMatched' not found
The matching weight method is a weighting analogue to the 1:1 pairwise algorithmic matching (http://www.ncbi.nlm.nih.gov/pubmed/23902694 ). The matching weight is defined as the smaller of the predicted probabilities of receiving or not receiving the treatment over the predicted probability of being assigned to the arm the patient is actually in. After weighting, all the standardized mean differences are below 0.1. The standardized mean differences in weighted data are explained in http://onlinelibrary.wiley.com/doi/10.1002/sim.6607/full .
## Matching weight
rhc$mw <- rhc$pMin / rhc$pAssign
Error in `$<-.data.frame`(`*tmp*`, mw, value = numeric(0)): replacement has 0 rows, data has 5735
## Weighted data
rhcSvy <- svydesign(ids = ~ 1, data = rhc, weights = ~ mw)
Error in eval(predvars, data, env): object 'mw' not found
## Construct a table (This is a bit slow.)
tabWeighted <- svyCreateTableOne(vars = vars, strata = "swang1", data = rhcSvy, test = FALSE)
Error in c("svyrep.design", "survey.design2", "survey.design") %in% class(data): object 'rhcSvy' not found
## Show table with SMD
print(tabWeighted, smd = TRUE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'tabWeighted' not found
## Count covariates with important imbalance
addmargins(table(ExtractSmd(tabWeighted) > 0.1))
Error in class(x)[1] %in% c("TableOne", "svyTableOne"): object 'tabWeighted' not found
A plot showing covariate balance is often constructed to demonstrate the balancing effect of matching and/or weighting. Given the same propensity score model, the matching weight method often achieves better covariate balance than matching.
## Construct a data frame containing variable name and SMD from all methods
dataPlot <- data.frame(variable = names(ExtractSmd(tabUnmatched)),
Unmatched = ExtractSmd(tabUnmatched),
Matched = ExtractSmd(tabMatched),
Weighted = ExtractSmd(tabWeighted))
Error in class(x)[1] %in% c("TableOne", "svyTableOne"): object 'tabMatched' not found
## Create long-format data for ggplot2
dataPlotMelt <- melt(data = dataPlot,
id.vars = c("variable"),
variable.name = "Method",
value.name = "SMD")
Error in melt(data = dataPlot, id.vars = c("variable"), variable.name = "Method", : object 'dataPlot' not found
## Order variable names by magnitude of SMD
varNames <- as.character(dataPlot$variable)[order(dataPlot$Unmatched)]
Error in eval(expr, envir, enclos): object 'dataPlot' not found
## Order factor levels in the same order
dataPlotMelt$variable <- factor(dataPlotMelt$variable,
levels = varNames)
Error in factor(dataPlotMelt$variable, levels = varNames): object 'dataPlotMelt' not found
## Plot using ggplot2
ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD,
group = Method, color = Method)) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
coord_flip() +
theme_bw() + theme(legend.key = element_blank())
Error in ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD, : object 'dataPlotMelt' not found
To construct a side-by-side table, data can be extracted as a matrix and combined using the print() method, which actually invisibly returns a matrix.
## Column bind tables
resCombo <- cbind(print(tabUnmatched, printToggle = FALSE),
print(tabMatched, printToggle = FALSE),
print(tabWeighted, printToggle = FALSE))
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'tabMatched' not found
## Add group name row, and rewrite column names
resCombo <- rbind(Group = rep(c("No RHC","RHC"), 3), resCombo)
Error in rbind(Group = rep(c("No RHC", "RHC"), 3), resCombo): object 'resCombo' not found
colnames(resCombo) <- c("Unmatched","","Matched","","Weighted","")
Error in colnames(resCombo) <- c("Unmatched", "", "Matched", "", "Weighted", : object 'resCombo' not found
print(resCombo, quote = FALSE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'resCombo' not found
The final analysis can be conducted using matched and weighted data. The results from the matching and matching weight are similar. ShowRegTable() function may come in handly.
## Unmatched model (unadjsuted)
glmUnmatched <- glm(formula = (death == "Yes") ~ swang1,
family = binomial(link = "logit"),
data = rhc)
## Matched model
glmMatched <- glm(formula = (death == "Yes") ~ swang1,
family = binomial(link = "logit"),
data = rhcMatched)
Error in is.data.frame(data): object 'rhcMatched' not found
## Weighted model
glmWeighted <- svyglm(formula = (death == "Yes") ~ swang1,
family = binomial(link = "logit"),
design = rhcSvy)
Error in .svycheck(design): object 'rhcSvy' not found
## Show results together
resTogether <- list(Unmatched = ShowRegTable(glmUnmatched, printToggle = FALSE),
Matched = ShowRegTable(glmMatched, printToggle = FALSE),
Weighted = ShowRegTable(glmWeighted, printToggle = FALSE))
Error in class(model) %in% c("lme"): object 'glmMatched' not found
print(resTogether, quote = FALSE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'resTogether' not found