Standardized mean difference

The standardized (mean) difference is a measure of distance between two group means in terms of one or more variables. In practice it is often used as a balance measure of individual covariates before and after propensity score matching. As it is standardized, comparison across variables on different scales is possible. For definitions see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s11title .

Standardized mean differences can be easily calculated with tableone. All standardized mean differences in this package are absolute values, thus, there is no directionality.

Load packages

## tableone package itself
library(tableone)
## PS matching
library(Matching)
## Weighted analysis
library(survey)
## Reorganizing data
library(reshape2)
## plotting
library(ggplot2)

Load data

The right heart catheterization dataset is available at http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets . This dataset was originally used in Connors et al. JAMA 1996;276:889-897, and has been made publicly available.

## Right heart cath dataset
rhc <- read.csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/rhc.csv")

Unmatched table

Out of the 50 covariates, 32 have standardized mean differences of greater than 0.1, which is often considered the sign of important covariate imbalance (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s11title ).

## Covariates
vars <- c("age","sex","race","edu","income","ninsclas","cat1","das2d3pc","dnr1",
          "ca","surv2md1","aps1","scoma1","wtkilo1","temp1","meanbp1","resp1",
          "hrt1","pafi1","paco21","ph1","wblc1","hema1","sod1","pot1","crea1",
          "bili1","alb1","resp","card","neuro","gastr","renal","meta","hema",
          "seps","trauma","ortho","cardiohx","chfhx","dementhx","psychhx",
          "chrpulhx","renalhx","liverhx","gibledhx","malighx","immunhx",
          "transhx","amihx")

## Construct a table
tabUnmatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhc, test = FALSE)
## Show table with SMD
print(tabUnmatched, smd = TRUE)
                        Stratified by swang1
                         No RHC          RHC             SMD   
  n                        3551            2184                
  age (mean (SD))         61.76 (17.29)   60.75 (15.63)   0.061
  sex = Male (%)           1914 (53.9)     1278 (58.5)    0.093
  race (%)                                                0.036
     black                  585 (16.5)      335 (15.3)         
     other                  213 ( 6.0)      142 ( 6.5)         
     white                 2753 (77.5)     1707 (78.2)         
  edu (mean (SD))         11.57 (3.13)    11.86 (3.16)    0.091
  income (%)                                              0.142
     $11-$25k               713 (20.1)      452 (20.7)         
     $25-$50k               500 (14.1)      393 (18.0)         
     > $50k                 257 ( 7.2)      194 ( 8.9)         
     Under $11k            2081 (58.6)     1145 (52.4)         
  ninsclas (%)                                            0.194
     Medicaid               454 (12.8)      193 ( 8.8)         
     Medicare               947 (26.7)      511 (23.4)         
     Medicare & Medicaid    251 ( 7.1)      123 ( 5.6)         
     No insurance           186 ( 5.2)      136 ( 6.2)         
     Private                967 (27.2)      731 (33.5)         
     Private & Medicare     746 (21.0)      490 (22.4)         
  cat1 (%)                                                0.583
     ARF                   1581 (44.5)      909 (41.6)         
     CHF                    247 ( 7.0)      209 ( 9.6)         
     COPD                   399 (11.2)       58 ( 2.7)         
     Cirrhosis              175 ( 4.9)       49 ( 2.2)         
     Colon Cancer             6 ( 0.2)        1 ( 0.0)         
     Coma                   341 ( 9.6)       95 ( 4.3)         
     Lung Cancer             34 ( 1.0)        5 ( 0.2)         
     MOSF w/Malignancy      241 ( 6.8)      158 ( 7.2)         
     MOSF w/Sepsis          527 (14.8)      700 (32.1)         
  das2d3pc (mean (SD))    20.37 (5.48)    20.70 (5.03)    0.063
  dnr1 = Yes (%)            499 (14.1)      155 ( 7.1)    0.228
  ca (%)                                                  0.107
     Metastatic             261 ( 7.4)      123 ( 5.6)         
     No                    2652 (74.7)     1727 (79.1)         
     Yes                    638 (18.0)      334 (15.3)         
  surv2md1 (mean (SD))     0.61 (0.19)     0.57 (0.20)    0.198
  aps1 (mean (SD))        50.93 (18.81)   60.74 (20.27)   0.501
  scoma1 (mean (SD))      22.25 (31.37)   18.97 (28.26)   0.110
  wtkilo1 (mean (SD))     65.04 (29.50)   72.36 (27.73)   0.256
  temp1 (mean (SD))       37.63 (1.74)    37.59 (1.83)    0.021
  meanbp1 (mean (SD))     84.87 (38.87)   68.20 (34.24)   0.455
  resp1 (mean (SD))       28.98 (13.95)   26.65 (14.17)   0.165
  hrt1 (mean (SD))       112.87 (40.94)  118.93 (41.47)   0.147
  pafi1 (mean (SD))      240.63 (116.66) 192.43 (105.54)  0.433
  paco21 (mean (SD))      39.95 (14.24)   36.79 (10.97)   0.249
  ph1 (mean (SD))          7.39 (0.11)     7.38 (0.11)    0.120
  wblc1 (mean (SD))       15.26 (11.41)   16.27 (12.55)   0.084
  hema1 (mean (SD))       32.70 (8.79)    30.51 (7.42)    0.269
  sod1 (mean (SD))       137.04 (7.68)   136.33 (7.60)    0.092
  pot1 (mean (SD))         4.08 (1.04)     4.05 (1.01)    0.027
  crea1 (mean (SD))        1.92 (2.03)     2.47 (2.05)    0.270
  bili1 (mean (SD))        2.00 (4.43)     2.71 (5.33)    0.145
  alb1 (mean (SD))         3.16 (0.67)     2.98 (0.93)    0.230
  resp = Yes (%)           1481 (41.7)      632 (28.9)    0.270
  card = Yes (%)           1007 (28.4)      924 (42.3)    0.295
  neuro = Yes (%)           575 (16.2)      118 ( 5.4)    0.353
  gastr = Yes (%)           522 (14.7)      420 (19.2)    0.121
  renal = Yes (%)           147 ( 4.1)      148 ( 6.8)    0.116
  meta = Yes (%)            172 ( 4.8)       93 ( 4.3)    0.028
  hema = Yes (%)            239 ( 6.7)      115 ( 5.3)    0.062
  seps = Yes (%)            515 (14.5)      516 (23.6)    0.234
  trauma = Yes (%)           18 ( 0.5)       34 ( 1.6)    0.104
  ortho = Yes (%)             3 ( 0.1)        4 ( 0.2)    0.027
  cardiohx (mean (SD))     0.16 (0.37)     0.20 (0.40)    0.116
  chfhx (mean (SD))        0.17 (0.37)     0.19 (0.40)    0.069
  dementhx (mean (SD))     0.12 (0.32)     0.07 (0.25)    0.163
  psychhx (mean (SD))      0.08 (0.27)     0.05 (0.21)    0.143
  chrpulhx (mean (SD))     0.22 (0.41)     0.14 (0.35)    0.192
  renalhx (mean (SD))      0.04 (0.20)     0.05 (0.21)    0.032
  liverhx (mean (SD))      0.07 (0.26)     0.06 (0.24)    0.049
  gibledhx (mean (SD))     0.04 (0.19)     0.02 (0.16)    0.070
  malighx (mean (SD))      0.25 (0.43)     0.20 (0.40)    0.101
  immunhx (mean (SD))      0.26 (0.44)     0.29 (0.45)    0.080
  transhx (mean (SD))      0.09 (0.29)     0.15 (0.36)    0.170
  amihx (mean (SD))        0.03 (0.17)     0.04 (0.20)    0.074
## Count covariates with important imbalance
addmargins(table(ExtractSmd(tabUnmatched) > 0.1))

FALSE  TRUE   Sum 
   18    32    50 

Propensity score estimation

Usually a logistic regression model is used to estimate individual propensity scores. The model here is taken from “How To Use Propensity Score Analysis” (http://www.mc.vanderbilt.edu/crc/workshop_files/2008-04-11.pdf ). Predicted probabilities of being assigned to right heart catherterization, being assigned no right heart catherterization, being assigned to the true assignment, as well as the smaller of the probabilities of being assigned to right heart catherterization or no right heart catherterization are calculated for later use in propensity score matching and weighting.

## Fit model
psModel <- glm(formula = swang1 ~ age + sex + race + edu + income + ninsclas +
                         cat1 + das2d3pc + dnr1 + ca + surv2md1 + aps1 + scoma1 +
                         wtkilo1 + temp1 + meanbp1 + resp1 + hrt1 + pafi1 +
                         paco21 + ph1 + wblc1 + hema1 + sod1 + pot1 + crea1 +
                         bili1 + alb1 + resp + card + neuro + gastr + renal +
                         meta + hema + seps + trauma + ortho + cardiohx + chfhx +
                         dementhx + psychhx + chrpulhx + renalhx + liverhx + gibledhx +
                         malighx + immunhx + transhx + amihx,
               family  = binomial(link = "logit"),
               data    = rhc)
Error in eval(family$initialize): y values must be 0 <= y <= 1
## Predicted probability of being assigned to RHC
rhc$pRhc <- predict(psModel, type = "response")
Error in predict(psModel, type = "response"): object 'psModel' not found
## Predicted probability of being assigned to no RHC
rhc$pNoRhc <- 1 - rhc$pRhc
Error in `$<-.data.frame`(`*tmp*`, pNoRhc, value = numeric(0)): replacement has 0 rows, data has 5735
## Predicted probability of being assigned to the
## treatment actually assigned (either RHC or no RHC)
rhc$pAssign <- NA
rhc$pAssign[rhc$swang1 == "RHC"]    <- rhc$pRhc[rhc$swang1   == "RHC"]
Error in rhc$pAssign[rhc$swang1 == "RHC"] <- rhc$pRhc[rhc$swang1 == "RHC"]: replacement has length zero
rhc$pAssign[rhc$swang1 == "No RHC"] <- rhc$pNoRhc[rhc$swang1 == "No RHC"]
Error in rhc$pAssign[rhc$swang1 == "No RHC"] <- rhc$pNoRhc[rhc$swang1 == : replacement has length zero
## Smaller of pRhc vs pNoRhc for matching weight
rhc$pMin <- pmin(rhc$pRhc, rhc$pNoRhc)
Error in `$<-.data.frame`(`*tmp*`, pMin, value = integer(0)): replacement has 0 rows, data has 5735

Propensity score matching

The Matching package can be used for propensity score matching. The logit of propensity score is often used as the matching scale, and the matchign caliper is often 0.2 \(\times\) SD(logit(PS)). See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s5title for suggestions. After matching, all the standardized mean differences are below 0.1.

listMatch <- Match(Tr       = (rhc$swang1 == "RHC"),      # Need to be in 0,1
                   ## logit of PS,i.e., log(PS/(1-PS)) as matching scale
                   X        = log(rhc$pRhc / rhc$pNoRhc),
                   ## 1:1 matching
                   M        = 1,
                   ## caliper = 0.2 * SD(logit(PS))
                   caliper  = 0.2,
                   replace  = FALSE,
                   ties     = TRUE,
                   version  = "fast")
Error in Match(Tr = (rhc$swang1 == "RHC"), X = log(rhc$pRhc/rhc$pNoRhc), : length(Tr) != nrow(X)
## Extract matched data
rhcMatched <- rhc[unlist(listMatch[c("index.treated","index.control")]), ]
Error in unlist(listMatch[c("index.treated", "index.control")]): object 'listMatch' not found
## Construct a table
tabMatched <- CreateTableOne(vars = vars, strata = "swang1", data = rhcMatched, test = FALSE)
Error in is.data.frame(data): object 'rhcMatched' not found
## Show table with SMD
print(tabMatched, smd = TRUE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'tabMatched' not found
## Count covariates with important imbalance
addmargins(table(ExtractSmd(tabMatched) > 0.1))
Error in class(x)[1] %in% c("TableOne", "svyTableOne"): object 'tabMatched' not found

Propensity score matching weight

The matching weight method is a weighting analogue to the 1:1 pairwise algorithmic matching (http://www.ncbi.nlm.nih.gov/pubmed/23902694 ). The matching weight is defined as the smaller of the predicted probabilities of receiving or not receiving the treatment over the predicted probability of being assigned to the arm the patient is actually in. After weighting, all the standardized mean differences are below 0.1. The standardized mean differences in weighted data are explained in http://onlinelibrary.wiley.com/doi/10.1002/sim.6607/full .

## Matching weight
rhc$mw <- rhc$pMin / rhc$pAssign
Error in `$<-.data.frame`(`*tmp*`, mw, value = numeric(0)): replacement has 0 rows, data has 5735
## Weighted data
rhcSvy <- svydesign(ids = ~ 1, data = rhc, weights = ~ mw)
Error in eval(predvars, data, env): object 'mw' not found
## Construct a table (This is a bit slow.)
tabWeighted <- svyCreateTableOne(vars = vars, strata = "swang1", data = rhcSvy, test = FALSE)
Error in c("svyrep.design", "survey.design2", "survey.design") %in% class(data): object 'rhcSvy' not found
## Show table with SMD
print(tabWeighted, smd = TRUE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'tabWeighted' not found
## Count covariates with important imbalance
addmargins(table(ExtractSmd(tabWeighted) > 0.1))
Error in class(x)[1] %in% c("TableOne", "svyTableOne"): object 'tabWeighted' not found

Assessing balance before and after matching/weighting

A plot showing covariate balance is often constructed to demonstrate the balancing effect of matching and/or weighting. Given the same propensity score model, the matching weight method often achieves better covariate balance than matching.

## Construct a data frame containing variable name and SMD from all methods
dataPlot <- data.frame(variable  = names(ExtractSmd(tabUnmatched)),
                       Unmatched = ExtractSmd(tabUnmatched),
                       Matched   = ExtractSmd(tabMatched),
                       Weighted  = ExtractSmd(tabWeighted))
Error in class(x)[1] %in% c("TableOne", "svyTableOne"): object 'tabMatched' not found
## Create long-format data for ggplot2
dataPlotMelt <- melt(data          = dataPlot,
                     id.vars       = c("variable"),
                     variable.name = "Method",
                     value.name    = "SMD")
Error in melt(data = dataPlot, id.vars = c("variable"), variable.name = "Method", : object 'dataPlot' not found
## Order variable names by magnitude of SMD
varNames <- as.character(dataPlot$variable)[order(dataPlot$Unmatched)]
Error in eval(expr, envir, enclos): object 'dataPlot' not found
## Order factor levels in the same order
dataPlotMelt$variable <- factor(dataPlotMelt$variable,
                                levels = varNames)
Error in factor(dataPlotMelt$variable, levels = varNames): object 'dataPlotMelt' not found
## Plot using ggplot2
ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD,
                                          group = Method, color = Method)) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
coord_flip() +
theme_bw() + theme(legend.key = element_blank())
Error in ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD, : object 'dataPlotMelt' not found

To construct a side-by-side table, data can be extracted as a matrix and combined using the print() method, which actually invisibly returns a matrix.

## Column bind tables
resCombo <- cbind(print(tabUnmatched, printToggle = FALSE),
                  print(tabMatched,   printToggle = FALSE),
                  print(tabWeighted,  printToggle = FALSE))
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'tabMatched' not found
## Add group name row, and rewrite column names
resCombo <- rbind(Group = rep(c("No RHC","RHC"), 3), resCombo)
Error in rbind(Group = rep(c("No RHC", "RHC"), 3), resCombo): object 'resCombo' not found
colnames(resCombo) <- c("Unmatched","","Matched","","Weighted","")
Error in colnames(resCombo) <- c("Unmatched", "", "Matched", "", "Weighted", : object 'resCombo' not found
print(resCombo, quote = FALSE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'resCombo' not found

Outcome analysis

The final analysis can be conducted using matched and weighted data. The results from the matching and matching weight are similar. ShowRegTable() function may come in handly.

## Unmatched model (unadjsuted)
glmUnmatched <- glm(formula = (death == "Yes") ~ swang1,
                    family  = binomial(link = "logit"),
                    data    = rhc)
## Matched model
glmMatched <- glm(formula = (death == "Yes") ~ swang1,
                  family  = binomial(link = "logit"),
                  data    = rhcMatched)
Error in is.data.frame(data): object 'rhcMatched' not found
## Weighted model
glmWeighted <- svyglm(formula = (death == "Yes") ~ swang1,
                      family  = binomial(link = "logit"),
                      design    = rhcSvy)
Error in .svycheck(design): object 'rhcSvy' not found
## Show results together
resTogether <- list(Unmatched = ShowRegTable(glmUnmatched, printToggle = FALSE),
                    Matched   = ShowRegTable(glmMatched, printToggle = FALSE),
                    Weighted  = ShowRegTable(glmWeighted, printToggle = FALSE))
Error in class(model) %in% c("lme"): object 'glmMatched' not found
print(resTogether, quote = FALSE)
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'print': object 'resTogether' not found