Prediction Power Heatmaps

The function make_pred_plot() visualizes the output from prediction_power() as a heatmap. Each cell shows an expected conditional entropy value, where lower values indicate stronger prediction power. Diagonal entries correspond to prediction using a single predictor, while off-diagonal entries correspond to prediction using pairs of predictors.
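To make the quantity in each cell concrete, the following base R sketch computes an empirical conditional entropy as H(X, Y, Z) - H(X, Y), the entropy that remains in Z once X and Y are known. The helper names joint_entropy() and cond_entropy(), the toy data, and the choice of base-2 logarithms are assumptions made for illustration; this is not netropy's internal code.

```r
# Empirical joint entropy (in bits) of all columns of a data frame.
joint_entropy <- function(df) {
  p <- as.vector(table(df)) / nrow(df)  # relative frequency of each observed cell
  p <- p[p > 0]                         # empty cells contribute nothing
  -sum(p * log2(p))
}

# H(target | predictors) = H(predictors, target) - H(predictors)
cond_entropy <- function(df, target, predictors) {
  joint_entropy(df[, c(predictors, target), drop = FALSE]) -
    joint_entropy(df[, predictors, drop = FALSE])
}

# Toy data (hypothetical): z is an exact copy of x, and y is unrelated to z.
toy <- data.frame(
  x = c(0, 0, 1, 1, 0, 0, 1, 1),
  y = c(0, 1, 0, 1, 0, 1, 0, 1),
  z = c(0, 0, 1, 1, 0, 0, 1, 1)
)

h_x  <- cond_entropy(toy, "z", "x")          # single predictor (a diagonal cell)
h_y  <- cond_entropy(toy, "z", "y")          # single predictor (a diagonal cell)
h_xy <- cond_entropy(toy, "z", c("x", "y"))  # pair of predictors (an off-diagonal cell)
```

In this toy example x determines z completely, so h_x and h_xy are 0, while y carries no information about z and leaves one full bit of uncertainty.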

library(netropy)

We first edit the node attributes so that all variables have finite categorical range spaces. The variables years and age are discretized into three categories.

df_att <- lawdata[[4]]
att_var <- data.frame(
  status    = df_att$status - 1,
  gender    = df_att$gender,
  office    = df_att$office - 1,
  years     = ifelse(df_att$years <= 3, 0,
                ifelse(df_att$years <= 13, 1, 2)),
  age       = ifelse(df_att$age <= 35, 0,
                ifelse(df_att$age <= 45, 1, 2)),
  practice  = df_att$practice,
  lawschool = df_att$lawschool - 1
)
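The nested ifelse() calls used for years and age could also be written with cut(). The sketch below checks on a small hypothetical vector that both forms produce the same codes; the vector years_example is made up for illustration and is not taken from lawdata.

```r
# Hypothetical values chosen to hit both cut points (3 and 13) exactly.
years_example <- c(1, 3, 4, 13, 14, 31)

# Discretization as written in the vignette.
years_nested <- ifelse(years_example <= 3, 0,
                  ifelse(years_example <= 13, 1, 2))

# Same bins with cut(): (-Inf, 3], (3, 13], (13, Inf), recoded to 0, 1, 2.
years_cut <- cut(years_example,
                 breaks = c(-Inf, 3, 13, Inf),
                 labels = FALSE) - 1

same_codes <- all(years_nested == years_cut)
```

With more than three categories, cut() keeps the break points in one place, which can be easier to read than deeper nesting.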

The first rows of the edited attribute data are:

head(att_var)
##   status gender office years age practice lawschool
## 1      0      1      0     2   2        1         0
## 2      0      1      0     2   2        0         0
## 3      0      1      1     1   2        1         0
## 4      0      1      0     2   2        0         2
## 5      0      1      1     2   2        1         1
## 6      0      1      1     2   2        1         0

Prediction Power

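Assume we are interested in predicting status, which indicates whether a lawyer is an associate or a partner. We first compute the prediction power matrix. The row and column for status itself are NA, and only the upper triangle is filled, since a pair of predictors gives the same value in either order: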

pred_status <- prediction_power("status", att_var)
pred_status
##           status gender office years   age practice lawschool
## status        NA     NA     NA    NA    NA       NA        NA
## gender        NA  0.695  0.818 0.404 0.514    0.871     0.800
## office        NA     NA  1.084 0.302 0.526    0.944     0.841
## years         NA     NA     NA 0.927 0.329    0.406     0.322
## age           NA     NA     NA    NA 1.007    0.683     0.617
## practice      NA     NA     NA    NA    NA    1.226     0.916
## lawschool     NA     NA     NA    NA    NA       NA     1.693
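As a reading aid, the matrix can be searched for the strongest single predictor and the strongest pair. To stay self-contained, the sketch below re-enters the printed values by hand into a matrix called m. With netropy loaded, pred_status could presumably be used in its place, assuming it is a numeric matrix with these row and column names.

```r
vars <- c("status", "gender", "office", "years", "age", "practice", "lawschool")
m <- matrix(NA_real_, 7, 7, dimnames = list(vars, vars))

# Upper triangle, copied from the printed output above.
m["gender",    2:7] <- c(0.695, 0.818, 0.404, 0.514, 0.871, 0.800)
m["office",    3:7] <- c(1.084, 0.302, 0.526, 0.944, 0.841)
m["years",     4:7] <- c(0.927, 0.329, 0.406, 0.322)
m["age",       5:7] <- c(1.007, 0.683, 0.617)
m["practice",  6:7] <- c(1.226, 0.916)
m["lawschool", 7]   <- 1.693

# Strongest single predictor: smallest diagonal entry (which.min skips NA).
best_single <- vars[which.min(diag(m))]

# Strongest pair: smallest off-diagonal entry.
off <- m
diag(off) <- NA
idx <- which(off == min(off, na.rm = TRUE), arr.ind = TRUE)
best_pair <- c(rownames(m)[idx[1, "row"]], colnames(m)[idx[1, "col"]])
```

For these values, gender is the best single predictor of status (0.695), and the pair office and years gives the lowest value overall (0.302).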

Heatmap Visualization

The matrix can be visualized using make_pred_plot():

make_pred_plot(pred_status, "Prediction Power for Status")

Darker cells indicate lower expected conditional entropy and therefore stronger prediction power. The diagonal entries show prediction based on one variable, while the off-diagonal entries show prediction based on pairs of variables.

Changing Plot Colors

The colors can be adjusted using the low and high arguments, which set the colors at the low and high ends of the value scale. For example:

make_pred_plot(
  pred_status,
  "Prediction Power for Status",
  low = "steelblue",
  high = "white"
)

Changing Text Size

The size of the cell labels can be controlled with text_size:

make_pred_plot(
  pred_status,
  "Prediction Power for Status",
  text_size = 6
)

References

Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.