Introduction

When working with Generalized Linear Models (GLMs), it is often useful to create informative and beautiful summaries of the fitted model coefficients. The goal of prettyglm is to provide a set of functions that visualize GLM coefficients and performance in interactive plots, which can easily be embedded in rmarkdown reports or exported separately and shared with stakeholders. This document introduces prettyglm’s main sets of functions and shows you how to apply them.

Please see the prettyglm website for more detailed documentation with html outputs. Some of the outputs have been excluded from this documentation for publication on CRAN.

If you don’t find the function you are looking for in prettyglm, consider checking out some other great packages which help visualize the output from GLMs:

  • tidycat

  • jtools

Installation

You can install the latest CRAN release with:

install.packages('prettyglm')

Important Pre-Processing

Model Building

To explore the functionality of prettyglm we will use the Titanic data set to perform logistic regression. This data set was sourced from Kaggle and contains information about passengers aboard the Titanic, along with a target variable which indicates whether they survived.

library(dplyr)
library(prettyglm)
data('titanic')
head(titanic) %>%
  select(-c(PassengerId, Name, Ticket)) %>% 
  knitr::kable(table.attr = "style='width:10%;'" ) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Survived  Pclass  Sex     Age  SibSp  Parch  Fare     Cabin    Embarked  Cabintype
0         3       male    22   1      0      7.2500   Missing  S         Missing
1         1       female  38   1      0      71.2833  C85      C         C
1         3       female  26   0      0      7.9250   Missing  S         Missing
1         1       female  35   1      0      53.1000  C123     S         C
0         3       male    35   0      0      8.0500   Missing  S         Missing
0         3       male    NA   0      0      8.4583   Missing  Q         Missing

Pre-processing

A critical step for this package to work is to set all categorical predictors as factors.

# Easy way to convert multiple columns to a factor.
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin', 
                       'Embarked',
                       'Cabintype')
# Impute missing ages with the mean age.
meanage <- base::mean(titanic$Age, na.rm = TRUE)
titanic  <- titanic  %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age = base::ifelse(is.na(Age), meanage, Age))
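
Because everything downstream depends on this step, it can be worth checking that the conversion actually took effect before fitting a model. The following is a minimal base R sketch on a small made-up data frame; the object names (toy, not_factor) are illustrative and not part of prettyglm.

```r
# Toy data standing in for a few titanic columns (illustrative values only).
toy <- data.frame(
  Pclass = c(3, 1, 3, 1),
  Sex    = c("male", "female", "female", "female"),
  Fare   = c(7.25, 71.28, 7.93, 53.10),
  stringsAsFactors = FALSE
)
columns_to_factor <- c("Pclass", "Sex")

# Convert the listed columns to factors.
toy[columns_to_factor] <- lapply(toy[columns_to_factor], factor)

# Report any listed column that is still not a factor.
is_factor  <- vapply(toy[columns_to_factor], is.factor, logical(1))
not_factor <- names(is_factor)[!is_factor]
if (length(not_factor) > 0) {
  stop("Convert these columns to factors first: ",
       paste(not_factor, collapse = ", "))
}
```

Continuous predictors such as Fare are deliberately left out of columns_to_factor and stay numeric.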

Building a glm

For this vignette we will use stats::glm() to build a logistic regression model. Support for parsnip and workflow model objects which use the glm model engine is currently in development.

survival_model <- stats::glm(Survived ~ Pclass + 
                                        Sex + 
                                        Fare +
                                        Age +
                                        Embarked + 
                                        SibSp + 
                                        Parch, 
                             data = titanic, 
                             family = binomial(link = 'logit'))

pretty_coefficients()

Table of model coefficients

The function pretty_coefficients() allows you to create a pretty table of model coefficients, which by default includes categorical base levels.

Simple Example

The simplest way to call this function is just with the model object.

pretty_coefficients(model_object = survival_model)

Type III Significance Tests

You can also complete a type III test on the coefficients by specifying a type_iii argument. Warning: Wald type III tests will fail if there are aliased coefficients in the model.

You can change the significance level highlighted in the table with significance_level.

pretty_coefficients(survival_model, type_iii = 'Wald', significance_level = 0.1)
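
Since aliased coefficients make the Wald test fail, it may help to check for them first. The sketch below uses only base R and simulated data (toy, x1 and x2 are made up for illustration); it relies on the fact that stats::glm() reports an aliased coefficient as NA.

```r
set.seed(1)
toy <- data.frame(x1 = rnorm(50))
toy$x2 <- 2 * toy$x1                       # perfectly collinear with x1
toy$y  <- rbinom(50, 1, plogis(toy$x1))

fit <- stats::glm(y ~ x1 + x2, data = toy, family = binomial(link = "logit"))

# Aliased (linearly dependent) terms show up as NA in the coefficient vector.
model_coefs   <- stats::coef(fit)
aliased_terms <- names(model_coefs)[is.na(model_coefs)]
has_aliased   <- length(aliased_terms) > 0
```

If has_aliased is TRUE, removing or recoding the listed terms before requesting a Wald type III test is one way to avoid the failure.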

Changing Variable Importance Method

By default pretty_coefficients shows “model” variable importance, but vimethod also accepts the “permute” and “firm” methods from the vip package. Any additional parameters required by these methods should also be passed into pretty_coefficients.

pretty_coefficients(model_object = survival_model,
                    type_iii = 'Wald', 
                    significance_level = 0.1, 
                    vimethod = 'permute', 
                    target = 'Survived', 
                    metric = 'auc',
                    pred_wrapper = predict.glm, 
                    reference_class = 0)
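
To give some intuition for what a permutation-based importance measures, here is a rough base R sketch on simulated data. It is not the implementation used by prettyglm or by any importance package; the helper names auc_rank and permutation_importance are hypothetical. The idea is to shuffle one feature at a time and record how much the AUC drops.

```r
set.seed(123)
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(2 * toy$x1))   # only x1 carries signal

fit <- stats::glm(y ~ x1 + x2, data = toy, family = binomial(link = "logit"))

# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
auc_rank <- function(actual, predicted) {
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(rank(predicted)[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

baseline_auc <- auc_rank(toy$y, stats::predict(fit, newdata = toy, type = "response"))

# Importance of a feature = drop in AUC after shuffling that feature.
permutation_importance <- function(feature) {
  shuffled <- toy
  shuffled[[feature]] <- sample(shuffled[[feature]])
  shuffled_pred <- stats::predict(fit, newdata = shuffled, type = "response")
  baseline_auc - auc_rank(toy$y, shuffled_pred)
}

importance <- vapply(c("x1", "x2"), permutation_importance, numeric(1))
```

In this simulation the informative feature x1 should lose far more AUC when shuffled than the noise feature x2.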

pretty_relativities()

pretty_relativities() will create a plot of the desired model variable. A different plot will be generated depending on the class of the variable.

Plot coefficients

A model relativity is a transform of the model estimate. By default pretty_relativities() uses ‘exp(estimate)-1’, which is useful for GLMs that use a log or logit link function.

The term ‘relativity’ is sometimes referred to as “odds ratio” or “likelihood”. You can customize the label with the relativity_label input.
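
As a small worked example of the default transform, the base R sketch below applies exp(estimate) - 1 to a few made-up coefficient values. The relativity() helper is defined here purely for illustration.

```r
# Default transform described above: exp(estimate) - 1.
relativity <- function(estimate) exp(estimate) - 1

# Illustrative estimates: a base level (0), a positive and a negative effect.
estimates <- c(base_level = 0, positive = 0.5, negative = -0.5)
relativities <- round(relativity(estimates), 3)
print(relativities)
```

An estimate of 0, such as a categorical base level, maps to a relativity of 0. With a logit link, exp(estimate) is the odds ratio, so an estimate of 0.5 corresponds to odds that are roughly 65% higher than the base level.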

Categorical Variables

For categorical variables pretty_relativities() creates an interactive dual-axis plot, which shows the fitted relativity on one y axis and the number of records in that category on the other y axis.

pretty_relativities(feature_to_plot= 'Embarked',
                    model_object = survival_model,
                    relativity_label = 'Likelihood of Survival'
                    )

Continuous Variables

For continuous variables pretty_relativities will plot the relativity over the variable’s range, and the density of that variable, on a dual axis.

If desired you can cut off the tail end of the distributions with upper_percentile_to_cut or lower_percentile_to_cut.

pretty_relativities(feature_to_plot= 'Fare',
                    model_object = survival_model,
                    relativity_label = 'Likelihood of Survival',
                    upper_percentile_to_cut = 0.1)
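
To illustrate what cutting the tails means, here is a base R sketch on simulated data. It assumes the two arguments are proportions of the distribution to drop from each end (so 0.1 drops the top 10%); the exact behaviour inside pretty_relativities may differ.

```r
set.seed(7)
fare <- rexp(1000, rate = 1 / 30)   # right-skewed, similar in shape to a fare column

upper_percentile_to_cut <- 0.1
lower_percentile_to_cut <- 0

# Keep only values between the lower and upper quantile cut-offs.
bounds <- stats::quantile(
  fare,
  probs = c(lower_percentile_to_cut, 1 - upper_percentile_to_cut)
)
fare_trimmed <- fare[fare >= bounds[1] & fare <= bounds[2]]
```

Trimming a long right tail like this keeps the plot focused on the range where most records sit.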

Plot interactions

To highlight some more of prettyglm’s functionality we will now build a logistic regression model with some interactions.

Model Building

survival_model2 <- stats::glm(Survived ~ Pclass:Fare +
                                         Age +
                                         Embarked:Sex +
                                         SibSp +
                                         Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))

Factor:Factor Interactions

Facet

You can also choose to facet the plots by one of the variables.

pretty_relativities(feature_to_plot= 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'facet',
                    facetorcolourby = 'Sex'
                    )

Colour

You can also choose to colour the plots by one of the variables.

pretty_relativities(feature_to_plot= 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'colour',
                    facetorcolourby = 'Embarked'
                    )

Standard

You can create these relativity plots as you would for a non-interaction.

pretty_relativities(feature_to_plot= 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival'
                    )

Continuous:Factor Interactions

Colour

By default continuous and factor interaction plots will colour by the factor variable.

pretty_relativities(feature_to_plot= 'Pclass:Fare',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    upper_percentile_to_cut = 0.03
                    )

Facet

You can also facet by the factor variable.

pretty_relativities(feature_to_plot= 'Pclass:Fare',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'facet',
                    upper_percentile_to_cut = 0.03,
                    height = 800
                    )

Plot splines

To highlight some more of prettyglm’s functionality we will now build a logistic regression model with a spline.

Create the splines

prettyglm includes a function splineit to help construct splines. This can be incorporated in the dplyr workflow as follows.

For splines to work nicely in prettyglm use the naming convention Variable#Start#End where # represents your desired separator.

titanic  <- titanic  %>%
  dplyr::mutate(Age_0_18 = prettyglm::splineit(Age,0,18),
                Age_18_35 = prettyglm::splineit(Age,18,35),
                Age_35_120 = prettyglm::splineit(Age,35,120)) %>%
  dplyr::mutate(Fare_0_55 = prettyglm::splineit(Fare,0,55),
                Fare_55_600 = prettyglm::splineit(Fare,55,600))
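
For readers unfamiliar with this kind of spline, the base R sketch below shows one common way to build piecewise-linear segments between a start and an end knot. This is an assumption about the general technique, not a copy of splineit's source, and piecewise_segment is a hypothetical helper name.

```r
# Piecewise-linear segment: 0 below `start`, rising one-for-one between
# `start` and `end`, and flat at (end - start) above `end`.
piecewise_segment <- function(x, start, end) {
  pmin(pmax(x - start, 0), end - start)
}

age <- c(10, 20, 40)
age_0_18   <- piecewise_segment(age, 0, 18)
age_18_35  <- piecewise_segment(age, 18, 35)
age_35_120 <- piecewise_segment(age, 35, 120)

# Within the covered range the segments add back up to the original variable.
segments_total <- age_0_18 + age_18_35 + age_35_120
```

With segments built this way, each fitted coefficient can be read as the slope of the effect within its own age band.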

Fit Model With Splines

survival_model4 <- stats::glm(Survived ~ Pclass +
                                         Sex:Fare_0_55 +
                                         Sex:Fare_55_600 +
                                         Age_0_18 +
                                         Age_18_35 +
                                         Age_35_120 +
                                         Embarked +
                                         SibSp +
                                         Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))

Creating a table of model coefficients with pretty_coefficients

For interactions, variables are grouped in the left pane.

pretty_coefficients(survival_model4, significance_level = 0.1, spline_seperator = '_')
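
The separator is what lets a spline column name be split back into its variable, start and end. The base R sketch below shows what the Variable#Start#End convention encodes when the separator is '_'. It only illustrates the naming convention and is not necessarily how prettyglm parses names internally.

```r
spline_names <- c("Age_0_18", "Age_18_35", "Age_35_120")
separator <- "_"

# Split each name on the separator and pick out the three parts.
parts <- strsplit(spline_names, separator, fixed = TRUE)
spline_table <- data.frame(
  variable = vapply(parts, `[`, character(1), 1),
  start    = as.numeric(vapply(parts, `[`, character(1), 2)),
  end      = as.numeric(vapply(parts, `[`, character(1), 3))
)
```

Note that this simple split only works cleanly when the variable name itself does not contain the separator, which is one reason to choose the separator with care.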

Create plots of fitted coefficients using pretty_relativities

Splines

You also need to provide a spline_seperator input in pretty_relativities.

pretty_relativities(feature_to_plot= 'Age',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_'
                    )

Interacted Splines

Colour

By default pretty_relativities will colour by the factor variable.

pretty_relativities(feature_to_plot= 'Sex:Fare',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.03
                    )

Facet

If you prefer to facet by the factor variable, change iteractionplottype to “facet”.

pretty_relativities(feature_to_plot= 'Sex:Fare',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.03,
                    iteractionplottype = 'facet'
                    )

one_way_ave()

Visualising one-way performance

Continuous Variable

For continuous variables one_way_ave will group values into 30 buckets by default, and plot the density on a dual axis.

one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)
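
The numbers behind a one-way actual versus expected plot can be sketched in base R as follows. The data are simulated, and equal-width buckets are an assumption made for this illustration; one_way_ave may choose its bucket boundaries differently.

```r
set.seed(42)
n <- 500
toy <- data.frame(age = runif(n, 1, 80))
toy$survived <- rbinom(n, 1, plogis(1 - 0.03 * toy$age))

fit <- stats::glm(survived ~ age, data = toy, family = binomial(link = "logit"))
toy$predicted <- stats::predict(fit, newdata = toy, type = "response")

# Bucket the continuous variable into (up to) 30 equal-width buckets.
n_buckets <- 30
toy$bucket <- cut(toy$age, breaks = n_buckets, labels = FALSE)

# Average actual and expected values, plus record counts, per bucket.
one_way <- data.frame(
  bucket   = sort(unique(toy$bucket)),
  actual   = as.vector(tapply(toy$survived, toy$bucket, mean)),
  expected = as.vector(tapply(toy$predicted, toy$bucket, mean)),
  records  = as.vector(table(toy$bucket))
)
```

Plotting actual and expected against bucket, with records on a second axis, gives the same kind of view as the plot above: buckets where the two lines diverge are where the model fits that variable poorly.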

Discrete Variable

one_way_ave(feature_to_plot = 'Cabintype',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic)

Customising one-way performance plots

Faceting

You can facet the one_way_ave plot by providing a variable to facet by in facetby.

one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            facetby = 'Sex',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)

Custom Predict Function

By default one_way_ave expects a model that works with predict.glm. If you would like to use one_way_ave with another model type (one which is not compatible with predict.glm), or provide modified predictions, one_way_ave allows a custom prediction function.

This function must return a data.frame with two columns: “Actual_Values” and “Predicted_Values”.

# Custom Predict Function and facet
a_custom_predict_function <- function(target, model_object, dataset){
  dataset <- base::as.data.frame(dataset)
  Actual_Values <- dplyr::pull(dplyr::select(dataset, tidyselect::all_of(c(target))))
  if(base::is.factor(Actual_Values)){
    Actual_Values <- base::as.numeric(as.character(Actual_Values))
  }
  Predicted_Values <- base::as.numeric(stats::predict(model_object, dataset, type='response'))

  to_return <-  base::data.frame(Actual_Values = Actual_Values,
                                 Predicted_Values = Predicted_Values)

  # Cap predictions at 0.4 to demonstrate returning modified predictions.
  to_return <- to_return %>%
    dplyr::mutate(Predicted_Values = base::ifelse(Predicted_Values > 0.4,0.4,Predicted_Values))
  return(to_return)
}

one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1,
            predict_function = a_custom_predict_function)
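
When writing your own prediction function, a quick check of its return value can catch shape problems early. The helper below, check_predict_output, is a hypothetical base R sketch and not part of prettyglm. The column names come from the requirement stated above, while the numeric checks are an additional assumption.

```r
check_predict_output <- function(predictions) {
  required <- c("Actual_Values", "Predicted_Values")
  stopifnot(
    is.data.frame(predictions),
    all(required %in% names(predictions)),
    is.numeric(predictions$Actual_Values),     # assumption: numeric actuals
    is.numeric(predictions$Predicted_Values)   # assumption: numeric predictions
  )
  invisible(TRUE)
}

good <- data.frame(Actual_Values = c(0, 1, 1), Predicted_Values = c(0.2, 0.7, 0.9))
bad  <- data.frame(actual = c(0, 1), predicted = c(0.1, 0.8))

good_passes     <- check_predict_output(good)
bad_is_rejected <- inherits(try(check_predict_output(bad), silent = TRUE), "try-error")
```

Running a check like this on the output of your custom function, before handing the function to one_way_ave, makes a wrongly named column easier to diagnose.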

actual_expected_bucketed()

Actual vs Expected Bucketed By Prediction Percentile

Standard

actual_expected_bucketed(target_variable = 'Survived',
                         model_object = survival_model4,
                         data_set = titanic)
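
The idea of bucketing by prediction percentile can be sketched in base R as follows. The data are simulated, and the choice of 10 quantile buckets is an assumption for illustration; actual_expected_bucketed may use a different number of buckets.

```r
set.seed(99)
n <- 1000
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-0.5 + toy$x))

fit <- stats::glm(y ~ x, data = toy, family = binomial(link = "logit"))
toy$predicted <- stats::predict(fit, newdata = toy, type = "response")

# Assign each record to a bucket based on the percentile of its prediction.
n_buckets <- 10
breaks <- stats::quantile(toy$predicted, probs = seq(0, 1, length.out = n_buckets + 1))
toy$bucket <- cut(toy$predicted, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Compare the average actual outcome with the average prediction per bucket.
bucketed <- data.frame(
  bucket   = seq_len(n_buckets),
  actual   = as.vector(tapply(toy$y, toy$bucket, mean)),
  expected = as.vector(tapply(toy$predicted, toy$bucket, mean)),
  records  = as.vector(table(toy$bucket))
)
```

For a well-calibrated model the actual and expected columns should track each other closely from the lowest to the highest bucket.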

Faceted

actual_expected_bucketed(target_variable = 'Survived',
                         model_object = survival_model4,
                         data_set = titanic, 
                         facetby = 'Sex')