# Mosaic plots

#### 2022-04-21

Mosaic plots provide an ideal method both for visualizing contingency tables and for visualizing the fit, or more importantly the lack of fit, of a loglinear model. For a two-way table, mosaic() by default fits the model of independence, $$[A][B]$$, or ~A+B as an R formula. For $$n$$-way tables, mosaic() can fit any loglinear model, and it can also be used to plot a model fitted with loglm(). See Friendly (1994, 1999) for the statistical ideas behind these uses of mosaic displays in connection with loglinear models.
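As a rough sketch of the loglm() route, assuming the MASS and vcd packages are installed, one might fit the joint-independence model [Treatment Sex][Improved] to a three-way version of the Arthritis table and pass the fitted object to mosaic(). The choice of model, and of Sex as the extra variable, is purely illustrative and not a model this document recommends.

```r
library(vcd)   # mosaic() and the Arthritis data
library(MASS)  # loglm()

data(Arthritis, package = "vcd")
art3 <- xtabs(~ Treatment + Sex + Improved, data = Arthritis)

# Joint independence: Improved is independent of the Treatment x Sex combination.
# fitted = TRUE keeps the fitted frequencies in the returned object.
fit <- loglm(~ Treatment * Sex + Improved, data = art3, fitted = TRUE)
print(fit)  # likelihood-ratio and Pearson goodness-of-fit statistics

# vcd provides a mosaic() method for "loglm" objects;
# tiles are shaded according to the residuals of this model.
mosaic(fit, main = "Arthritis: [Treatment Sex] [Improved]")
```

Swapping in a different formula in loglm() is all it takes to display the lack of fit of another model.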

The essential idea is to recursively subdivide a unit square into rectangular “tiles” for the cells of the table, such that the area of each tile is proportional to the cell frequency. For a given loglinear model, the tiles can then be shaded in various ways to reflect the residuals (lack of fit). The pattern of residuals can then suggest a better model, or show where the model fits and where it does not.
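For a two-way table the recursive subdivision can be sketched in base R: split the unit square into columns in proportion to the Treatment margin, then split each column into tiles in proportion to the conditional distribution of Improved within that treatment. A tile's area (column width times tile height) is then the joint proportion of its cell. This ignores the spacing between tiles that mosaic() adds and is an illustration of the idea, not vcd's actual implementation.

```r
data(Arthritis, package = "vcd")
art <- xtabs(~ Treatment + Improved, data = Arthritis)

# First split: column widths proportional to the Treatment margin (they sum to 1).
widths <- prop.table(margin.table(art, 1))

# Second split: within each Treatment column, tile heights proportional to the
# conditional distribution of Improved given Treatment (each row sums to 1).
heights <- prop.table(art, 1)

# Tile area = width of its column x its height within that column.
areas <- sweep(heights, 1, as.vector(widths), "*")

# The areas reproduce the joint cell proportions, so area is proportional to frequency.
print(round(areas, 3))
print(all.equal(as.vector(areas), as.vector(prop.table(art))))
```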

mosaic() provides a wide range of options for the directions of splitting, the specification of shading, labeling, spacing, legend and many other details. It is actually implemented as a special case of a more general class of displays for $$n$$-way tables called strucplot, which also includes sieve diagrams, association plots and double-decker plots. For details, see help(strucplot) and its “See also” links, as well as Meyer, Zeileis, & Hornik (2006), which is available as an R vignette via vignette("strucplot", package="vcd").
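Because these displays share the strucplot framework, the same table can be handed to the sibling functions. A minimal sketch, assuming vcd is installed: shade = TRUE requests residual-based shading, direction gives the splitting direction per variable, and doubledecker() is assumed here to treat the last dimension (Improved) as the response by default.

```r
library(vcd)
data(Arthritis, package = "vcd")
art <- xtabs(~ Treatment + Improved, data = Arthritis)

# Mosaic plot with an explicit splitting direction for each variable:
# "v" for the first variable (Treatment), "h" for the second (Improved).
mosaic(art, direction = c("v", "h"), shade = TRUE)

# Other members of the strucplot family applied to the same table.
sieve(art, shade = TRUE)   # sieve diagram
assoc(art, shade = TRUE)   # association plot
doubledecker(art)          # double-decker plot, last dimension as response
```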

Example: A mosaic plot for the Arthritis treatment data fits the model of independence, ~ Treatment + Improved, and displays the association in the pattern of residual shading. The plot below is produced with the following call to mosaic().

library(vcd)
data(Arthritis, package = "vcd")
art <- xtabs(~Treatment + Improved, data = Arthritis)
mosaic(art, gp = shading_max,
       split_vertical = TRUE,
       main = "Arthritis: [Treatment] [Improved]")

gp = shading_max specifies that color in the plot signals a significant residual at the 90% or 99% level, with the more intense shade for 99%. Note that the residuals for the independence model are not large (as shown in the legend), yet the association between Treatment and Improved is highly significant.

summary(art)
## Call: xtabs(formula = ~Treatment + Improved, data = Arthritis)
## Number of cases in table: 84
## Number of factors: 2
## Test for independence of all factors:
##  Chisq = 13.055, df = 2, p-value = 0.001463
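The contrast between small residuals and a highly significant test can be checked directly. This sketch uses base R's chisq.test(), whose residuals component holds the Pearson residuals (observed - expected) / sqrt(expected). The sum of their squares is the Pearson statistic reported by summary(art), while shading_max, as we read vcd's documentation, bases its test on the largest absolute residual.

```r
data(Arthritis, package = "vcd")
art <- xtabs(~ Treatment + Improved, data = Arthritis)

# Pearson residuals under independence: (observed - expected) / sqrt(expected)
ct <- chisq.test(art)
r <- ct$residuals
print(round(r, 2))

# The squared residuals add up to the Pearson chi-square statistic.
print(c(sum_r2 = sum(r^2), X2 = unname(ct$statistic)))

# Largest absolute residual, the statistic that max-type shading is based on.
print(max(abs(r)))
```

For these data every residual is below 2 in absolute value (the largest is about 1.98), even though the six squared residuals together give X² = 13.055 on 2 df.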

In contrast, another shading scheme, from Friendly (1994) (use: gp = shading_Friendly), uses fixed cutoffs of $$\pm 2, \pm 4$$ to shade cells whose residuals are individually significant at approximately the $$\alpha = 0.05$$ and $$\alpha = 0.0001$$ levels, respectively. The plot below uses gp = shading_Friendly.

mosaic(art, gp = shading_Friendly,
split_vertical = TRUE,
main="Arthritis: gp = shading_Friendly")
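The fixed cutoffs of shading_Friendly can be mimicked by binning the Pearson residuals at 2 and 4. The helper friendly_band() and its band labels are hypothetical, written for illustration only; this is not how vcd computes its shading.

```r
data(Arthritis, package = "vcd")
art <- xtabs(~ Treatment + Improved, data = Arthritis)
r <- chisq.test(art)$residuals

# Hypothetical helper: assign each cell to a shading band by |residual|,
# using fixed cutoffs (default 2 and 4).
friendly_band <- function(res, cutoffs = c(2, 4)) {
  band <- cut(abs(res), breaks = c(0, cutoffs, Inf), right = FALSE,
              labels = c("none", "light", "dark"))
  # Restore the table layout and append the sign of each residual.
  matrix(paste(as.character(band), ifelse(res >= 0, "+", "-")),
         nrow = nrow(res), dimnames = dimnames(res))
}

print(friendly_band(r))

# Two-sided standard normal tail probabilities at the cutoffs.
print(2 * pnorm(-c(2, 4)))
```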

Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89, 190–200.
Meyer, D., Zeileis, A., & Hornik, K. (2006). The strucplot framework: Visualizing multi-way contingency tables with vcd. Journal of Statistical Software, 17(3), 1–48. Retrieved from https://www.jstatsoft.org/v17/i03/