This document gives an overview of the functionality provided by the
R package APCtools.
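Age-Period-Cohort (APC) analysis is used to disentangle observed trends (e.g. of social, economic, medical or epidemiological data) to enable conclusions about the developments over three temporal dimensions: age (the life cycle of a person), period (calendar time) and cohort (the generation a person was born into).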
The critical challenge in APC analysis is that these main components are linearly dependent: \[ cohort = period - age \]
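The consequence of this identity can be illustrated with a minimal base R sketch. It uses synthetic data (the variables `age`, `period`, `cohort` and `y` below are made up for illustration and are not taken from the package) and shows that a purely linear model cannot estimate all three effects at once:

```r
# Synthetic data (illustrative only, not from the package)
set.seed(1)
n      <- 200
age    <- sample(20:80, n, replace = TRUE)
period <- sample(1970:2019, n, replace = TRUE)
cohort <- period - age   # exact linear function of the other two
y      <- rnorm(n)

# A purely linear model cannot separate the three effects:
# lm() treats the last term as aliased and reports NA for its coefficient.
fit <- lm(y ~ age + period + cohort)
coef(fit)
fit$rank   # 3 (intercept, age, period); cohort adds no new information
```

The `NA` coefficient for `cohort` reflects that the linear parts of the three effects are not separately identifiable without further assumptions.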
Accordingly, flexible methods and visualization techniques are needed
to properly disentangle observed temporal association structures. The
APCtools package comprises different methods that tackle
this problem and aims to cover all steps of an APC analysis. This
includes state-of-the-art descriptive visualizations as well as
visualization and summary functions based on the estimation of a
generalized additive regression model (GAM). The main functionalities of
the package are highlighted in the following.
For details on the statistical methodology see Weigert et al. (2021) or our corresponding research poster. The hexamaps (hexagonally binned heatmaps) are outlined in Jalal & Burke (2020).
Before we start, let’s load the relevant packages for the following analyses.
library(APCtools)
library(dplyr) # general data handling
library(mgcv) # estimation of generalized additive regression models (GAMs)
library(ggplot2) # data visualization
library(ggpubr) # arranging multiple ggplots in a grid with ggarrange()
# set the global theme of all plots
theme_set(theme_minimal())
APC analyses require long-term panel or repeated cross-sectional
data. The package includes two example datasets on the travel behavior
of German tourists (dataset travel) and the number of
unintentional drug overdose deaths in the United States
(dataset drug_deaths). See the respective help pages
?travel and ?drug_deaths for details.
In the following, we will use the travel dataset to
investigate whether travel distances of the main trip of German travelers
mainly change with the life cycle of a person (age effect), with macro-level
developments like decreasing air travel prices in the last decades
(period effect) or with the generational membership of a person, which is
shaped by similar socialization and historical experiences (cohort
effect).
data(travel)
Different functions are available for descriptively visualizing observed structures. This includes plots for the marginal distribution of some variable of interest, 1D plots for the development of some variable over age, period or cohort, as well as density matrices that visualize the development over all temporal dimensions.
The marginal distribution of a variable can be visualized using
plot_density. Metric variables can be plotted using a
density plot or a boxplot, while categorical variables can be plotted
using a bar chart.
gg1 <- plot_density(dat = travel, y_var = "mainTrip_distance", log_scale = TRUE)
gg2 <- plot_density(dat = travel, y_var = "mainTrip_distance", log_scale = TRUE,
                    plot_type = "boxplot")
gg3 <- plot_density(dat = travel, y_var = "household_size")
ggpubr::ggarrange(gg1, gg2, gg3, nrow = 1)
Plotting the distribution of a variable against age, period or cohort
is possible with the function plot_variable. Metric variables are
visualized using boxplots or line charts (see argument plot_type),
categorical variables using bar charts. The bar charts show relative
frequencies by default, but can be changed to show absolute numbers by
specifying the argument geomBar_position = "stack".
plot_variable(dat = travel, y_var = "mainTrip_distance",
              apc_dimension = "period", plot_type = "line", ylim = c(0,1000))
plot_variable(dat = travel, y_var = "household_size", apc_dimension = "period")
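The switch from relative frequencies to absolute numbers described above could look as follows. This is a sketch that only combines the arguments already named in the text. Storing the result in `gg_abs` (a name chosen here for illustration) assumes that plot_variable returns a ggplot object.

```r
library(APCtools)
data(travel)

# Bar chart of household size over period, stacked as absolute counts
# instead of the default relative frequencies
gg_abs <- plot_variable(dat = travel, y_var = "household_size",
                        apc_dimension = "period",
                        geomBar_position = "stack")
gg_abs
```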
To include all temporal dimensions in one plot, APCtools
provides the function plot_densityMatrix. In Weigert et
al. (2021), this plot type was referred to as a ridgeline matrix
when plotting multiple density plots for a metric variable. The basic
principle of a density matrix is to (i) visualize two of the temporal
dimensions on the x- and y-axis (specified using the argument
dimensions), such that the third temporal dimension is
represented on the diagonals of the matrix, and (ii) to categorize the
respective variables on the x- and y-axis in meaningful groups. The
function then creates a grid, where each cell contains the distribution
of the selected y_var variable in the respective
category.
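Why the third dimension ends up on the diagonals can be checked with a few lines of base R. The helper `cohort_range` below is hypothetical (it is not part of APCtools) and simply applies cohort = period - age to the bounds of one grid cell:

```r
# Hypothetical helper: birth years covered by one cell of the matrix,
# given c(lower, upper) bounds for the age group and the period group
cohort_range <- function(age_group, period_group) {
  c(earliest = period_group[1] - age_group[2],
    latest   = period_group[2] - age_group[1])
}

# Moving one step along a diagonal: ten years older, ten years later
cell_a <- cohort_range(c(20, 29), c(1990, 1999))
cell_b <- cohort_range(c(30, 39), c(2000, 2009))
cell_c <- cohort_range(c(40, 49), c(2010, 2019))

cell_a                      # earliest 1961, latest 1979
identical(cell_a, cell_b)   # TRUE
identical(cell_b, cell_c)   # TRUE
```

All three cells cover the same birth years, which is why cells on one diagonal can be read as belonging to one cohort group. Because both axes are grouped, each cell spans more birth years than the width of a single age or period group.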
By default, age and period are depicted on the x- and y-axis,
respectively, and cohort on the diagonals. The categorization is defined
by specifying two of the arguments age_groups,
period_groups and cohort_groups.
age_groups    <- list(c(80,89),c(70,79),c(60,69),c(50,59),
                      c(40,49),c(30,39),c(20,29))
period_groups <- list(c(1970,1979),c(1980,1989),c(1990,1999),
                      c(2000,2009),c(2010,2019))
plot_densityMatrix(dat           = travel,
                   y_var         = "mainTrip_distance",
                   age_groups    = age_groups,
                   period_groups = period_groups,
                   log_scale     = TRUE)
To emphasize the effect of the variable depicted on the diagonal
(here: cohort), individual diagonals can be highlighted using the argument
highlight_diagonals.
plot_densityMatrix(dat                 = travel,
                   y_var               = "mainTrip_distance",
                   age_groups          = age_groups,
                   period_groups       = period_groups,
                   highlight_diagonals = list("born 1950 - 1959" = 8,
                                              "born 1970 - 1979" = 10),
                   log_scale           = TRUE)