fixes

Overview

Note
The fixes package currently supports data with annual time intervals only.
For datasets with finer time intervals, such as monthly or quarterly data, I recommend creating a new column with sequential time numbers (e.g., 1, 2, 3, …) representing the time order.
This column can then be used for analysis.
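As a minimal sketch of that workaround, the base R snippet below builds a sequential time column for a small monthly panel. The data frame and the column names month and time_id are hypothetical and chosen only for illustration.

```r
# Hypothetical monthly panel: two firms observed over four months.
df <- data.frame(
  firm_id = rep(1:2, each = 4),
  month   = rep(c("2020-01", "2020-02", "2020-03", "2020-04"), times = 2),
  y       = c(0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.9, 1.1)
)

# "YYYY-MM" strings sort chronologically, so the sorted unique values
# give the time order of the periods.
periods <- sort(unique(df$month))

# Map each month to its position in that order (1, 2, 3, ...).
df$time_id <- match(df$month, periods)
```

The new time_id column can then be passed as the time variable. Note that this mapping only ranks the periods that appear in the data; if a whole month is missing for every unit, build periods from the full calendar sequence instead so that the gap is preserved.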

The fixes package is designed for conducting analysis and creating plots for event studies, a method used to verify the parallel trends assumption in two-way fixed effects (TWFE) difference-in-differences (DID) analysis.

The package includes two main functions:

  1. run_es(): Accepts a data frame, generates lead and lag variables, and performs event study analysis. The function returns the results as a data frame.
  2. plot_es(): Creates plots using ggplot2 based on the data frame generated by run_es(). Users can choose between a plot with geom_ribbon() or geom_errorbar() to visualize the results.

Installation

You can install the package like so:

# install.packages("pak")
pak::pak("fixes")

or

install.packages("fixes")

If you want to install the development version, install it from the GitHub repository:

pak::pak("yo5uke/fixes")

How to use

First, load the package.

library(fixes)

Data frame

The data frame to be analyzed must include the following variables:

  1. A variable to identify individuals.
  2. A dummy variable indicating treated individuals (e.g., is_treated).
  3. A variable representing time (e.g., year).
  4. An outcome variable.

For example, a data frame like the following:

firm_id state_id year is_treated y
1 21 1980 1 0.8342158
1 21 1981 1 -0.5354355
1 21 1982 1 1.1372828
1 21 1983 1 0.7339165
1 21 1984 1 1.4232840
1 21 1985 1 1.2783362
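If you do not have such a data set at hand, a simulated panel with the same columns is enough to try the functions. The sketch below uses base R only. The column names follow the example above, while the number of firms, the year range, the treatment year 1998, and the effect size are arbitrary assumptions.

```r
set.seed(1)

# Balanced panel: 20 firms observed yearly from 1995 to 2001.
df <- expand.grid(firm_id = 1:20, year = 1995:2001)

# Assign firms to 5 states with 4 firms each (used for clustering).
df$state_id <- (df$firm_id - 1) %/% 4 + 1

# The first 10 firms form the treated group.
df$is_treated <- as.integer(df$firm_id <= 10)

# Outcome: noise plus an assumed effect of 0.5 for treated firms from 1998 on.
df$y <- rnorm(nrow(df)) + 0.5 * df$is_treated * (df$year >= 1998)

head(df)
```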

run_es()

run_es() takes 12 arguments, including required variables and optional specifications such as covariates and clustering.

| Argument | Description |
| --- | --- |
| data | Data frame to be used. |
| outcome | Outcome variable. Can be specified as a raw variable or a transformation (e.g., log(y)). Provide it unquoted. |
| treatment | Dummy variable indicating the treated units. Provide it unquoted. Accepts both 0/1 and TRUE/FALSE. |
| time | Time variable. Provide it unquoted. |
| timing | Time value indicating when the treatment occurs. |
| lead_range | Number of pre-treatment periods to include (e.g., 3 = lead3, lead2, lead1). |
| lag_range | Number of post-treatment periods to include (e.g., 2 = lag0, lag1, lag2). |
| covariates | Additional covariates to include in the regression. Must be a one-sided formula (e.g., ~ x1 + x2). |
| fe | Fixed effects to control for unobserved heterogeneity. Must be a one-sided formula (e.g., ~ id + year). |
| cluster | Specifies clustering for standard errors. Can be a character vector (e.g., c("id", "year")) or a formula (e.g., ~ id + year, ~ id^year). |
| baseline | Relative time value to be used as the reference category. The corresponding dummy is excluded from the regression. Must be within the specified lead/lag range. |
| interval | Time interval between observations (e.g., 1 for yearly data, 5 for 5-year intervals). |
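To make the roles of timing, interval, lead_range, and lag_range concrete, the sketch below computes relative time by hand in base R. It assumes that relative time is (time - timing) / interval, which matches the argument descriptions above. It illustrates the idea and is not the package's internal code.

```r
# Assumed settings, mirroring the arguments described above.
timing     <- 1998
interval   <- 1
lead_range <- 3
lag_range  <- 2

year <- 1994:2001

# Relative time: 0 in the treatment period, negative before, positive after.
rel_time <- (year - timing) / interval

# Periods inside the event window: lead3, lead2, lead1, lag0, lag1, lag2.
in_window <- rel_time >= -lead_range & rel_time <= lag_range

data.frame(year, rel_time, in_window)
```

With baseline = -1, the period with rel_time equal to -1 (here 1997) would serve as the reference category.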

Example: Without Covariates

event_study <- run_es(
  data       = df, 
  outcome    = y, 
  treatment  = is_treated, 
  time       = year, 
  timing     = 1998, 
  lead_range = 5, 
  lag_range  = 5, 
  fe         = ~ firm_id + year, 
  cluster    = ~ state_id, 
  baseline   = -1, 
  interval   = 1
)

Note: The fe argument must be specified as a one-sided formula (e.g., ~ firm_id + year).
The cluster argument can be specified either as a one-sided formula (e.g., ~ state_id) or as a character vector (e.g., c("firm_id", "year")).
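The two ways of writing a set of variables name the same columns. As a small base R illustration (independent of the package itself), the sketch below converts between a one-sided formula and a character vector.

```r
# One-sided formula naming two variables.
f <- ~ firm_id + year

# Extract the variable names as a character vector.
vars <- all.vars(f)

# Rebuild a one-sided formula from a character vector of names.
g <- stats::reformulate(c("firm_id", "year"))

vars
g
```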

The run_es() function returns a tidy data frame with estimated event-study coefficients, confidence intervals, and metadata such as relative timing and baseline identification (see footnote 1 at the end of this document).

Example: With Covariates

event_study <- run_es(
  data       = df, 
  outcome    = y, 
  treatment  = is_treated, 
  time       = year, 
  timing     = 1998, 
  lead_range = 5, 
  lag_range  = 5, 
  covariates = ~ cov1 + cov2 + cov3, 
  fe         = ~ firm_id + year, 
  cluster    = ~ state_id, 
  baseline   = -1, 
  interval   = 1
)

You can use this result to create custom plots, or take advantage of the built-in plot_es() function to visualize the estimates and confidence intervals with minimal code.

plot_es()

The plot_es() function creates a plot based on ggplot2.

plot_es() has 12 arguments.

| Argument | Description |
| --- | --- |
| data | Data frame created by run_es(). |
| type | The type of confidence interval visualization: "ribbon" (default) or "errorbar". |
| vline_val | The x-intercept for the vertical reference line (default: 0). |
| vline_color | Color for the vertical reference line (default: "#000"). |
| hline_val | The y-intercept for the horizontal reference line (default: 0). |
| hline_color | Color for the horizontal reference line (default: "#000"). |
| linewidth | The width of the lines for the plot (default: 1). |
| pointsize | The size of the points for the estimates (default: 2). |
| alpha | The transparency level for ribbons (default: 0.2). |
| barwidth | The width of the error bars (default: 0.2). |
| color | The color for the lines and points (default: "#B25D91FF"). |
| fill | The fill color for ribbons (default: "#B25D91FF"). |

If the defaults are fine for you, pass the data frame created by run_es() and the plot is complete.

plot_es(event_study)

plot_es(event_study, type = "errorbar")

plot_es(event_study, type = "errorbar", vline_val = -.5)

Because the plot is built with ggplot2, you can adjust details by adding ggplot2 components with +.

plot_es(event_study, type = "errorbar") + 
  ggplot2::scale_x_continuous(breaks = seq(-5, 5, by = 1)) + 
  ggplot2::ggtitle("Result of Event Study")

Planned Features

Debugging

If you find an issue, please report it on the GitHub Issues page.


  1. Behind the scenes, estimation is performed using fixest::feols().