fuzzydid — Estimation with Fuzzy Difference-in-Difference Designs
Install the development version from GitHub with:
install.packages("remotes")
remotes::install_github("kmfrick/Rfuzzydid")

Full documentation and worked examples are available at https://kmfrick.github.io/Rfuzzydid/.
fuzzydid(
  data,
  formula,
  group,
  time,
  group_forward = NULL,
  did = FALSE,
  tc = FALSE,
  cic = FALSE,
  lqte = FALSE,
  newcateg = NULL,
  numerator = FALSE,
  partial = FALSE,
  nose = FALSE,
  cluster = NULL,
  breps = 50,
  eqtest = FALSE,
  modelx = NULL,
  sieves = FALSE,
  sieveorder = NULL,
  tagobs = FALSE,
  backend = c("auto", "native"),
  seed = NULL,
  treatment = NULL
)

fuzzydid() computes estimators of local average and
quantile treatment effects in fuzzy DID designs, following de
Chaisemartin and D’Haultfoeuille (2018a). It also computes their
standard errors and confidence intervals.
Rfuzzydid is an R port of the Stata
fuzzydid package. Its development aim is feature parity
with the Stata package while exposing the estimators through a
formula-first R interface.
Rfuzzydid is a maturing R implementation of the
estimators introduced by de Chaisemartin and D’Haultfoeuille (2018a) and
implemented for Stata by de Chaisemartin, D’Haultfoeuille, and
Guyonvarch (2018b). New development is focused on native R parity, input
validation, and review-ready documentation rather than adding estimators
beyond those references.
Arguments:
data: Data frame containing all variables.
formula: A formula of the form y ~ d or
y ~ d + x1 + x2, where y is the outcome
variable and RHS terms include treatment plus optional covariates.
treatment: Optional treatment variable name for
multi-term formulas. If omitted, treatment is inferred when unambiguous
(single RHS term, or a unique d term).
group: Name of the group variable (backward group for
multi-period designs). See Section 4.2 of de Chaisemartin et al. (2018b;
doi:10.1177/1536867X19854019) for details on constructing this
variable.
time: Name of the time period variable.
group_forward: Optional name of the forward group
variable for multi-period designs.
A detailed introduction to the methodology is given in de Chaisemartin et al. (2018b; doi:10.1177/1536867X19854019).
y, d, group,
time, and group_forward must be numeric
vectors. Numeric covariates are treated as continuous; factor,
character, and logical covariates are treated as qualitative predictors.
NA and NaN values are removed by complete-case
filtering over all analysis variables. Inf and
-Inf are rejected. Use tagobs = TRUE to
recover the retained-row mask.
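The input rules above can be mimicked in base R to see which rows an analysis would retain. This is only an illustrative sketch and not the package's internal code; the data frame `toy` and the helper `screen_rows()` are hypothetical names introduced here.

```r
# Hypothetical toy data: one NA outcome and one NaN treatment value.
toy <- data.frame(
  y = c(1.2, NA, 0.7, 2.1),
  d = c(0, 1, NaN, 1),
  g = c(0, 0, 1, 1),
  t = c(0, 1, 0, 1)
)

# Hypothetical helper: reject infinite values, then flag complete cases.
screen_rows <- function(data, vars) {
  numeric_vars <- vars[vapply(data[vars], is.numeric, logical(1))]
  has_inf <- vapply(
    data[numeric_vars],
    function(x) any(is.infinite(x)),
    logical(1)
  )
  if (any(has_inf)) {
    stop("Inf/-Inf found in: ", paste(numeric_vars[has_inf], collapse = ", "))
  }
  # complete.cases() treats both NA and NaN as missing.
  complete.cases(data[vars])
}

keep <- screen_rows(toy, c("y", "d", "g", "t"))
keep          # logical mask of retained rows
toy[keep, ]   # rows that survive complete-case filtering
```

The logical vector `keep` plays the same role as the retained-row mask described above, although the object actually returned with tagobs = TRUE may be structured differently.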
Estimators:
did: Logical; computes the Wald-DID estimator.
tc: Logical; computes the Wald-TC estimator.
cic: Logical; computes the Wald-CIC estimator. Only
available when no covariates are included.
lqte: Logical; computes estimators of the LQTE for
quantiles of order 5%, 10%, …, 95%. Only available when D, G, and T are
binary, and no covariates are included.
At least one of did, tc, cic,
or lqte must be specified. If several are specified, all
requested estimators are computed.
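For intuition, the Wald-DID point estimate is the difference-in-differences of the outcome divided by the difference-in-differences of the treatment (de Chaisemartin and D’Haultfoeuille 2018a). The base-R sketch below computes that ratio by hand for a binary group and period without covariates. The simulated data and the helpers `did_of_means()` and `wald_did()` are hypothetical and are not part of the package.

```r
set.seed(1)
n <- 400
sim <- data.frame(
  g = rep(c(0, 1), each = n / 2),
  t = rep(c(0, 1), times = n / 2)
)
# Treatment rate is stable in the control group and rises in the treated group.
sim$d <- rbinom(n, 1, 0.2 + 0.1 * sim$g + 0.4 * sim$g * sim$t)
# Outcome with a group effect, a common trend, and a treatment effect of 2.
sim$y <- 1 + 0.5 * sim$g + 0.3 * sim$t + 2 * sim$d + rnorm(n)

# Hypothetical helper: DID of the four (g, t) cell means of one variable.
did_of_means <- function(x, g, t) {
  (mean(x[g == 1 & t == 1]) - mean(x[g == 1 & t == 0])) -
    (mean(x[g == 0 & t == 1]) - mean(x[g == 0 & t == 0]))
}

# Hypothetical helper: ratio of the outcome DID to the treatment DID.
wald_did <- function(y, d, g, t) {
  did_of_means(y, g, t) / did_of_means(d, g, t)
}

wald_did(sim$y, sim$d, sim$g, sim$t)
```

The package's did = TRUE estimate should be compared against such a hand computation only in this simple no-covariate, two-group, two-period setting.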
Treatment categorization:
newcateg: Numeric vector of upper bounds to group
treatment values together for Wald-TC and Wald-CIC. Useful when
treatment takes many values. See Section 3.3 of de Chaisemartin et
al. (2018b; doi:10.1177/1536867X19854019).
Numerators and bounds:
numerator: Logical; return only the numerators of
Wald-DID, Wald-TC, and Wald-CIC estimators. Useful for placebo tests
(see Section 3.3.3 of the supplement of de Chaisemartin and
D’Haultfoeuille 2018a).
partial: Logical; compute bounds on local average
treatment effects in the absence of a “stable” control group. Only
available without covariates.
Inference:
nose: Logical; compute only point estimates, not
standard errors.
cluster: Name of cluster variable for block bootstrap.
Only one clustering variable is allowed.
breps: Integer number of bootstrap replications.
Default is 50.
eqtest: Logical; perform equality tests between
estimands when at least two of did, tc,
cic are specified.
Covariates:
modelx: Character vector specifying parametric methods
for estimating conditional expectations in Wald-DID and Wald-TC with
covariates. Two entries required for binary treatments; three for
ordered multi-valued treatments. Values must be "ols",
"logit", or "probit".
sieves: Logical; use nonparametric sieve estimation for
conditional expectations.
sieveorder: Optional sieve order control when
sieves = TRUE. Default NULL selects order by
deterministic 5-fold CV. A scalar applies to both outcome and treatment
sieve bases. A length-2 vector (outcome, treatment) is
supported for backward compatibility. Values must be ≥ 2 and satisfy the
basis cap min(4800, floor(n/5)).
When covariates are included and neither modelx nor
sieves is specified, all conditional expectations are
estimated by OLS by default.
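As a usage sketch for the covariate options, the call below adds one numeric covariate and passes two modelx entries because the treatment is binary. The data frame `df_x` and the covariate `x1` are made up for illustration, and the pairing of "ols" and "logit" is just one combination of allowed values, not a recommendation. The call is guarded so the data-generation part runs even where the package is not installed.

```r
set.seed(2)
n <- 400
df_x <- data.frame(
  g = rep(c(0, 1), each = n / 2),
  t = rep(c(0, 1), times = n / 2),
  x1 = rnorm(n)  # hypothetical continuous covariate
)
# Binary treatment whose probability depends on group, period, and x1.
df_x$d <- rbinom(
  n, 1,
  plogis(-1 + 0.5 * df_x$g + 1.5 * df_x$g * df_x$t + 0.3 * df_x$x1)
)
df_x$y <- 1 + 0.5 * df_x$g + 0.3 * df_x$t + 2 * df_x$d + 0.4 * df_x$x1 + rnorm(n)

if (requireNamespace("Rfuzzydid", quietly = TRUE)) {
  fit_x <- Rfuzzydid::fuzzydid(
    data = df_x,
    formula = y ~ d + x1,
    treatment = "d",             # name the treatment explicitly
    group = "g",
    time = "t",
    did = TRUE,
    tc = TRUE,                   # cic and lqte are unavailable with covariates
    modelx = c("ols", "logit"),  # two entries because d is binary
    breps = 50,
    seed = 123
  )
  print(summary(fit_x))
}
```

Dropping modelx from this call would fall back to the OLS default described above, and sieves = TRUE would request the nonparametric sieve estimator instead.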
Other:
tagobs: Logical; return a logical mask of observations
used by fuzzydid().
backend: One of "auto" or
"native".
seed: Optional integer seed for bootstrap resampling
when nose = FALSE. The default NULL uses the
current R RNG state; supply a seed for reproducible standard errors,
confidence intervals, and bootstrap diagnostics.
fuzzydid objects support print(),
summary(), coef(), confint(),
nobs(), formula(), vcov(),
plot(), generics::tidy(), and
generics::glance(). They do not implement
predict(), fitted(), or
residuals() because the object summarizes causal estimands
rather than observation-level fitted outcomes.
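A short sketch of how these methods might be used together, with seed set so that the bootstrap output is reproducible across runs. The simulated data frame `df_m` is hypothetical; the call uses only arguments and methods listed above and is guarded so the snippet also runs where the package is not installed.

```r
set.seed(3)
n <- 400
df_m <- data.frame(
  g = rep(c(0, 1), each = n / 2),
  t = rep(c(0, 1), times = n / 2)
)
df_m$d <- rbinom(n, 1, 0.2 + 0.1 * df_m$g + 0.4 * df_m$g * df_m$t)
df_m$y <- 1 + 0.5 * df_m$g + 0.3 * df_m$t + 2 * df_m$d + rnorm(n)

if (requireNamespace("Rfuzzydid", quietly = TRUE)) {
  fit_m <- Rfuzzydid::fuzzydid(
    data = df_m, formula = y ~ d, group = "g", time = "t",
    did = TRUE, tc = TRUE, breps = 50, seed = 2024
  )
  print(coef(fit_m))     # point estimates
  print(confint(fit_m))  # bootstrap confidence intervals
  print(nobs(fit_m))     # number of observations used
}
```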
An object of class "fuzzydid" containing:
Data frames:
late: LATE estimates with columns: estimator, estimate, std.error, conf.low, conf.high
eqtest: Equality test results (if eqtest = TRUE)
lqte: LQTE estimates at quantiles 0.05, 0.10, …, 0.95 (if lqte = TRUE)
Matrices (Stata-parity):
matrices$b_LATE: k × 1 matrix of requested estimators
matrices$se_LATE: k × 1 matrix of bootstrap standard errors
matrices$ci_LATE: k × 2 matrix of 95% percentile bootstrap confidence intervals
matrices$b_LQTE: 19 × 1 matrix of LQTE estimates at quantiles 0.05–0.95
matrices$se_LQTE: 19 × 1 matrix of LQTE bootstrap standard errors
matrices$ci_LQTE: 19 × 2 matrix of LQTE 95% confidence intervals
Counts:
n: Number of observations used
n11, n10, n01, n00: Cell sizes for (G,T) combinations
n_reps: Number of bootstrap replications requested
n_misreps: Number of failed/degenerate bootstrap replications
share_failures: Proportion of failed replications
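To make the bootstrap fields concrete, the base-R sketch below computes a bootstrap standard error, a 95% percentile interval, and a failure share for a hand-rolled Wald-DID. It rests on simplifying assumptions (rows resampled independently rather than by cluster, and a replication counted as failed when its estimate is not finite) and is not the package's own resampling code. All object and function names are hypothetical.

```r
set.seed(4)
n <- 400
sim <- data.frame(
  g = rep(c(0, 1), each = n / 2),
  t = rep(c(0, 1), times = n / 2)
)
sim$d <- rbinom(n, 1, 0.2 + 0.1 * sim$g + 0.4 * sim$g * sim$t)
sim$y <- 1 + 0.5 * sim$g + 0.3 * sim$t + 2 * sim$d + rnorm(n)

# Hand-rolled Wald-DID: DID of mean outcomes over DID of mean treatment.
wald_did <- function(dat) {
  did <- function(x) {
    # Fixed factor levels keep all four cells, giving NA if one is empty.
    m <- tapply(
      x,
      list(factor(dat$g, levels = 0:1), factor(dat$t, levels = 0:1)),
      mean
    )
    (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
  }
  did(dat$y) / did(dat$d)
}

n_reps <- 50
boot <- replicate(n_reps, wald_did(sim[sample(nrow(sim), replace = TRUE), ]))

# Treat non-finite replications (e.g. a zero denominator) as failures.
ok <- is.finite(boot)
n_misreps <- sum(!ok)
share_failures <- n_misreps / n_reps

se <- sd(boot[ok])                                  # bootstrap standard error
ci <- quantile(boot[ok], probs = c(0.025, 0.975))   # 95% percentile interval
c(estimate = wald_did(sim), std.error = se, ci, share_failures = share_failures)
```

With only 50 replications the percentile bounds are coarse, so increasing breps is a natural step when the intervals matter.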
# Generate simulated data (saved to CSV for R/Stata parity verification)
set.seed(50321)
n_cell <- 80
# Note: the treatment draw inside rnorm() is separate from the stored d column.
df <- rbind(
  data.frame(y = rnorm(n_cell, 1 + 1.8 * rbinom(n_cell, 1, 0.20)), g = 0, t = 0, d = rbinom(n_cell, 1, 0.20)),
  data.frame(y = rnorm(n_cell, 1 + 0.5 + 1.8 * rbinom(n_cell, 1, 0.35)), g = 0, t = 1, d = rbinom(n_cell, 1, 0.35)),
  data.frame(y = rnorm(n_cell, 1 + 0.7 + 1.8 * rbinom(n_cell, 1, 0.30)), g = 1, t = 0, d = rbinom(n_cell, 1, 0.30)),
  data.frame(y = rnorm(n_cell, 1 + 0.7 + 0.5 + 1.8 * rbinom(n_cell, 1, 0.70)), g = 1, t = 1, d = rbinom(n_cell, 1, 0.70))
)
# Save for Stata comparison
write.csv(df, "fuzzydid_example.csv", row.names = FALSE)

library(Rfuzzydid)
df <- read.csv("fuzzydid_example.csv")
fit <- fuzzydid(
  data = df,
  formula = y ~ d,
  group = "g",
  time = "t",
  did = TRUE,
  tc = TRUE,
  cic = TRUE,
  breps = 50
)
summary(fit)

The corresponding Stata commands:
import delimited "fuzzydid_example.csv", clear
fuzzydid y g t d, did tc cic breps(50)

Note: The Stata command is shown for
parity/reference only. Rfuzzydid does not bundle the Stata
fuzzydid sources, so Stata users need that command
installed separately in their Stata environment.
Point estimates from R and Stata are identical for the covered parity fixtures. Bootstrap standard errors and confidence intervals can differ because the two platforms use different random number generators, so inference results are comparable across implementations but not numerically identical.
de Chaisemartin, C. and D’Haultfoeuille, X. 2018a. Fuzzy Differences-in-Differences. Review of Economic Studies, 85(2): 999-1028. doi:10.1093/restud/rdx049.
de Chaisemartin, C., D’Haultfoeuille, X., and Guyonvarch, Y. 2018b. Fuzzy Differences-in-Differences with Stata. Stata Journal. doi:10.1177/1536867X19854019.
AGPL-3.0