---
title: "Functional Form Sensitivity in Nonlinear DiD"
author: "NonlinearDiD Package"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Functional Form Sensitivity in Nonlinear DiD}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 4, warning = FALSE, message = FALSE
)
library(NonlinearDiD)
library(ggplot2)
set.seed(101)
```

# Overview

This vignette explores the **functional form sensitivity** problem in
difference-in-differences with binary outcomes, following
Roth & Sant'Anna (2023).

## The Core Issue

Suppose two groups have baseline outcome probabilities:

- Control: P(Y = 1) = 0.30
- Treated: P(Y = 1) = 0.25

Both groups experience the same *additive* increase in probability: +0.10.

**On the probability scale**: The change is 0.10 for both — parallel trends holds.

**On the logit scale**:

- Control: logit(0.40) - logit(0.30) = `r round(qlogis(0.40) - qlogis(0.30), 3)`
- Treated: logit(0.35) - logit(0.25) = `r round(qlogis(0.35) - qlogis(0.25), 3)`

These are **not equal**. A researcher testing parallel trends on the log-odds
scale would therefore reject (correctly, in the sense that the log-odds trends
really do differ), even though the underlying time trend is identical for both
groups on the probability scale.

```{r demo_scale}
# Demonstrate scale sensitivity
p_ctrl_pre  <- 0.30; p_ctrl_post <- 0.40
p_treat_pre <- 0.25; p_treat_post <- 0.35

cat("=== Probability Scale ===\n")
cat("Control change:  ", round(p_ctrl_post  - p_ctrl_pre, 4),  "\n")
cat("Treated change:  ", round(p_treat_post - p_treat_pre, 4), "\n")
cat("DiD (prob):      ", round((p_treat_post - p_treat_pre) - (p_ctrl_post - p_ctrl_pre), 4), "\n\n")

cat("=== Log-Odds (Logit) Scale ===\n")
cat("Control change:  ", round(qlogis(p_ctrl_post)  - qlogis(p_ctrl_pre), 4),  "\n")
cat("Treated change:  ", round(qlogis(p_treat_post) - qlogis(p_treat_pre), 4), "\n")
cat("DiD (logit):     ", round((qlogis(p_treat_post) - qlogis(p_treat_pre)) -
                               (qlogis(p_ctrl_post) - qlogis(p_ctrl_pre)), 4), "\n\n")

cat("=== Probit Scale ===\n")
cat("Control change:  ", round(qnorm(p_ctrl_post)  - qnorm(p_ctrl_pre), 4),  "\n")
cat("Treated change:  ", round(qnorm(p_treat_post) - qnorm(p_treat_pre), 4), "\n")
cat("DiD (probit):    ", round((qnorm(p_treat_post) - qnorm(p_treat_pre)) -
                               (qnorm(p_ctrl_post) - qnorm(p_ctrl_pre)), 4), "\n")
```

**Key takeaway**: The same underlying DGP yields *different* DiD estimates
and *different* pre-trends test results depending on the scale chosen. No
single scale is correct in general: the appropriate scale is determined by
the DGP, which the researcher does not observe.

---

# When Does Scale Choice Matter?

Scale sensitivity is most severe when:

1. **Baseline probabilities are far from 0.5** (where logit and probit curves
   are most nonlinear)
2. **Groups have different baseline rates** (asymmetric baseline)
3. **Treatment effects are large** (more curvature in the relevant region)

```{r severity}
# Show severity across baseline probability values
baseline_probs <- seq(0.05, 0.45, by = 0.05)
delta_p        <- 0.10  # same additive change for both groups

severity_df <- do.call(rbind, lapply(baseline_probs, function(p0) {
  p1 <- p0 + delta_p
  # Parallel in prob => same change
  # Logit DiD if treated has different baseline (p0 - 0.05)
  p0_treat <- max(p0 - 0.05, 0.02)
  p1_treat <- p0_treat + delta_p

  logit_did <- (qlogis(p1_treat) - qlogis(p0_treat)) -
               (qlogis(p1)       - qlogis(p0))

  data.frame(
    baseline_ctrl  = p0,
    baseline_treat = p0_treat,
    logit_did      = logit_did
  )
}))

cat("Logit-scale DiD when true probability DiD = 0:\n")
print(severity_df, digits = 3, row.names = FALSE)
cat("\nLarger deviations at low baseline probabilities.\n")
```
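
The same pattern is easier to see as a figure. The chunk below simply plots
the `logit_did` column of `severity_df` against the control group's baseline
probability.

```{r severity_plot}
# Plot the spurious logit-scale DiD (computed in severity_df above) against
# the control group's baseline probability
ggplot(severity_df, aes(x = baseline_ctrl, y = logit_did)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Control baseline probability",
    y = "Logit-scale DiD (true probability-scale DiD = 0)"
  )
```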

---

# Simulation: Linear vs. Logit DiD

We now simulate data where parallel trends holds on the **logit scale**
(the correct scale for the DGP), and compare estimates from:

1. Linear DiD (the standard Callaway and Sant'Anna, 2021, estimator)
2. Logit DiD (NonlinearDiD)

```{r simulation}
# DGP: parallel trends on logit scale
dat <- sim_binary_panel(
  n            = 1000,
  nperiods     = 8,
  prop_treated = 0.5,
  n_cohorts    = 3,
  true_att     = c(0.20, 0.35, 0.25),
  base_prob    = 0.20,   # low baseline: nonlinearity matters most
  unit_fe_sd   = 0.5,
  seed         = 42
)

cat("Baseline outcome rate (untreated, pre-period):",
    round(mean(dat$y[dat$D == 0 & dat$period == 1]), 3), "\n")
cat("True ATTs (avg):", round(mean(c(0.20, 0.35, 0.25)), 3), "\n\n")
```
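
Before fitting anything, it is worth glancing at the raw cell means. The
base-R sketch below assumes the simulated panel keeps the `y`, `period`, and
`id` columns passed to the estimators later and the time-varying treatment
indicator `D` used above.

```{r raw_rates}
# Observed outcome rates by period for ever-treated vs. never-treated units
# (ever_treated is derived here from the time-varying indicator D)
ever_treated <- ave(dat$D, dat$id, FUN = max)
raw_rates <- aggregate(
  list(rate = dat$y),
  by  = list(period = dat$period, ever_treated = ever_treated),
  FUN = mean
)
print(
  reshape(raw_rates, idvar = "period", timevar = "ever_treated",
          direction = "wide"),
  digits = 3, row.names = FALSE
)
```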

```{r fit_both}
# Logit DiD
res_logit <- nonlinear_attgt(
  dat, "y", "period", "id", "g",
  outcome_model = "logit",
  control_group = "nevertreated"
)

# Linear DiD
res_linear <- nonlinear_attgt(
  dat, "y", "period", "id", "g",
  outcome_model = "linear",
  control_group = "nevertreated"
)

# Aggregate
agg_logit  <- nonlinear_aggte(res_logit,  type = "dynamic")
agg_linear <- nonlinear_aggte(res_linear, type = "dynamic")

cat("=== Overall ATT ===\n")
cat("Linear DiD: ", round(agg_linear$overall_att, 4), "\n")
cat("Logit DiD:  ", round(agg_logit$overall_att,  4), "\n")
cat("True ATT:   ", round(mean(c(0.20, 0.35, 0.25)), 4), "\n")
```
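
The same comparison can be expressed as bias relative to the true average
ATT, reusing the objects created above.

```{r bias_compare}
# Bias of each estimator's overall ATT relative to the simulated truth
true_att <- mean(c(0.20, 0.35, 0.25))
bias_tab <- data.frame(
  estimator = c("Linear DiD", "Logit DiD"),
  estimate  = c(agg_linear$overall_att, agg_logit$overall_att),
  bias      = c(agg_linear$overall_att, agg_logit$overall_att) - true_att
)
print(bias_tab, digits = 3, row.names = FALSE)
```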

---

# Pre-Trends Sensitivity

A critical practical insight: even if the **true DGP** has no differential
pre-trends on the scale where the identifying assumption holds, a pre-trends
test conducted on a different scale may reject purely because of functional
form.

```{r pretrends_both}
# Test on logit scale
pt_logit <- nonlinear_pretest(res_logit, plot = FALSE)
cat("Pre-trends test (logit scale):\n")
cat("  Joint p-value:", round(pt_logit$joint_pval, 4), "\n\n")

# Test on linear scale
pt_linear <- nonlinear_pretest(res_linear, plot = FALSE)
cat("Pre-trends test (linear scale):\n")
cat("  Joint p-value:", round(pt_linear$joint_pval, 4), "\n\n")

cat("Note: If true DGP is logit-scale parallel trends, the linear-scale\n")
cat("pre-trends test may spuriously reject due to functional form.\n")
```

---

# Recommendations

Based on Roth & Sant'Anna (2023) and the evidence above:

1. **Think about your DGP first.** Functional form sensitivity is greatest
   when the outcome is binary and baseline rates are far from 50% (roughly
   below 15% or above 85%); in that range, prefer a nonlinear model.

2. **Report the scale of your parallel trends assumption.** "We assume
   parallel trends in log-odds" is a substantively different claim from
   "parallel trends in probabilities."

3. **Use doubly-robust estimation** (`doubly_robust = TRUE`), which remains
   consistent as long as either the outcome model or the propensity score
   model is correctly specified (see the sketch after this list).

4. **Consider the odds-ratio DiD** for binary outcomes when you want an
   estimate that does not depend on the reference group or period.

5. **Use `nonlinear_bounds()`** to report the range of ATTs consistent
   with the data under minimal assumptions.
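
A minimal sketch of recommendations 3 and 5 follows. The exact signatures are
assumptions here (in particular, that `doubly_robust` is an argument of
`nonlinear_attgt()` and that `nonlinear_bounds()` accepts its result object);
see the package documentation for the authoritative interface.

```{r dr_bounds_sketch, eval = FALSE}
# Hedged sketch: doubly-robust logit DiD plus bounds on the ATT.
# Assumes doubly_robust is an argument of nonlinear_attgt() and that
# nonlinear_bounds() takes the resulting att(g,t) object; see
# ?nonlinear_attgt and ?nonlinear_bounds for the exact interface.
res_dr <- nonlinear_attgt(
  dat, "y", "period", "id", "g",
  outcome_model = "logit",
  control_group = "nevertreated",
  doubly_robust = TRUE
)

bnds <- nonlinear_bounds(res_dr)
print(bnds)
```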

---

# References

Roth, J., & Sant'Anna, P. H. C. (2023). When is parallel trends sensitive
to functional form? *Econometrica*, 91(2), 737-747.

Wooldridge, J. M. (2023). Simple approaches to nonlinear
difference-in-differences with panel data. *The Econometrics Journal*, 26(3).
