--- title: "Choosing a fusion structure: cohort vs. event-study penalties" author: "Gregory Faletto" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true math_method: "mathml" output_file: "FETWFE_Fusion_Structure_Vignette.html" vignette: > %\VignetteIndexEntry{Choosing a fusion structure: cohort vs. event-study penalties} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteDepends{ggplot2, dplyr} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ```{r setup} library(fetwfe) ``` # What a "fusion structure" is FETWFE estimates a separate treatment effect for every treated cohort `g` in every post-treatment period `t` --- written `tau_{g, t}` --- and then shrinks those effects toward each other with a *fusion penalty*. "Fusion" means the penalty acts on the *differences* between selected pairs of treatment-effect coefficients: pushing a difference toward zero pools the two effects, so the estimator borrows strength across cells instead of estimating each `tau_{g, t}` in isolation. Which differences the penalty acts on is the *fusion structure*, and it encodes the kind of similarity you expect the treatment effects to have. Since version 1.19.0 `fetwfe()` exposes this choice through a single argument, `fusion_structure`, with two options: ```{r, eval = FALSE} fetwfe(..., fusion_structure = "cohort") # the default fetwfe(..., fusion_structure = "event_study") # the alternative added for issue #40 ``` Both penalties are *data-driven*, not hard restrictions. The estimator pools toward the chosen structure but is free to deviate from it when the data demand it. So choosing a fusion structure does **not** compromise the asymptotic validity of the estimator (a lemma in the paper's extensions section shows the event-study penalty satisfies the same conditions, with the same singular-value constants `sigma_max <= sqrt(6)` and `sigma_min >= 1 / (T * sqrt(2 * T))`, as the default; every consistency, restriction-selection, and inference guarantee in the paper therefore holds verbatim for it). The cost of guessing "wrong" is *finite-sample efficiency* --- noisier estimates --- not bias or invalid confidence intervals. ## `fusion_structure = "cohort"` (the default) The default penalty fuses **adjacent cohorts at the same calendar time** and **adjacent time periods within the same cohort**. It pools toward the assumption that treatment effects vary *smoothly* across neighboring cohorts and smoothly over calendar time within a cohort. This is a sensible, agnostic default: it does not privilege any single summary of the effects. ## `fusion_structure = "event_study"` The event-study penalty instead fuses, for each *event time* `e = 0, 1, 2, ...` (time since treatment), the effect `tau_{r, r + e}` across all cohorts `r`. It pools toward the assumption that treatment effects depend only on **time since treatment**, not on the calendar period or the cohort --- the identifying restriction behind the standard event-study aggregation used throughout applied difference-in-differences (e.g., `did::aggte(type = "dynamic")`). When the data really are driven by event time (gradual onset, a dose-response that accumulates with exposure), this pools strength across cohorts at each event time and estimates the dynamic path more efficiently. # Two meanings of "event study" --- read this first The word "event study" appears in `fetwfe` in **two distinct, independent places**, and conflating them is the most common source of confusion: - **`fusion_structure = "event_study"`** is an *estimation-time penalty*. It changes the **fit** --- the coefficients you get out of `fetwfe()`. - **`family = "event_study"`** in `eventStudy()` and `simultaneousCIs(fit, family = "event_study")` is a *reporting-time aggregation*. It changes how an already-fitted set of effects is **summarized** (one number per event time), and does nothing to the fit. These two choices are **orthogonal**: you can report an event-study aggregation of a *cohort*-penalized fit, or a cohort aggregation of an *event-study*-penalized fit. Every combination is valid: ```{r, eval = FALSE} # event-time SUMMARY of a default (cohort-penalized) fit: eventStudy(fetwfe(..., fusion_structure = "cohort")) # cohort SUMMARY of an event-study-penalized fit: simultaneousCIs(fetwfe(..., fusion_structure = "event_study"), family = "cohort") ``` The rest of this vignette uses `eventStudy()` (the reporting aggregation) to *display* fits made under both penalties --- a concrete illustration of the orthogonality. # A side-by-side demo We simulate a panel whose treatment effects are **sparse in the event-study basis** --- effects that share the same time since treatment (event time `e`) are fused across cohorts, so the true effect depends mostly on `e`. We generate this truth natively with `genCoefs(fusion_structure = "event_study")` (the simulation-side companion to the estimator argument), then fit the panel with both fusion structures and see which recovers the truth better. Because we know the data-generating effects exactly, we can measure recovery error directly. We use the package's simulation utilities (`genCoefs()` / `simulateData()` / `fetwfeWithSimulatedData()`; see the *Simulation* vignette for a full tour). The treatment-effect coefficients occupy a known block of the coefficient vector; `fetwfe()` reports their positions in its `treat_inds` slot (see `?fetwfe`), and those positions depend only on the panel shape (`G` cohorts, `T` periods, `d` covariates): ```{r} G <- 6L T <- 9L d <- 1L N <- 400L # Generate truth that is sparse in the event-study basis directly, with # fusion_structure = "event_study". seed 10 is a representative draw (see the # note after the plot); density 0.25 keeps the truth genuinely sparse in the # event-study basis. coefs <- genCoefs( G = G, T = T, d = d, density = 0.25, eff_size = 2, seed = 10, fusion_structure = "event_study" ) # Locate the treatment-effect block tau_{g, t} (cohort-major order). cohort_times <- getTes(coefs)$cohort_times # each cohort's first treated period n_k <- T - cohort_times + 1 # number of post-treatment effects per cohort num_treats <- sum(n_k) base_cols <- G + (T - 1) + d + d * G + d * (T - 1) # fixed effects + covariate terms # `fetwfe()` reports these positions as an integer vector, so build `treat_inds` # as an integer too (the `base_cols` arithmetic above is double-typed) -- this # keeps the `identical()` cross-check below exact rather than tripping on storage # mode. treat_inds <- as.integer(base_cols + seq_len(num_treats)) # the tau block # Event time (time since treatment) of each tau, in the same order. event_time <- unlist(lapply(n_k, function(nk) 0:(nk - 1))) # The true per-cell treatment effects, generated natively (not hand-built). true_te <- coefs$beta[treat_inds] sim <- simulateData(coefs, N = N, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 10) ``` Now fit the same simulated panel under each fusion structure: ```{r, message = FALSE, warning = FALSE} fit_cohort <- fetwfeWithSimulatedData(sim, lambda_selection = "bic") fit_event <- fetwfeWithSimulatedData( sim, lambda_selection = "bic", fusion_structure = "event_study" ) # The positions we wrote the truth to are exactly the ones fetwfe() reports as # the treatment effects, so we can compare estimates to the truth cell by cell. stopifnot(identical(treat_inds, fit_cohort$treat_inds)) ``` How well does each fit recover the true per-cell effects? Compare the estimated treatment-effect coefficients to the known truth (mean squared error over all `r length(true_te)` cells): ```{r} mse <- function(fit) mean((fit$beta_hat[treat_inds] - true_te)^2) errs <- c(cohort = mse(fit_cohort), event_study = mse(fit_event)) round(errs, 3) ``` ```{r, echo = FALSE} pct <- round(100 * (1 - errs[["event_study"]] / errs[["cohort"]])) ``` On this panel the event-study penalty cuts the per-cell recovery error by about `r pct`%. The advantage is exactly what we should expect: the truth *is* an event-study structure, so pooling effects across cohorts at each event time pools the right things, while the default penalty spends effort smoothing across cohorts and calendar time that the truth does not actually share. The same story is visible at the reporting level. Below we aggregate **both** fits to one effect per event time with `eventStudy()` (the reporting family --- applied here on top of both penalties) and overlay the true profile: ```{r, eval = requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("dplyr", quietly = TRUE), fig.width = 7, fig.height = 4.5} library(ggplot2) library(dplyr) profiles <- dplyr::bind_rows( as.data.frame(eventStudy(fit_cohort)) |> dplyr::mutate(structure = "cohort (default)"), as.data.frame(eventStudy(fit_event)) |> dplyr::mutate(structure = "event_study") ) # True profile per event time: the mean of the natively generated tau at each # event time (the truth is event-study-sparse, not a clean formula line). truth <- aggregate( estimate ~ event_time, data = data.frame(event_time = event_time, estimate = true_te), FUN = mean ) ggplot(profiles, aes(event_time, estimate, color = structure, fill = structure)) + geom_ribbon(aes(ymin = ci_low, ymax = ci_high), alpha = 0.15, color = NA) + geom_line(linewidth = 0.8) + geom_point(size = 1.6) + geom_line( data = truth, aes(event_time, estimate), color = "black", linetype = "dashed", inherit.aes = FALSE ) + labs( title = "Recovering an event-time treatment-effect profile", subtitle = "dashed black = true event-time profile (mean generated tau by event time)", x = "event time (periods since treatment)", y = "treatment effect", color = "fusion_structure", fill = "fusion_structure" ) + theme_minimal() ``` The event-study-penalized profile tracks the dashed truth more closely; the default profile lags, because its calendar-and-cohort smoothing pulls effects toward neighbors the truth does not actually share. (The exact magnitude --- and, on any single draw, even the direction --- depends on the simulated data: across event-study-sparse draws at this configuration the event-study penalty wins on roughly four out of five seeds, and it is the per-cell coefficient recovery above, not the aggregated profile, that captures that most reliably. The seed shown here is one representative such draw.) # When to prefer each - **Prefer `fusion_structure = "event_study"`** when you believe treatment effects are driven mainly by **time since treatment** --- a gradual onset, a build-up or decay of the effect over exposure --- and you want to pool strength across cohorts at each event time. This matches the default null of the applied event-study literature, so the estimator's "shrink toward an event-study structure" behavior is a familiar story to report. - **Prefer `fusion_structure = "cohort"` (the default)** when treatment effects plausibly **differ by cohort** --- cohort-specific responsiveness, or calendar-time shocks that hit all cohorts in a given year --- and you want smoothness across neighboring cohorts and within-cohort time without privileging an event-time summary. It is the right agnostic choice when you do not have a strong prior that event time is the only thing that matters. Either way, the penalty is data-driven and the paper's guarantees hold for both, so the decision affects efficiency, not validity. If you are unsure, the default is a safe starting point; switch to `"event_study"` when the dynamic, time-since- treatment path is the object of interest and you expect it to be shared across cohorts. # Supplying your own fusion matrix `D_N` (advanced) The two built-in structures are special cases of a more general mechanism: under the hood, `fetwfe()` applies a **forward-differences matrix** `D_N` to the treatment-effect block, penalizes the differenced coordinates, and maps back via `solve(D_N)`. The `fusion_matrix` argument lets you supply your own `num_treats x num_treats` `D_N` directly --- encoding an arbitrary identifying-restriction structure (for example, "fuse cohorts that share an industry") that neither built-in expresses. When supplied, `fusion_matrix` **overrides** `fusion_structure` for the treatment-effect block (the fixed-effect blocks are unchanged). ## The row/column ordering `D_N` is indexed by position, so its rows and columns must follow the same **cohort-major `(g, t)` order** the estimator uses for the treatment effects. Row/column `i` corresponds to base treatment effect `i`: - `num_treats = T * G - G * (G + 1) / 2` is the total number of base treatment effects. - `getFirstInds(G, T)` returns, for each cohort `g`, the 1-based row of its **first** effect `tau_{g, 0}`. Cohort `g` then occupies rows `first_inds[g]:(first_inds[g + 1] - 1)` (the last cohort runs through `num_treats`), ordered by event time within the cohort (`tau_{g, 0}, tau_{g, 1}, ...`). - `getTreatInds(G, T, d, num_treats)` maps this block into the full design-matrix columns; `fit$treat_inds` reports the same positions on a fitted object. Interpreting `D_N` as forward differences: `theta = D_N %*% beta` is the penalized (sparse) parameterization and `beta = solve(D_N) %*% theta` recovers the original-scale effects. You supply `D_N`; the code uses `solve(D_N)`. ## Worked example We build `first_inds` from the per-cohort effect counts `n_k` (already computed above; no internal helper needed) so the matrix's cohort-major layout is explicit, then supply a custom `D_N` and refit. (This reuses the `sim` panel from the side-by-side demo above.) ```{r, message = FALSE, warning = FALSE} # first_inds in the same cohort-major order the matrix must follow: cohort 1 # starts at row 1, and each later cohort starts after the previous cohort's # block (of size n_k). first_inds <- cumsum(c(1, head(n_k, -1))) first_inds # the 1-based starting row of each cohort's first effect # A within-cohort backward-differences D_N: penalize each cohort's first effect # directly (toward zero) and every later effect toward its within-cohort # predecessor. (theta_i = beta_i - beta_{i-1} within a cohort; theta = beta at # each cohort's first effect.) This is one transparent, hand-built example; any # invertible num_treats x num_treats matrix in this row order is accepted. D_N <- diag(num_treats) for (k in seq_along(first_inds)) { start <- first_inds[k] end <- if (k < length(first_inds)) first_inds[k + 1] - 1 else num_treats if (end > start) { for (i in (start + 1):end) { D_N[i, i - 1] <- -1 } } } fit_custom <- fetwfeWithSimulatedData( sim, lambda_selection = "bic", fusion_matrix = D_N ) # A finite, well-formed fit; the custom block was used throughout. fit_custom$att_hat ``` A genuinely bespoke `D_N` --- one that fuses your own set of effects (for example, the event-study structure, or "fuse cohorts that share an industry") --- is built the same way: choose which differences to penalize, assemble the corresponding forward-differences matrix in this cohort-major order, and pass it as `fusion_matrix`. ## Staying inside the theoretical guarantees Under the paper's fixed-dimension scoping, the inferential guarantees (valid standard errors and confidence intervals) hold for **any** finite, invertible `D_N`: the theory depends on `D_N` only through its invertibility and the size of its singular values (Assumption (D) of Faletto 2025), and a fixed invertible matrix automatically has its singular values in a fixed positive range. Swapping one valid `D_N` for another changes only constant factors, not the guarantees themselves --- so you do not need to engineer your `D_N` to hit particular singular-value targets. Finite and invertible is enough. `fetwfe()` enforces exactly that, and nothing stricter. It **stops** when `fusion_matrix` is not a finite, invertible numeric matrix of the exact `num_treats x num_treats` dimension --- a malformed restriction matrix is a usage error, not a modeling choice. As a numerical courtesy it also emits a `warning()` (not an error) when the supplied `D_N` is *near-singular*: still invertible, so estimation proceeds, but its inverse may be numerically unreliable, which usually means a better-conditioned `D_N` would serve you better. ```{r, warning = TRUE} # A near-singular (but invertible) D_N: estimation proceeds, with a heads-up. near_singular <- diag(num_treats) near_singular[1, 1] <- 1e-12 fit_warned <- tryCatch( fetwfeWithSimulatedData(sim, lambda_selection = "bic", fusion_matrix = near_singular), warning = function(w) conditionMessage(w) ) # The warning message (estimation still succeeds if you allow the warning): cat(fit_warned) ``` # References - Faletto, G. (2025). *Fused Extended Two-Way Fixed Effects for Difference-in-Differences With Staggered Adoptions.* arXiv preprint. - Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with multiple time periods. *Journal of Econometrics*, 225(2), 200--230.