| Title: | Fused Extended Two-Way Fixed Effects |
|---|---|
| Description: | Calculates the fused extended two-way fixed effects (FETWFE) estimator for unbiased and efficient estimation of difference-in-differences in panel data with staggered treatment adoption. This estimator eliminates bias inherent in conventional two-way fixed effects estimators, while also employing a novel bridge regression regularization approach to improve efficiency and yield valid standard errors. Also implements extended TWFE (etwfe) and bridge-penalized ETWFE (betwfe). Provides S3 classes for streamlined workflow and supports flexible tuning (ridge and rank-condition guarantees), automatic covariate centering/scaling, and detailed overall and cohort-specific effect estimates with valid standard errors. Includes simulation and formatting utilities, extensive diagnostic tools, vignettes, and examples. See Faletto (2025) (<doi:10.48550/arXiv.2312.05985>). |
| Authors: | Gregory Faletto [aut, cre] (ORCID: <https://orcid.org/0000-0001-8298-1401>) |
| Maintainer: | Gregory Faletto <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.27.2 |
| Built: | 2026-06-10 07:19:42 UTC |
| Source: | https://github.com/gregfaletto/fetwfepackage |
catt_df object
Intercepts column selection by the pre-1.11.0 Title-Case names and
stops with a migration message. The check fires when an old name
appears in the column-selector position: j for the df[i, j]
two-index form, or i for the df[j] one-index column-selection
form. Row-only access (df[i, ]) and access by new column names
fall through to the data.frame method via NextMethod().
## S3 method for class 'catt_df' x[i, j, ...]
x |
A catt_df object. |
i, j
|
Row / column selectors; see [.data.frame. |
... |
Further arguments passed through. |
The selected subset, as for [.data.frame.
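A minimal sketch of the column-selector rule, using a hypothetical toy subclass demo_df rather than the package's own code (the old_names vector and the error text are illustrative). It shows how nargs() tells the one-index df[j] form apart from df[i, j] and df[i, ] before handing off to [.data.frame with NextMethod():

```r
# Hypothetical sketch: a toy "demo_df" subclass, not fetwfe's own method.
old_names <- c("Cohort", "Estimated TE", "SE", "ConfIntLow", "ConfIntHigh", "P_value")

`[.demo_df` <- function(x, i, j, ..., drop) {
  # Count positional arguments, ignoring an explicit `drop =`,
  # the same adjustment [.data.frame makes internally.
  n <- nargs() - !missing(drop)
  # df[j]            (n == 2): the single index selects columns.
  # df[i, j], df[i, ] (n == 3): the second index selects columns.
  col_sel <- if (n <= 2L) {
    if (missing(i)) NULL else i
  } else {
    if (missing(j)) NULL else j
  }
  if (is.character(col_sel) && any(col_sel %in% old_names)) {
    stop("Column names changed in 1.11.0; use the snake_case names.",
         call. = FALSE)
  }
  NextMethod()  # everything else is ordinary data.frame subsetting
}

df <- structure(
  data.frame(estimate = c(1.5, 2.0), se = c(0.3, 0.4)),
  class = c("demo_df", "data.frame")
)
df["estimate"]  # new name in the one-index form: passes through
df[1, ]         # row-only access: passes through
```

Calling df["Cohort"] or df[, "SE"] on this toy object stops with the migration error, while df[2, "se"] returns 0.4 as usual.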
catt_df object
Intercepts access by the pre-1.11.0 Title-Case column names
(Cohort, Estimated TE, SE, ConfIntLow, ConfIntHigh,
P_value) and stops with a migration message pointing to the new
snake_case name. All other access falls through to the
data.frame method via NextMethod().
## S3 method for class 'catt_df' x[[i, ...]]
x |
A catt_df object. |
i |
Index; passed through to [[.data.frame. |
... |
Further arguments passed through. |
The column or value, as for [[.data.frame.
catt_df object
Intercepts assignment by the pre-1.11.0 Title-Case column names
(e.g., df[["Estimated TE"]] <- v) and stops with a migration
message pointing to the new snake_case name. This closes the gap
where a partial migration (RHS updated to new name, LHS still old)
would silently append a new column rather than overwriting the
renamed one. All other assignment falls through to the
data.frame method via NextMethod().
## S3 replacement method for class 'catt_df' x[[i, ...]] <- value
x |
A catt_df object. |
i |
Index; passed through to [[<-.data.frame. |
... |
Further arguments passed through. |
value |
The value to assign. |
The modified catt_df object, as for [[<-.data.frame.
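The partial-migration gap described above is easy to reproduce on a plain data.frame, which carries no such guard. In this made-up example the right-hand side already uses the new name while the left-hand side still uses the old one:

```r
# Plain data.frame (no catt_df class), so nothing intercepts the old name.
d <- data.frame(cohort = c("2", "3"), estimate = c(1.5, 2.0))

# Partial migration: RHS updated to "estimate", LHS still "Estimated TE".
d[["Estimated TE"]] <- 100 * d[["estimate"]]

names(d)    # "cohort" "estimate" "Estimated TE": a new column was appended
d$estimate  # 1.5 2.0: the intended column was never overwritten
```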
catt_df object
Intercepts column assignment by the pre-1.11.0 Title-Case names
and stops with a migration message. The check fires when an old
name appears in the column-selector position. Row-only assignment
(df[i, ] <- v) and assignment by new column names fall through
to the data.frame method via NextMethod().
## S3 replacement method for class 'catt_df' x[i, j, ...] <- value
x |
A catt_df object. |
i, j
|
Row / column selectors; see [<-.data.frame. |
... |
Further arguments passed through. |
value |
The value to assign. |
The nargs() distinction here mirrors the read-side [.catt_df,
but the threshold shifts by one because [<- carries an extra
positional value argument: df[i] <- v has nargs() == 3 and
i is the column selector; df[i, j] <- v and df[i, ] <- v have
nargs() == 4 and j (if non-missing) is the column selector.
The modified catt_df object, as for [<-.data.frame.
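The nargs() thresholds quoted above can be checked with a throwaway probe class (nargs_probe is hypothetical and exists only for this illustration). Its [<- method records nargs() and otherwise returns its input unchanged:

```r
# Hypothetical probe class: its [<- method only records nargs().
`[<-.nargs_probe` <- function(x, i, j, value) {
  attr(x, "last_nargs") <- nargs()
  x
}

p <- structure(list(), class = "nargs_probe")

p[1] <- 0                            # df[i] <- v
n_one_index <- attr(p, "last_nargs")

p[1, 2] <- 0                         # df[i, j] <- v
n_two_index <- attr(p, "last_nargs")

p[1, ] <- 0                          # df[i, ] <- v (blank j still counts)
n_row_only <- attr(p, "last_nargs")

c(n_one_index, n_two_index, n_row_only)  # 3 4 4
```

The blank positional argument in p[1, ] is counted by nargs(), which is why row-only assignment lands in the same branch as df[i, j] <- v.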
catt_df object
Intercepts access by the pre-1.11.0 Title-Case column names and
stops with a migration message. All other access falls through
to the data.frame method via NextMethod().
## S3 method for class 'catt_df' x$name
x |
A catt_df object. |
name |
Character; the column name being accessed via $. |
The column, as for $.data.frame.
catt_df object
Intercepts assignment by the pre-1.11.0 Title-Case column names
(e.g., df$Cohort <- v) and stops with a migration message. All
other assignment falls through to the data.frame method via
NextMethod().
## S3 replacement method for class 'catt_df' x$name <- value
x |
A catt_df object. |
name |
Character; the column name being assigned via $<-. |
value |
The value to assign. |
The modified catt_df object, as for $<-.data.frame.
att_gt() to a dataframe suitable for fetwfe() / etwfe()
attgtToFetwfeDf() reshapes and renames a panel dataset that is already
formatted for did::att_gt() (Callaway and Sant'Anna 2021) so that it can be
passed directly to fetwfe() or etwfe() from the fetwfe package. In
particular, it
creates an absorbing-state treatment dummy that equals 1 from the first treated period onward and 0 otherwise,
(optionally) drops units that are already treated in the very first
period of the sample (because fetwfe() removes them internally), and
returns a tidy dataframe whose column names match the arguments that
fetwfe()/etwfe() expect.
attgtToFetwfeDf( data, yname, tname, idname, gname, covars = character(0), drop_first_period_treated = TRUE, out_names = list(time = "time_var", unit = "unit_var", treatment = "treatment", response = "response"), verbose = FALSE )
data |
A |
yname |
Character scalar. Name of the outcome column. |
tname |
Character scalar. Name of the time variable (numeric or
integer). This becomes |
idname |
Character scalar. Name of the unit identifier. Converted to
character and returned as |
gname |
Character scalar. Name of the group variable holding the first period of treatment. Values must be 0 for never-treated, or a positive integer representing the first treated period. |
covars |
Character vector of additional covariate column names to carry
through (default |
drop_first_period_treated |
Logical. If |
out_names |
A named list giving the column names to use in the
resulting dataframe. Defaults are |
verbose |
Logical. If |
A data.frame with columns time_var, unit_var, treatment,
response, and any covariates requested in covars, ready to be fed to
fetwfe()/etwfe(). All required columns are of the correct type:
time_var is integer, unit_var is character, treatment is integer
0/1, and response is numeric.
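A base-R sketch of the dummy construction and first-period trimming described above, on a made-up three-county panel in the mpdta column layout. It assumes first.treat is 0 for never-treated units (as the gname argument requires) and writes the output under the default out_names; it illustrates the logic only and is not the function's source:

```r
# Toy panel in did::att_gt() layout (all values are made up).
panel <- data.frame(
  countyreal  = rep(c(101, 102, 103), each = 3),
  year        = rep(2004:2006, times = 3),
  first.treat = rep(c(0, 2005, 2004), each = 3),
  lemp        = c(5.1, 5.2, 5.3, 4.8, 4.9, 5.4, 6.0, 6.1, 6.2)
)

first_period <- min(panel$year)

# Absorbing 0/1 dummy: 1 from the first treated period onward, else 0.
# Never-treated units (first.treat == 0) stay at 0 throughout.
treated_now <- panel$first.treat > 0 & panel$year >= panel$first.treat

# Units already treated in the first sampled period are dropped.
already_treated <- panel$first.treat > 0 & panel$first.treat <= first_period

out <- data.frame(
  time_var  = as.integer(panel$year),
  unit_var  = as.character(panel$countyreal),
  treatment = as.integer(treated_now),
  response  = as.numeric(panel$lemp)
)[!already_treated, ]

out
```

County 103 is removed because it is treated from 2004 onward, and county 102 switches from 0 to 1 in 2005 and stays there.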
Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. doi:10.1016/j.jeconom.2020.12.001, https://arxiv.org/abs/1803.09015.
## toy example ---------------------------------------------------------------
## Not run:
library(did)  # provides the mpdta example dataframe
data(mpdta)
head(mpdta)
tidy_df <- attgtToFetwfeDf(
  data = mpdta,
  yname = "lemp",
  tname = "year",
  idname = "countyreal",
  gname = "first.treat",
  covars = c("lpop"))
head(tidy_df)
## End(Not run)
## Now you can call fetwfe() ------------------------------------------------
# res <- fetwfe(
#   pdata = tidy_df,
#   time_var = "time_var",
#   unit_var = "unit_var",
#   treatment = "treatment",
#   response = "response",
#   covs = c("lpop"))
Same shape as augment.fetwfe(), dispatched on class "betwfe". data
is auto-sorted by (unit, time) and any first-period-treated units
are auto-trimmed; pass the same raw pdata you handed to betwfe().
## S3 method for class 'betwfe' augment(x, data, ...)
x |
An object of class betwfe. |
data |
A panel |
... |
Unused. |
data with .fitted and .resid columns appended.
## Not run:
sim <- simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
                    N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
res <- betwfeWithSimulatedData(sim)
broom::augment(res, data = sim$pdata)
## End(Not run)
Same shape as augment.fetwfe(), dispatched on class "etwfe". data
is auto-sorted by (unit, time) and any first-period-treated units
are auto-trimmed; pass the same raw pdata you handed to etwfe().
## S3 method for class 'etwfe' augment(x, data, ...)
x |
An object of class etwfe. |
data |
A panel |
... |
Unused. |
data with .fitted and .resid columns appended.
## Not run:
sim <- simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
                    N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
res <- etwfeWithSimulatedData(sim)
broom::augment(res, data = sim$pdata)
## End(Not run)
Computes .fitted = X %*% beta_hat + x$y_mean and
.resid = data[[x$response_col_name]] - .fitted, then column-binds those
two columns onto data. The response mean and column name are stored on
the fitted object during fitting (the estimator internally centers y
before solving), so fitted values come back on the original-response
scale without the caller having to remember either.
## S3 method for class 'fetwfe' augment(x, data, ...)
x |
An object of class fetwfe. |
data |
A panel |
... |
Unused. |
data is auto-handled to match the fitted design: rows are auto-sorted
by (unit, time), and any first-period-treated units (whose treatment
effect cannot be identified by the estimator) are auto-trimmed via
idCohorts(). So you can pass the same raw pdata you handed to
fetwfe() — the method takes care of alignment. The only hard
requirement is that data contains the response column under its
original name.
A copy of data with two extra numeric columns: .fitted
and .resid.
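The fitted-value arithmetic can be sketched in base R on a made-up four-row panel. The design matrix X, beta_hat, and y_mean below are hypothetical stand-ins for the quantities stored on a fit, and X is assumed to list observations in (unit, time) order, which is why the panel is sorted first:

```r
# Toy panel deliberately stored out of (unit, time) order.
panel <- data.frame(
  unit = c("b", "a", "a", "b"),
  time = c(1, 2, 1, 2),
  y    = c(3.2, 2.7, 3.6, 2.6)
)

# Assumed stand-ins: X rows follow (unit, time) order and beta_hat was
# fit to the centered response y - y_mean.
X        <- matrix(c(1, 0, 1, 0,
                     0, 1, 1, 2), nrow = 4)
beta_hat <- c(0.5, -0.25)
y_mean   <- 3

panel  <- panel[order(panel$unit, panel$time), ]  # align rows with X
fitted <- drop(X %*% beta_hat) + y_mean           # back on the response scale
aug    <- cbind(panel, .fitted = fitted, .resid = panel$y - fitted)
aug
```

Adding y_mean back is what puts .fitted on the original response scale; without it the residuals would all be shifted by the response mean.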
## Not run:
sim <- simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
                    N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
res <- fetwfeWithSimulatedData(sim)
broom::augment(res, data = sim$pdata)
## End(Not run)
augment() is intentionally not provided for twfeCovs() objects (#58):
the fit's coefficient vector lives in a reduced cohort-level basis that the
shared fitted-value path (X %*% beta_hat) does not match, so a meaningful
.fitted / .resid cannot be reconstructed. Use tidy.twfeCovs(),
glance.twfeCovs(), or summary() instead. Calling this method always
raises an error.
## S3 method for class 'twfeCovs' augment(x, data, ...)
x |
An object of class twfeCovs. |
data |
Ignored. |
... |
Ignored. |
(none; raises an error).
Implementation of extended two-way fixed effects with a bridge penalty. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
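For orientation, the bridge penalty in its textbook form is lambda * sum(|beta_j|^q): q = 1 gives the lasso, q = 2 gives ridge, and exponents q <= 1 can set coefficients exactly to zero. The helper below is hypothetical and only evaluates that expression, assuming betwfe()'s q argument is this exponent; it is not betwfe()'s internal objective:

```r
# Hypothetical helper: textbook bridge penalty lambda * sum(|beta_j|^q).
bridge_penalty <- function(beta, lambda, q = 0.5) {
  stopifnot(lambda >= 0, q > 0)
  lambda * sum(abs(beta)^q)
}

beta <- c(4, -1, 0)
bridge_penalty(beta, lambda = 2, q = 0.5)  # 2 * (2 + 1 + 0) = 6
bridge_penalty(beta, lambda = 2, q = 1)    # lasso: 2 * 5 = 10
bridge_penalty(beta, lambda = 2, q = 2)    # ridge: 2 * 17 = 34
```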
betwfe( pdata, time_var, unit_var, treatment, response, covs = c(), indep_counts = NA, sig_eps_sq = NA, sig_eps_c_sq = NA, lambda.max = NA, lambda.min = NA, nlambda = 100, q = 0.5, verbose = FALSE, alpha = 0.05, add_ridge = FALSE, allow_no_never_treated = TRUE, se_type = "default", lambda_selection = "cv", cv_folds = 10L, cv_seed = NULL, ci_type = c("simultaneous", "pointwise") )
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Either a character vector containing the names of
the columns for covariates (e.g., |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID
noise assumed to apply to each observation. See Section 2 of Faletto (2025)
for details. It is best to provide this variance if it is known (for example,
if you are using simulated data). If this variance is unknown, this argument
can be omitted, and the variance will be estimated by
REML on the linear mixed-effects model |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID
noise (random effects) assumed to apply to each observation. See Section 2 of
Faletto (2025) for details. It is best to provide this variance if it is
known (for example, if you are using simulated data). If this variance is
unknown, this argument can be omitted, and the variance will be estimated
by REML via |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
lambda_selection |
Character; method for selecting the bridge
penalty parameter |
cv_folds |
Integer; number of folds for the CV path. Ignored when
|
cv_seed |
Integer or |
ci_type |
Character; one of |
An object of class betwfe containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
att_p_value |
A two-sided p-value for the overall ATT against the
null |
att_selected |
Logical scalar; |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The number of selected features (excluding the
always-present intercept) at |
lambda.min |
Either the provided |
lambda.min_model_size |
The
number of selected features (excluding the always-present intercept) at
|
lambda_star |
The value of |
lambda_star_model_size |
The number of selected features (excluding the
always-present intercept) in the chosen model. If this value is close to
|
lambda_selection |
Character scalar; either |
cv_folds |
Integer scalar; the |
cv_seed |
Integer scalar; the seed actually fed to |
ci_type |
Character scalar; the |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias
for |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
y_mean |
Numeric scalar; mean of the original (pre-centering) response.
Stored so downstream methods ( |
response_col_name |
Character scalar; the response column name in
the original |
time_var, unit_var, treatment
|
Character scalars; the corresponding
arguments the user passed. Consumed by |
covs |
Character vector; the original |
alpha |
The alpha level used for confidence intervals. |
calc_ses |
Logical indicating whether standard errors were calculated. |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
internal |
A list containing internal outputs that are typically
not needed for interpretation, packaged here for parity with
|
Gregory Faletto
Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545-554.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
# `bacondecomp` (which supplies the `castle` data) is a Suggests-only
# dependency, so guard the example on its availability.
if (requireNamespace("bacondecomp", quietly = TRUE)) {
  library(bacondecomp)
  data(castle)
  # Response: the log homicide rate. Treatment: `cdl` records the share of
  # the year the castle-doctrine law was in effect, so `cdl > 0` gives the
  # absorbing 0/1 treatment indicator. No `covs`: castle's smallest
  # adoption cohorts contain a single state, so the design is
  # rank-deficient once any covariate is added.
  castle$l_homicide <- log(castle$homicide)
  castle$treated <- as.integer(castle$cdl > 0)
  # On this panel betwfe's bridge penalty selects every cohort out, so the
  # estimated ATT and cohort effects below are all zero.
  res <- betwfe(
    pdata = castle,
    time_var = "year",
    unit_var = "state",
    treatment = "treated",
    response = "l_homicide",
    verbose = TRUE)
  # Average treatment effect on the treated units (in percentage point
  # units); print() is needed because values inside `{` are not auto-printed
  print(100 * res$att_hat)
  # Conservative 95% confidence interval for ATT (in percentage point units)
  low_att <- 100 * (res$att_hat - qnorm(1 - 0.05 / 2) * res$att_se)
  high_att <- 100 * (res$att_hat + qnorm(1 - 0.05 / 2) * res$att_se)
  print(c(low_att, high_att))
  # Cohort average treatment effects and confidence intervals (in percentage
  # point units)
  catt_df_pct <- res$catt_df
  catt_df_pct[["estimate"]] <- 100 * catt_df_pct[["estimate"]]
  catt_df_pct[["se"]] <- 100 * catt_df_pct[["se"]]
  catt_df_pct[["ci_low"]] <- 100 * catt_df_pct[["ci_low"]]
  catt_df_pct[["ci_high"]] <- 100 * catt_df_pct[["ci_high"]]
  catt_df_pct
}
S3 class for the output of betwfe().
This function runs the bridge-penalized extended two-way fixed effects estimator (betwfe()) on
simulated data. It is simply a wrapper for betwfe(): it accepts an object of class
"FETWFE_simulated" (produced by simulateData()) and unpacks the necessary
components to pass to betwfe(). So the outputs match betwfe(), and the needed inputs
match their counterparts in betwfe().
betwfeWithSimulatedData( simulated_obj, lambda.max = NA, lambda.min = NA, nlambda = 100, q = 0.5, verbose = FALSE, alpha = 0.05, add_ridge = FALSE, allow_no_never_treated = TRUE, se_type = "default", lambda_selection = "cv", cv_folds = 10L, cv_seed = NULL, ci_type = c("simultaneous", "pointwise") )
simulated_obj |
An object of class |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
lambda_selection |
Character; method for selecting the bridge
penalty parameter |
cv_folds |
Integer; number of folds for the CV path. Ignored when
|
cv_seed |
Integer or |
ci_type |
Character; one of |
An object of class betwfe containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
att_p_value |
A two-sided p-value for the overall ATT against the
null |
att_selected |
Logical scalar; |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The number of selected features (excluding the
always-present intercept) at |
lambda.min |
Either the provided |
lambda.min_model_size |
The
number of selected features (excluding the always-present intercept) at
|
lambda_star |
The value of |
lambda_star_model_size |
The number of selected features (excluding the
always-present intercept) in the chosen model. If this value is close to
|
lambda_selection |
Character scalar; either |
cv_folds |
Integer scalar; the |
cv_seed |
Integer scalar; the seed actually fed to |
ci_type |
Character scalar; the |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias
for |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
calc_ses |
Logical indicating whether standard errors were calculated. |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
y_mean |
Numeric scalar; mean of the original (pre-centering) response.
Stored so downstream methods ( |
response_col_name |
Character scalar; the response column name in
the original |
time_var, unit_var, treatment
|
Character scalars; the corresponding arguments the user passed. |
covs |
Character vector; the original |
internal |
A list containing internal outputs that are typically
not needed for interpretation, packaged here for parity with
|
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5, seed = 123)
result <- betwfeWithSimulatedData(sim_data)
## End(Not run)
Extracts per-cohort ATT estimates from a fitted FETWFE / ETWFE / BETWFE /
twfeCovs object as a tidy data frame. Parallel to eventStudy() for
event-time aggregation: the per-cohort information is already available
via result$catt_df, and cohortStudy() surfaces it through a
discoverable function with its own help page (so users can reach it via
?cohortStudy without having to know the slot name).
The function is a pass-through on result$catt_df modulo class: the
columns, their values, and their order are unchanged. The returned
object carries class c("cohortStudy", "catt_df", "data.frame"). The
cohortStudy class dispatches the broom tidier tidy.cohortStudy();
the catt_df class preserves the helpful-error layer (introduced in
fetwfe 1.11.0) that intercepts pre-1.11.0 Title-Case column names
(Cohort, Estimated TE, etc.) with a migration message; the
data.frame base preserves the standard data-frame methods (print,
head, nrow, dplyr::filter, etc.).
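The three-part class vector behaves this way because S3 dispatch tries each class in turn and NextMethod() hands off to the next one. A hypothetical generic, describe(), that is not part of fetwfe makes the order visible on a toy object:

```r
# Hypothetical generic used only to show the S3 dispatch order.
describe <- function(x) UseMethod("describe")
describe.cohortStudy <- function(x) c("cohortStudy", NextMethod())
describe.catt_df     <- function(x) c("catt_df", NextMethod())
describe.data.frame  <- function(x) "data.frame"

cs <- structure(
  data.frame(estimate = c(1.2, 0.8)),
  class = c("cohortStudy", "catt_df", "data.frame")
)

describe(cs)  # "cohortStudy" "catt_df" "data.frame"
nrow(cs)      # 2: base data-frame behaviour is untouched
```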
cohortStudy(result)
result |
A fitted object from |
A data frame with class c("cohortStudy", "catt_df", "data.frame")
containing one row per treated cohort and columns:
Character; the cohort label (the calendar time at which the cohort first received treatment).
Numeric; the per-cohort ATT estimate.
Numeric; standard error for the per-cohort ATT (NA when
the Gram matrix is singular or, for fetwfe() / betwfe(), the
bridge penalty zeroed out the cohort).
Numeric; the stored lower and upper
confidence-interval bounds, reflecting the fit's ci_type –
simultaneous (family-wise) by default, or pointwise 1 - alpha
Wald bounds when the fit used ci_type = "pointwise" (alpha is
the value passed at fit time).
Numeric; follows the fit's ci_type. Under
"pointwise", the two-sided Wald p-value
(2 * pnorm(-|estimate / se|)); under "simultaneous" (the
default), the single-step max-T multiplicity-adjusted (family-wise)
p-value matching the simultaneous band (#200). NA when se is 0
or NA.
(fetwfe() / betwfe() only.) Logical; TRUE when
the bridge penalty left the cohort's ATT nonzero. Absent for
etwfe() and twfeCovs(), which do not perform selection.
Use tidy(cohortStudy(result)) (with the broom package loaded) to
reshape to broom convention (term, estimate, std.error,
statistic, p.value, conf.low, conf.high, optionally
selected); see tidy.cohortStudy().
eventStudy() for the parallel event-time accessor;
cohortTimeATTs() for the fully disaggregated per-(cohort, time) accessor;
tidy.cohortStudy() for broom-shape translation.
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5,
                    seed = 123)
res <- fetwfeWithSimulatedData(dat)
cs <- cohortStudy(res)
cs
# Broom-shape translation:
if (requireNamespace("broom", quietly = TRUE)) {
  broom::tidy(cs)
}
## End(Not run)
Extracts the fully disaggregated treatment-effect estimates from a fitted
FETWFE / ETWFE / BETWFE object: one row for every (cohort, time) cell, with
no averaging over cohorts or over event time. This is the finest-grained view
of the estimated effects — the num_treats underlying parameters
themselves — complementing cohortStudy() (which averages each cohort's
cells over time) and eventStudy() (which averages over cohorts at each
event time).
Like eventStudy(), this accessor is not available for twfeCovs() objects:
that estimator has a single treatment-effect parameter per cohort (no
per-time disaggregation), so its finest granularity is already
cohortStudy().
Standard errors are the per-cell regression standard errors, recomputed at
call time from the fit's stored design (the same Gram-matrix machinery
eventStudy() uses; nothing is added to the fitted object). Because each
cell is a single cohort-time parameter, the cohort-probability sampling
variance that contributes to the aggregated cohortStudy() /
eventStudy() standard errors is identically zero here, so a cell's SE is a
single coefficient's regression SE.
Confidence intervals and p-values are pointwise Wald
quantities (estimate +/- z * se). For simultaneous (family-wise) bands
over the cell family, use
simultaneousCIs(result, family = "all_post_treatment").
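The pointwise quantities above follow directly from each cell's estimate and standard error. `wald_pointwise()` below is a hypothetical helper, and its column names are chosen for illustration only. It applies the documented conventions: bounds of (0, 0) for a fused-away cell with se = 0, and an NA p-value when se is 0 or NA.

```r
# Hypothetical helper: pointwise Wald interval and two-sided p-value,
# estimate +/- qnorm(1 - alpha/2) * se and 2 * pnorm(-|estimate / se|).
wald_pointwise <- function(estimate, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  p <- 2 * pnorm(-abs(estimate / se))
  # Documented convention: the p-value is NA when se is 0 or NA.
  p[is.na(se) | se == 0] <- NA_real_
  data.frame(estimate = estimate, se = se,
             conf.low = estimate - z * se,
             conf.high = estimate + z * se,
             p.value = p)
}

# Cells: an ordinary cell, a fused-away cell (se = 0), and one with se = NA.
out <- wald_pointwise(estimate = c(0.5, 0, 0.2), se = c(0.25, 0, NA))
out
```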
cohortTimeATTs(result, alpha = NULL)
result |
A fitted object from |
alpha |
Numeric in |
The cell standard error is computed from psi, the cell's row of the
(selected) treatment-effect design — for fetwfe() the relevant row of the
inverse fusion transform in the transformed (theta) coordinate
space, for betwfe() / etwfe() a unit selector in the original (beta)
coordinate space restricted to the selected support. It is
never gated on the point estimate: a cell whose estimate is exactly zero
because the penalty fused it away has an all-zero psi and therefore
se = 0 (the correct degenerate value), while a cell whose estimate happens
to be near zero for other reasons still receives its proper nonzero SE.
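The role of psi can be illustrated with the textbook standard error of a linear combination psi' beta_hat. This is a minimal sketch under assumptions, not the package's code. It assumes an already whitened design (analogous to X_final) and a known noise variance sigma2, and the exact scaling used internally is not shown in this excerpt. `cell_se()` is a hypothetical name. The sketch shows why a fused-away cell gets se = 0 without any check on its point estimate.

```r
# Hypothetical sketch: regression SE of the linear combination psi' beta_hat
# in a whitened linear model y = X beta + e, Var(e) = sigma2 * I.
cell_se <- function(X, psi, sigma2) {
  gram_inv <- solve(crossprod(X))                 # (X'X)^{-1}
  sqrt(sigma2 * drop(crossprod(psi, gram_inv %*% psi)))
}

set.seed(1)
X <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
# A unit selector picks out coefficient 2 (the betwfe()/etwfe() case).
se_cell <- cell_se(X, psi = c(0, 1, 0), sigma2 = 2)
# An all-zero psi (a fused-away cell) gives SE exactly 0, whatever the estimate.
se_fused <- cell_se(X, psi = c(0, 0, 0), sigma2 = 2)
</imports>
</test>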
A data frame with class c("cohortTimeATTs", "data.frame")
containing one row per (cohort, time) treatment-effect cell, sorted by
cohort then time, with columns:
Character; the cohort label (the calendar time at which the
cohort first received treatment), matching cohortStudy().
Numeric; the calendar time of the cell, equal to the cohort's
adoption time plus the event time (0, 1, ...). Real panels carry their
actual calendar times. For synthetic genCoefs() / simulateData()
fixtures (whose panel runs 1, ..., T, so the stored first year is 1)
this coincides with the 1-based panel-time index. (Only a hand-built or
legacy fit with no stored first year falls back to that panel-time index
directly.)
Numeric; the cell's ATT estimate.
Numeric; the pointwise standard error. 0 for a cell zeroed
out by the fusion/bridge penalty while other cells survive
(fetwfe() / betwfe()); NA when standard errors are unavailable —
the fit was computed with q >= 1, the Gram matrix on the selected
support is singular, or the penalty zeroed the entire treatment block
(no cells selected, so there is no support to recompute the Gram from;
this matches eventStudy()).
Numeric; the pointwise 1 - alpha Wald bounds
estimate -/+ qnorm(1 - alpha/2) * se. (0, 0) for a fused-away cell;
NA when se is NA.
Numeric; the two-sided pointwise Wald p-value
2 * pnorm(-|estimate / se|). NA when se is 0 or NA.
(fetwfe() / betwfe() only.) Logical; TRUE when the
bridge penalty left the cell's estimate nonzero. Absent for etwfe(),
which does not perform selection.
Use tidy(cohortTimeATTs(result)) (with the broom package loaded) to
reshape to broom convention; see tidy.cohortTimeATTs().
cohortStudy() for the per-cohort (time-averaged) accessor;
eventStudy() for the per-event-time (cohort-averaged) accessor;
simultaneousCIs() for simultaneous (family-wise) bands over the cell
family (family = "all_post_treatment"); tidy.cohortTimeATTs() for
broom-shape translation.
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5,
  seed = 123)
res <- fetwfeWithSimulatedData(dat)
cta <- cohortTimeATTs(res)
cta
# Broom-shape translation:
if (requireNamespace("broom", quietly = TRUE)) {
  broom::tidy(cta)
}
## End(Not run)
Implementation of extended two-way fixed effects. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
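The extended TWFE idea can be shown as a pedagogical sketch with base R's lm(): pooled OLS on cohort dummies, time dummies, and one indicator per treated (cohort, time) cell, in the spirit of Wooldridge (2021) without covariates. Each CATT below is the simple average of that cohort's cell coefficients. The ATT weights the CATTs by cohort probabilities conditional on treatment. The simulated panel and all object names are hypothetical. The package's etwfe() additionally handles covariates and applies a random-effects GLS transform, so its numbers will not match this sketch exactly.

```r
set.seed(42)
T_periods <- 4
# Cohort = first treated period; 0 marks never-treated units (toy assumption).
cohorts <- c(rep(2, 30), rep(3, 30), rep(0, 40))
n_units <- length(cohorts)
panel <- expand.grid(unit = seq_len(n_units), time = seq_len(T_periods))
panel$cohort <- cohorts[panel$unit]
panel$treated <- as.integer(panel$cohort > 0 & panel$time >= panel$cohort)
# True effects: 1 for cohort 2 and 2 for cohort 3, constant over time.
true_te <- c("0" = 0, "2" = 1, "3" = 2)[as.character(panel$cohort)]
unit_effect <- rnorm(n_units)[panel$unit]
panel$y <- 0.5 * panel$time + unit_effect + true_te * panel$treated +
  rnorm(nrow(panel), sd = 0.2)

# One indicator per treated (cohort, time) cell; untreated rows are "none".
panel$cell <- ifelse(panel$treated == 1,
                     paste0("g", panel$cohort, "_t", panel$time), "none")
panel$cell <- relevel(factor(panel$cell), ref = "none")
fit <- lm(y ~ factor(cohort) + factor(time) + cell, data = panel)

b <- coef(fit)
cell_coefs <- b[startsWith(names(b), "cell")]
cell_cohort <- sub("^cellg([0-9]+)_t[0-9]+$", "\\1", names(cell_coefs))
catt <- tapply(cell_coefs, cell_cohort, mean)   # average each cohort's cells
n_g <- table(cohorts[cohorts > 0])
cohort_probs <- setNames(as.numeric(n_g) / sum(n_g), names(n_g))
att <- sum(catt[names(cohort_probs)] * cohort_probs)
```

Because each cohort's mean unit effect is absorbed by the cohort dummies, the cell coefficients here depend only on the idiosyncratic noise.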
etwfe(
  pdata, time_var, unit_var, treatment, response, covs = c(),
  indep_counts = NA, sig_eps_sq = NA, sig_eps_c_sq = NA, verbose = FALSE,
  alpha = 0.05, add_ridge = FALSE, allow_no_never_treated = TRUE,
  se_type = "default", ci_type = c("simultaneous", "pointwise")
)
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended date encodings are YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
a unit is treated at some time, it must remain treated at every later time. |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Either a character vector containing the names of
the columns for covariates (e.g., |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID
noise assumed to apply to each observation. See Section 2 of Faletto (2025)
for details. It is best to provide this variance if it is known (for example,
if you are using simulated data). If this variance is unknown, this argument
can be omitted, and the variance will be estimated by
REML on the linear mixed-effects model |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID
noise (random effects) assumed to apply to each observation. See Section 2 of
Faletto (2025) for details. It is best to provide this variance if it is
known (for example, if you are using simulated data). If this variance is
unknown, this argument can be omitted, and the variance will be estimated
by REML via |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
ci_type |
Character; one of |
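When sig_eps_sq and sig_eps_c_sq are omitted, they are estimated by REML on a random-intercept mixed model. The sketch below shows that kind of estimate, with one substitution stated plainly: it uses nlme::lme (a recommended package that ships with R, described in Pinheiro & Bates 2000) in place of lme4, to stay dependency-free. It also assumes time dummies as the only fixed effects, because the package's exact model specification is not shown in this excerpt. The simulated data are hypothetical.

```r
library(nlme)  # recommended package, ships with standard R installations

set.seed(7)
n_units <- 100
T_periods <- 5
panel <- expand.grid(unit = factor(seq_len(n_units)), time = seq_len(T_periods))
sig_eps_c_sq_true <- 1      # unit-level random-effect variance
sig_eps_sq_true <- 0.25     # row-level IID noise variance
u <- rnorm(n_units, sd = sqrt(sig_eps_c_sq_true))
# Indexing by a factor uses its integer codes, i.e. the unit number here.
panel$y <- 0.3 * panel$time + u[panel$unit] +
  rnorm(nrow(panel), sd = sqrt(sig_eps_sq_true))

# Random-intercept model fit by REML; time dummies as fixed effects (assumed).
fit <- lme(y ~ factor(time), random = ~ 1 | unit, data = panel,
           method = "REML")
vc <- VarCorr(fit)
sig_eps_c_sq_hat <- as.numeric(vc["(Intercept)", "Variance"])
sig_eps_sq_hat <- as.numeric(vc["Residual", "Variance"])
```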
An object of class etwfe containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If the Gram matrix is not invertible, this will be NA. |
att_p_value |
A two-sided p-value for the overall ATT against the
null |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact) standard errors for the estimated average treatment effects within each cohort. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after removing any units that were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias
for |
d |
The final number of covariates that appear in the final data set (after removing any covariates that contained missing values or took the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
calc_ses |
Logical indicating whether standard errors were calculated. |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
ci_type |
Character scalar; the |
y_mean |
Numeric scalar; the mean of the original (pre-centering)
response. Stored so downstream methods ( |
response_col_name |
Character scalar; the name of the response
column in the original |
time_var, unit_var, treatment
|
Character scalars; the
|
covs |
Character vector; the original |
internal |
A list containing internal outputs that are typically
not needed for interpretation, packaged here for parity with
|
Gregory Faletto
Wooldridge, J. M. (2021). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Available at SSRN 3906345. doi:10.2139/ssrn.3906345.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545-554.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
## Not run:
library(bacondecomp)
data(castle)
# Response: the log homicide rate. Treatment: `cdl` records the share of
# the year the castle-doctrine law was in effect, so `cdl > 0` gives the
# absorbing 0/1 treatment indicator.
castle$l_homicide <- log(castle$homicide)
castle$treated <- as.integer(castle$cdl > 0)
# No `covs` here: etwfe is pure OLS (no bridge penalty), and castle's
# smallest adoption cohorts contain a single state, so the design is
# rank-deficient once any covariate is added.
res <- etwfe(
  pdata = castle,
  time_var = "year",
  unit_var = "state",
  treatment = "treated",
  response = "l_homicide",
  verbose = TRUE
)
# Print results
print(res, max_cohorts = Inf)
## End(Not run)
Convert data formatted for etwfe::etwfe() to the format required by
fetwfe() and fetwfe::etwfe()
etwfeToFetwfeDf() reshapes and renames a panel dataset that is already
formatted for etwfe::etwfe() (McDermott 2024) so that it can be
passed directly to fetwfe() or etwfe() from the fetwfe package. In
particular, it:
creates an absorbing-state treatment dummy that equals 1 from the first treated period onward and 0 otherwise;
(optionally) drops units that are already treated in the very first
period of the sample (because fetwfe() removes them internally); and
returns a tidy dataframe whose column names match the arguments that
fetwfe()/etwfe() expect.
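The first two conversion steps can be sketched in a few lines of base R. The helpers below are hypothetical and are not the package's internal code. They assume never-treated units carry a first-treated value of 0, as in did::mpdta; a dataset that codes never-treated units differently would need that check adjusted.

```r
# Hypothetical helpers illustrating the conversion steps described above.
# Assumption: never-treated units carry first_treat == 0 (as in did::mpdta).
absorbing_treatment <- function(time, first_treat) {
  as.integer(first_treat > 0 & time >= first_treat)
}

drop_first_period_treated <- function(df, unit, time, treat) {
  first_time <- min(df[[time]])
  already <- unique(df[[unit]][df[[time]] == first_time & df[[treat]] == 1L])
  df[!(df[[unit]] %in% already), , drop = FALSE]
}

toy <- data.frame(
  id = rep(1:3, each = 3),
  year = rep(2001:2003, times = 3),
  first_treat = rep(c(2002, 0, 2001), each = 3)
)
toy$treatment <- absorbing_treatment(toy$year, toy$first_treat)
# Unit 3 is treated from 2001, the first sample period, so it is dropped.
kept <- drop_first_period_treated(toy, "id", "year", "treatment")
```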
etwfeToFetwfeDf(
  data, yvar, tvar, idvar, gvar, covars = character(0),
  drop_first_period_treated = TRUE,
  out_names = list(time = "time_var", unit = "unit_var",
                   treatment = "treatment", response = "response"),
  verbose = FALSE
)
data |
A long-format data.frame that you could already feed to |
yvar |
Character. Column name of the outcome (left-hand side in your |
tvar |
Character. Column name of the time variable that you pass to |
idvar |
Character. Column name of the unit identifier (the variable you would
cluster on, or pass to |
gvar |
Character. Column name of the "first treated" cohort variable passed to |
covars |
Character vector of additional covariate columns to keep (default |
drop_first_period_treated |
Logical. Should units already treated in the very first
sample period be removed? ( |
out_names |
Named list giving the column names that the returned dataframe should have.
The default ( |
verbose |
Logical. If |
A tidy data.frame with (in this order)
time_var integer,
unit_var character,
treatment integer 0/1 absorbing-state dummy,
response numeric outcome,
any covariates requested in covars.
Ready to pass straight to fetwfe() or fetwfe::etwfe().
McDermott G (2024). etwfe: Extended Two-Way Fixed Effects. doi:10.32614/CRAN.package.etwfe, R package version 0.5.0, https://CRAN.R-project.org/package=etwfe.
## toy example ---------------------------------------------------------------
## Not run:
library(did) # provides the mpdta example dataframe
data(mpdta)
head(mpdta)
tidy_df <- etwfeToFetwfeDf(
  data = mpdta,
  yvar = "lemp",
  tvar = "year",
  idvar = "countyreal",
  gvar = "first.treat",
  covars = c("lpop")
)
head(tidy_df)
## End(Not run)
## Now you can call fetwfe() ------------------------------------------------
# res <- fetwfe(
#   pdata = tidy_df,
#   time_var = "time_var",
#   unit_var = "unit_var",
#   treatment = "treatment",
#   response = "response",
#   covs = c("lpop"))
This function runs the extended two-way fixed effects estimator (etwfe()) on
simulated data. It is a thin wrapper for etwfe(): it accepts an object of class
"FETWFE_simulated" (produced by simulateData()), unpacks the necessary
components, and passes them to etwfe(). The outputs therefore match those of
etwfe(), and the remaining arguments match their counterparts in etwfe().
etwfeWithSimulatedData(
  simulated_obj, verbose = FALSE, alpha = 0.05, add_ridge = FALSE,
  allow_no_never_treated = TRUE, se_type = "default",
  ci_type = c("simultaneous", "pointwise")
)
simulated_obj |
An object of class |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
ci_type |
Character; one of |
An object of class etwfe containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If the Gram matrix is not invertible, this will be NA. |
att_p_value |
A two-sided p-value for the overall ATT against the
null |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact) standard errors for the estimated average treatment effects within each cohort. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after removing any units that were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias
for |
d |
The final number of covariates that appear in the final data set (after removing any covariates that contained missing values or took the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
calc_ses |
Logical indicating whether standard errors were calculated. |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
ci_type |
Character scalar; the |
y_mean |
Numeric scalar; the mean of the original (pre-centering)
response. Stored so downstream methods ( |
response_col_name |
Character scalar; the name of the response
column in the original |
time_var, unit_var, treatment
|
Character scalars; the
|
covs |
Character vector; the original |
internal |
A list containing internal outputs that are typically
not needed for interpretation, packaged here for parity with
|
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2,
  seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5,
  seed = 123)
result <- etwfeWithSimulatedData(sim_data)
## End(Not run)
For a fitted object from fetwfe(), etwfe(), or betwfe(), computes the
pooled event-time treatment-effect estimates tau_E(e), defined as
cohort-weighted averages of the cell-level treatment-effect estimates at
each post-treatment event time e = t - g (where t is calendar time and
g is the cohort's first-treated calendar time). Weights are
sample-cohort-size weights (matching did::aggte(type = "dynamic")
convention).
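The pooling step can be sketched as follows. For each event time e = t - g, the sketch takes the cohort-size-weighted average over the cohorts that have a cell at e. `pool_event_time()`, its input columns, and the toy numbers are hypothetical. Only the weighting rule comes from the description above.

```r
# Hypothetical sketch of tau_E(e): cohort-size-weighted average of cell
# estimates at each post-treatment event time e = t - g.
pool_event_time <- function(cells, cohort_size) {
  # cells: data.frame with columns cohort, time, estimate (post-treatment cells)
  # cohort_size: named numeric vector of sample cohort sizes, names = cohort
  cells$event_time <- cells$time - cells$cohort
  cells$w <- unname(cohort_size[as.character(cells$cohort)])
  pieces <- lapply(split(cells, cells$event_time), function(d) {
    data.frame(event_time = d$event_time[1],
               n_cohorts = nrow(d),
               estimate = sum(d$w * d$estimate) / sum(d$w))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy panel with T = 4: cohorts first treated at t = 2 (30 units) and t = 3 (10).
cells <- data.frame(
  cohort   = c(2, 2, 2, 3, 3),
  time     = c(2, 3, 4, 3, 4),
  estimate = c(1.0, 1.5, 2.0, 3.0, 4.0)
)
es <- pool_event_time(cells, cohort_size = c("2" = 30, "3" = 10))
es
```

Event times run from 0 to T - 2 = 2, and only the earlier cohort contributes at e = 2, which is why n_cohorts falls to 1 there.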
Standard errors combine two terms, mirroring the package's existing
overall-ATT SE machinery: var_1(e) from regression-coefficient noise
(computed via the same gram_inv machinery the package uses for cohort
SEs, or the cluster-robust sandwich under se_type = "cluster"), and
var_2(e) from cohort-probability noise (analog of the existing
getSecondVarTermOLS / getSecondVarTermDataApp machinery, with the
multinomial Jacobian restricted to cohorts valid at event time e).
The two terms are combined as sqrt(var_1 + var_2) by default (asymptotically exact
under the paper's asymptotic-normality theorem for the treatment-effect
estimators and Assumption (-IF), which
the package's default cohort-sample-proportions estimator satisfies);
the conservative Cauchy-Schwarz bound sqrt(var_1 + var_2 + 2 sqrt(var_1 * var_2)) is available via se_type = "conservative"
(for users whose propensity-score estimators do not satisfy (-IF)). When
indep_counts was supplied at fit time, the tight formula applies
regardless of se_type (the two-sample regime, part (b) of the same theorem).
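A delta-method sketch shows how var_2(e) and the two combination rules fit together. It writes tau(e) = sum_g pi_g tau_g / sum_g pi_g over the cohorts valid at e. It uses the multinomial covariance (diag(pi) - pi pi') / N for the sample cohort proportions, restricted to those cohorts. Both functions are hypothetical. The package's getSecondVarTermOLS / getSecondVarTermDataApp may differ in details such as which probabilities and sample size enter the formula.

```r
# Hypothetical sketch: delta-method var_2(e) for a probability-weighted average.
var2_event_time <- function(tau_g, pi_g, N) {
  s <- sum(pi_g)
  tau_e <- sum(pi_g * tau_g) / s
  jac <- (tau_g - tau_e) / s                   # gradient of tau(e) in pi_g
  # nrow = length(pi_g) keeps diag() correct when only one cohort is valid.
  sigma_pi <- (diag(pi_g, nrow = length(pi_g)) - tcrossprod(pi_g)) / N
  drop(crossprod(jac, sigma_pi %*% jac))
}

# Default (tight) vs conservative combination; the tight rule is always used
# when the fit used independent cohort counts (indep_counts).
combine_se <- function(var1, var2, se_type = c("default", "conservative"),
                       indep_counts_used = FALSE) {
  se_type <- match.arg(se_type)
  if (se_type == "conservative" && !indep_counts_used) {
    # Cauchy-Schwarz bound; equals sqrt(var1) + sqrt(var2).
    sqrt(var1 + var2 + 2 * sqrt(var1 * var2))
  } else {
    sqrt(var1 + var2)
  }
}

v2 <- var2_event_time(tau_g = c(1, 3), pi_g = c(0.3, 0.1), N = 100)
```

With a single contributing cohort the gradient is zero, so var_2(e) vanishes and only the regression noise var_1(e) remains.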
eventStudy(x, alpha = NULL, ci_type = NULL)
x |
A fitted object of class |
alpha |
(Optional) Significance level for confidence intervals.
Defaults to |
ci_type |
(Optional) Character; one of |
A data frame with class c("eventStudy", "data.frame") and
columns:
Integer; event time e = t - g, ranging from 0
to T - 2.
Integer; number of cohorts contributing to the
pooled estimate at event time e.
Numeric; the pooled event-time ATT estimate.
Numeric; combined standard error.
Numeric; lower bound of the (1 - alpha) Wald CI.
Numeric; upper bound of the (1 - alpha) Wald CI.
Numeric; follows the fit's ci_type. Under
"pointwise", the two-sided Wald p-value
(2 * pnorm(-|estimate / se|)); under "simultaneous" (the
default), the single-step max-T multiplicity-adjusted (family-wise)
p-value matching the simultaneous band (#200). NA when se is 0
or NA.
Only post-treatment event times (e >= 0) are included; pre-treatment
placebo periods would require an extended regression specification and
are out of scope for this initial release.
cohortStudy() for the parallel per-cohort accessor;
cohortTimeATTs() for the fully disaggregated per-(cohort, time) accessor.
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5,
  seed = 123)
res <- fetwfeWithSimulatedData(dat)
eventStudy(res)
## End(Not run)
Implementation of fused extended two-way fixed effects. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
The treatment-effect fusion penalty defaults to a within-/between-cohort
geometry (fusion_structure = "cohort") and also supports an event-study
geometry (fusion_structure = "event_study", fusing effects at the same
time since treatment across cohorts) or a fully custom fusion_matrix. See
the fusion_structure / fusion_matrix arguments below and
vignette("fusion_structure_vignette", package = "fetwfe") for guidance
on choosing.
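A simplified illustration helps show how the two fusion geometries differ. It computes a bridge-type penalty lambda * sum(|D %*% beta|^q) under two difference matrices on a toy layout of 2 cohorts by 2 event times. The helper names, the layout, and the matrices are hypothetical. The package's actual penalty, transform, and fusion_matrix format (which may, for example, include anchoring rows) are documented in the vignette, not reproduced here.

```r
first_diff_matrix <- function(k) diff(diag(k))   # rows: e_{j+1} - e_j

bridge_penalty <- function(beta, D, lambda, q = 0.5) {
  lambda * sum(abs(D %*% beta)^q)
}

# Toy layout: 2 cohorts x 2 event times, ordered (g1e0, g1e1, g2e0, g2e1).
D_cohort <- kronecker(diag(2), first_diff_matrix(2))  # within-cohort, over time
D_event  <- kronecker(first_diff_matrix(2), diag(2))  # across cohorts, same e
beta <- c(1, 2, 1, 2)   # effects vary with event time only
pen_cohort <- bridge_penalty(beta, D_cohort, lambda = 1)
pen_event  <- bridge_penalty(beta, D_event, lambda = 1)
```

When effects depend only on time since treatment, the event-study differences are all zero and cost nothing. The within-cohort differences are nonzero and are penalized. This is the intuition for preferring fusion_structure = "event_study" when dynamics are shared across cohorts.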
fetwfe(
  pdata, time_var, unit_var, treatment, response, covs = c(),
  indep_counts = NA, sig_eps_sq = NA, sig_eps_c_sq = NA,
  lambda.max = NA, lambda.min = NA, nlambda = 100, q = 0.5,
  verbose = FALSE, alpha = 0.05, add_ridge = FALSE,
  allow_no_never_treated = TRUE, se_type = "default",
  lambda_selection = "cv", cv_folds = 10L, cv_seed = NULL,
  ci_type = c("simultaneous", "pointwise"),
  fusion_structure = c("cohort", "event_study"), fusion_matrix = NULL
)
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended date encodings are YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
a unit is treated at some time, it must remain treated at every later time. |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Either a character vector containing the names of
the columns for covariates (e.g., |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID
noise assumed to apply to each observation. See Section 2 of Faletto (2025)
for details. It is best to provide this variance if it is known (for example,
if you are using simulated data). If this variance is unknown, this argument
can be omitted, and the variance will be estimated by
REML on the linear mixed-effects model |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID
noise (random effects) assumed to apply to each observation. See Section 2 of
Faletto (2025) for details. It is best to provide this variance if it is
known (for example, if you are using simulated data). If this variance is
unknown, this argument can be omitted, and the variance will be estimated
by REML via |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
lambda_selection |
Character; method for selecting the bridge
penalty parameter |
cv_folds |
Integer; number of folds for the CV path. Ignored when
|
cv_seed |
Integer or |
ci_type |
Character; one of |
fusion_structure |
Character; one of |
fusion_matrix |
(Optional.) Numeric matrix or |
An object of class fetwfe containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
att_p_value |
A two-sided p-value for the overall ATT against the null |
att_selected |
Logical scalar; |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The number of selected features (excluding the always-present intercept) at |
lambda.min |
Either the provided |
lambda.min_model_size |
The number of selected features (excluding the always-present intercept) at |
lambda_star |
The value of |
lambda_star_model_size |
The number of selected features (excluding the always-present intercept) in the chosen model. If this value is close to |
lambda_selection |
Character scalar; either |
cv_folds |
Integer scalar; the |
cv_seed |
Integer scalar; the seed actually fed to |
fusion_structure |
Character scalar; the |
fusion_matrix |
The user-supplied custom forward differences matrix |
N |
The final number of units that were in the data set used for estimation (after removing any units that were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias for |
d |
The final number of covariates that appear in the final data set (after removing any covariates that contained missing values or took the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
calc_ses |
Logical indicating whether standard errors were calculated. Same as |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
ci_type |
Character scalar; the |
y_mean |
Numeric scalar; the mean of the original (pre-centering)
response. Stored so downstream methods ( |
response_col_name |
Character scalar; the name of the response
column in the original |
time_var, unit_var, treatment
|
Character scalars; the
|
covs |
Character vector; the original |
internal |
A list containing internal outputs that are typically not needed for interpretation:
|
The object has methods for print(), summary(), and coef(). By default, print() and summary() only show the essential outputs. To see internal details, use print(x, show_internal = TRUE) or summary(x, show_internal = TRUE). The coef() method returns the vector of estimated coefficients (beta_hat).
Gregory Faletto
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545-554.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
vignette("fusion_structure_vignette", package = "fetwfe") for
guidance on choosing between the cohort (default) and event-study fusion
penalties and on supplying a custom fusion_matrix.
# `bacondecomp` (which supplies the `divorce` data) is a Suggests-only
# dependency, so guard the example on its availability. The fit is wrapped in
# \donttest{} because it is slower than a toy example.
if (requireNamespace("bacondecomp", quietly = TRUE)) {
  library(bacondecomp)
  data(divorce)
  # Stevenson & Wolfers (2006): the effect of unilateral ("no-fault") divorce
  # reforms on female suicide rates. Restrict to the female subset
  # (`sex == 2`); `changed` is already the absorbing 0/1 reform indicator, and
  # the elasticity-scaled female suicide rate is the response.
  divorce_f <- divorce[divorce$sex == 2, ]
  # Reproduces the empirical application in Faletto (2025, Sec. 8.2). The 9
  # states already treated by 1964 are auto-dropped as first-period-treated,
  # and `murderrate` is auto-dropped (missing in 1964 for one state); both are
  # reported as (expected) warnings. The noise variances are supplied
  # (precomputed by REML) to keep the example fast and reproducible; the
  # default lambda_selection is "cv" (10-fold cross-validation).
  res <- fetwfe(
    pdata = divorce_f,
    time_var = "year",
    unit_var = "st",
    treatment = "changed",
    covs = c("murderrate", "lnpersinc", "afdcrolls"),
    response = "suiciderate_elast_jag",
    sig_eps_sq = 0.0344,
    sig_eps_c_sq = 0.1507,
    add_ridge = TRUE,
    q = 0.5
  )
  # FETWFE estimates an overall ATT of roughly -6% on the elasticity-scaled
  # female suicide rate, with a 95% confidence interval that excludes zero.
  # The selection step retains heterogeneous cohort effects (several cohorts
  # are pruned to exactly zero), rather than fusing to a single common effect.
  print(res, max_cohorts = Inf)
}
S3 class for objects returned by genCoefs().
Compact print method summarizes the coefficient vector and its
sparsity pattern instead of dumping the full beta and
theta vectors.
S3 class for objects returned by simulateData().
Compact print method summarizes the panel's dimensions and cohort
structure instead of dumping the full N*T x p design matrix
(which the default print method for lists would do).
This function runs the fused extended two-way fixed effects estimator (fetwfe()) on
simulated data. It is a thin wrapper for fetwfe(): it accepts an object of class
"FETWFE_simulated" (produced by simulateData()) and unpacks the components needed
to call fetwfe(). Its outputs therefore match those of fetwfe(), and its remaining
arguments match their counterparts in fetwfe().
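The wrapper pattern can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's code: the field names stored in the simulated object (pdata, time_var, sig_eps_sq, and so on) are hypothetical, chosen to mirror fetwfe()'s argument names, and fake_estimator() is a stand-in that only reports what it received.

```r
# Hypothetical sketch of the wrapper pattern: unpack a simulated-data object
# and forward its pieces, plus any tuning arguments, to an estimator.
# Field names of `simulated_obj` are assumptions, not fetwfe internals.
wrap_simulated <- function(simulated_obj, estimator, ...) {
  stopifnot(inherits(simulated_obj, "FETWFE_simulated"))
  estimator(
    pdata = simulated_obj$pdata,
    time_var = simulated_obj$time_var,
    unit_var = simulated_obj$unit_var,
    treatment = simulated_obj$treatment,
    covs = simulated_obj$covs,
    response = simulated_obj$response,
    sig_eps_sq = simulated_obj$sig_eps_sq,      # known noise variances
    sig_eps_c_sq = simulated_obj$sig_eps_c_sq,
    ...                                         # tuning args passed through
  )
}

# Stand-in estimator that just reports what it received.
fake_estimator <- function(pdata, time_var, unit_var, treatment, covs,
                           response, sig_eps_sq, sig_eps_c_sq, q = 0.5) {
  list(n_rows = nrow(pdata), q = q, sig_eps_sq = sig_eps_sq)
}

sim <- structure(
  list(
    pdata = data.frame(time = rep(1:2, 2), unit = rep(c("a", "b"), each = 2),
                       treated = c(0, 1, 0, 0), y = rnorm(4)),
    time_var = "time", unit_var = "unit", treatment = "treated",
    covs = character(0), response = "y", sig_eps_sq = 5, sig_eps_c_sq = 5
  ),
  class = "FETWFE_simulated"
)
out <- wrap_simulated(sim, fake_estimator, q = 0.8)
```

Forwarding `...` is what lets a wrapper like this expose the estimator's tuning arguments without restating the panel-specific ones.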
fetwfeWithSimulatedData(
  simulated_obj,
  lambda.max = NA,
  lambda.min = NA,
  nlambda = 100,
  q = 0.5,
  verbose = FALSE,
  alpha = 0.05,
  add_ridge = FALSE,
  allow_no_never_treated = TRUE,
  se_type = "default",
  lambda_selection = "cv",
  cv_folds = 10L,
  cv_seed = NULL,
  ci_type = c("simultaneous", "pointwise"),
  fusion_structure = c("cohort", "event_study"),
  fusion_matrix = NULL
)
simulated_obj |
An object of class |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
lambda_selection |
Character; method for selecting the bridge
penalty parameter |
cv_folds |
Integer; number of folds for the CV path. Ignored when
|
cv_seed |
Integer or |
ci_type |
Character; one of |
fusion_structure |
Character; one of |
fusion_matrix |
(Optional.) Numeric matrix or |
An object of class fetwfe containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
att_p_value |
A two-sided p-value for the overall ATT against the null |
att_selected |
Logical scalar; |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The number of selected features (excluding the always-present intercept) at |
lambda.min |
Either the provided |
lambda.min_model_size |
The number of selected features (excluding the always-present intercept) at |
lambda_star |
The value of |
lambda_star_model_size |
The number of selected features (excluding the always-present intercept) in the chosen model. If this value is close to |
lambda_selection |
Character scalar; either |
cv_folds |
Integer scalar; the |
cv_seed |
Integer scalar; the seed actually fed to |
fusion_structure |
Character scalar; the |
fusion_matrix |
The user-supplied custom forward differences matrix |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias for G, retained for backward compatibility. |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
alpha |
The alpha level used for confidence intervals. |
calc_ses |
Logical indicating whether standard errors were calculated. Same as |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
ci_type |
Character scalar; the |
y_mean |
Numeric scalar; the mean of the original (pre-centering)
response. Stored so downstream methods ( |
response_col_name |
Character scalar; the name of the response
column in the original |
time_var, unit_var, treatment
|
Character scalars; the
|
covs |
Character vector; the original |
internal |
A list containing internal outputs that are typically not needed for interpretation:
|
The object has methods for print(), summary(), and coef(). By default, print() and summary() only show the essential outputs. To see internal details, use print(x, show_internal = TRUE) or summary(x, show_internal = TRUE). The coef() method returns the vector of estimated coefficients (beta_hat).
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5, seed = 123)
result <- fetwfeWithSimulatedData(sim_data)
## End(Not run)
This function generates a coefficient vector beta for simulation studies of the fused
extended two-way fixed effects estimator. It returns an S3 object of class
"FETWFE_coefs" containing beta along with simulation parameters G,
T, and d. See the simulation studies section of Faletto (2025) for details.
genCoefs(
  G = NULL,
  T,
  d,
  density,
  eff_size,
  fusion_structure = c("cohort", "event_study"),
  assignment_type = c("marginal", "multinomial", "ordered"),
  assignment_strength = 1,
  assignment_interactions = NULL,
  assignment_interaction_strength = NULL,
  seed = NULL,
  verbose = FALSE,
  R = NULL
)
G |
Optional integer. The number of treated cohorts (treatment is assumed to start in periods 2 to
|
T |
Integer. The total number of time periods. |
d |
Integer. The number of time-invariant covariates. If |
density |
Numeric in (0,1]. The probability that any given entry in the initial
coefficient vector |
eff_size |
Numeric. The magnitude used to scale nonzero entries in |
fusion_structure |
Character. One of |
assignment_type |
Character. One of
|
assignment_strength |
Non-negative numeric scalar. Scales the logit
coefficients in the propensity-score model. |
assignment_interactions |
Optional. A list of length-2 integer
vectors, each naming a pair of covariate indices |
assignment_interaction_strength |
Optional non-negative numeric
scalar. Scales the Gaussian draws of the interaction coefficients
independently of |
seed |
(Optional) Integer. Seed for reproducibility. Three
deterministic offsets share this seed: the main coefficient draw uses
|
verbose |
Logical. If |
R |
Deprecated. The former name for |
Optional arguments assignment_type and assignment_strength
control whether cohort membership in the simulated panel is drawn
marginally (independent of the covariates, the original behavior) or from
a covariate-dependent propensity-score model — either a multinomial-logit
or an ordered-logit (proportional-odds) model. The default
assignment_type = "marginal" preserves the pre-1.14.0 behavior
byte-identically. See vignette("simulation_vignette", package = "fetwfe")
for worked examples.
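For a single covariate vector, the two covariate-dependent assignment models can be sketched in base R. This is an illustration of the textbook forms, not genCoefs() internals: the coefficient values, the cutpoints, and the choice of cohort 0 (never treated) as the multinomial reference category are assumptions, and multinomial_probs() and ordered_probs() are hypothetical helper names.

```r
# Illustrative sketch of the two covariate-dependent assignment models,
# for one covariate vector x. Coefficients, cutpoints, and the reference
# category are assumptions, not genCoefs() internals.
multinomial_probs <- function(x, gamma) {
  # gamma: G x d matrix; cohort 0 (never treated) is the reference, eta = 0
  eta <- c(0, gamma %*% x)
  exp(eta) / sum(exp(eta))           # softmax over cohorts 0..G
}

ordered_probs <- function(x, gamma, cutpoints) {
  # Proportional odds: P(cohort <= k | x) = plogis(cutpoints[k] - x'gamma),
  # with increasing cutpoints; the last cumulative probability is 1.
  cum <- c(plogis(cutpoints - sum(gamma * x)), 1)
  diff(c(0, cum))                    # category probabilities for cohorts 0..G
}

x <- c(0.3, -1.2)
p_mn <- multinomial_probs(x, gamma = matrix(c(0.5, -0.2, 0, 0.3, -0.4, 0.1),
                                            nrow = 3, ncol = 2))
p_ord <- ordered_probs(x, gamma = c(0.8, -0.5), cutpoints = c(-1, 0, 1))

# Both are probability vectors over cohorts 0..G (here G = 3); a unit's
# cohort can then be drawn from either one.
cohort_draw <- sample(0:3, size = 1, prob = p_ord)
```

The multinomial model gives each cohort its own coefficient vector, while the ordered model shares one coefficient vector across cohorts and shifts only the cutpoints, which is why it suits an ordering of adoption times.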
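The length of beta is given by
$p = G + (T - 1) + d + dG + d(T - 1) + \text{num\_treats} + d \cdot \text{num\_treats}$, where the number of treatment parameters is defined as
$\text{num\_treats} = TG - G(G + 1)/2$.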
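The length of beta follows p = G + (T - 1) + d + dG + d(T - 1) + num_treats + d * num_treats, with num_treats = TG - G(G + 1)/2 (the same formula spelled out in the genCoefsCore() example). A small helper makes it easy to sanity-check dimensions before simulating. The function names below are hypothetical; they are not exported by fetwfe.

```r
# Hypothetical helpers (not exported by fetwfe): number of treatment
# parameters and total length p of beta.
num_treats_fn <- function(G, T) {
  # Cohort g adopts at period g + 1, so it has T - g treated periods;
  # summing over g = 1..G gives T*G - G*(G + 1)/2.
  T * G - G * (G + 1) / 2
}

beta_length <- function(G, T, d) {
  nt <- num_treats_fn(G, T)
  G + (T - 1) + d + d * G + d * (T - 1) + nt + nt * d
}

beta_length(G = 3, T = 6, d = 2)    # 62
beta_length(G = 5, T = 30, d = 12)  # 2209
```

The closed form for num_treats agrees with counting treated periods cohort by cohort, e.g. sum(6 - 1:3) is 12 for G = 3 and T = 6.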
The function operates in two steps:
It first creates a sparse vector theta of length $p$, with nonzero entries
occurring with probability density. Nonzero entries are set to eff_size or
-eff_size (with a 60\
The full coefficient vector beta is then computed by applying an inverse fusion
transform to theta using internal routines:
genBackwardsInvFusionTransformMat() for the fixed-effect blocks and,
for the treatment-effect block, genInvTwoWayFusionTransformMat() when
fusion_structure = "cohort" or genInvEventStudyFusionTransformMat()
when fusion_structure = "event_study".
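As a rough illustration of the two steps, the sketch below draws a sparse theta and maps it to beta through an upper-triangular matrix of ones, i.e. a reverse cumulative sum. That matrix is an assumed stand-in for a single backward-difference fusion block, not the package's genBackwardsInvFusionTransformMat(); sketch_sparse_theta() and its prob_pos argument are hypothetical, and the sign rule here is a tunable assumption rather than the package's exact rule.

```r
# Illustrative sketch only: sparse theta -> beta via an assumed inverse
# fusion matrix (upper-triangular ones, i.e. a reverse cumulative sum).
sketch_sparse_theta <- function(k, density, eff_size, prob_pos = 0.5) {
  nonzero <- rbinom(k, size = 1, prob = density) == 1   # sparsity pattern
  signs <- ifelse(runif(k) < prob_pos, 1, -1)           # random signs
  theta <- numeric(k)
  theta[nonzero] <- eff_size * signs[nonzero]
  theta
}

set.seed(1)
k <- 6
theta <- sketch_sparse_theta(k, density = 0.5, eff_size = 2)
inv_fusion <- upper.tri(diag(k), diag = TRUE) * 1  # hypothetical stand-in
beta <- as.vector(inv_fusion %*% theta)            # beta[j] = sum(theta[j:k])

# Backward differences of beta recover theta, so a sparse theta means
# most adjacent beta entries are exactly equal (fused).
all.equal(c(-diff(beta), beta[k]), theta)  # TRUE
```

Sparsity is imposed on theta rather than beta because, under a fusion transform, a zero in theta means "no change between neighbouring coefficients", which is the restriction the FETWFE penalty targets.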
The multinomial-logit and proportional-odds reference DGPs are the
canonical parametric propensity-score models named in Faletto (2025)
line 1016; the propensity-weighted population-truth aggregation matches
Eq. att.estimator.weighted (line 837).
Note on the random-number generator: passing an explicit numeric
seed calls set.seed(seed) internally and leaves the
global RNG advanced after the call returns. This is deliberate: it
makes a simulation both reproducible (the same seed always yields
the same coefficients) and varying (drawing data afterwards consumes the
advanced stream). To draw from the ambient random stream instead,
without resetting it via set.seed(), pass seed = NA
(or seed = NULL).
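Both seed conventions can be mimicked in base R. The helper draw_with_seed() below is hypothetical, a stand-in for the seed handling described above rather than genCoefs() itself; it shows that a numeric seed makes draws reproducible while leaving the global stream advanced, and that seed = NA or NULL simply continues the ambient stream.

```r
# Hypothetical helper mimicking the two seed conventions: a numeric seed
# calls set.seed() (reproducible, global RNG left advanced); seed = NA or
# NULL draws from the ambient stream without reseeding.
draw_with_seed <- function(n, seed = NULL) {
  if (!is.null(seed) && !is.na(seed)) {
    set.seed(seed)
  }
  rnorm(n)
}

a <- draw_with_seed(3, seed = 123)
b <- draw_with_seed(3, seed = 123)
identical(a, b)  # TRUE: same seed, same draws

# After a seeded call, the global stream continues where the call left off,
# so the next ambient draws are the 4th to 6th draws after set.seed(123).
after <- rnorm(3)
set.seed(123)
identical(after, rnorm(6)[4:6])  # TRUE
```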
An object of class "FETWFE_coefs", which is a list containing:
A numeric vector representing the full coefficient vector after the inverse fusion transform.
A numeric vector representing the coefficient vector in the transformed feature
space. theta is a sparse vector, which aligns with an assumption that deviations from the
restrictions encoded in the FETWFE model are sparse. beta is derived from
theta.
The fusion structure ("cohort" or
"event_study") used to build the treatment-effect coefficients.
The provided number of treated cohorts.
Deprecated alias for G, retained for backward
compatibility; populated with the same value. Use G. Will be
removed in a future release.
The provided number of time periods.
The provided number of covariates.
The provided seed.
The selected cohort-assignment DGP
("marginal" / "multinomial" / "ordered").
New in 1.14.0.
The scaling factor applied to the assignment coefficients. New in 1.14.0.
The scaling factor applied to
the interaction coefficients. NULL when no interactions
were specified or when the user passed NULL (the
fall-through default). New in 1.14.1.
NULL when
assignment_type = "marginal"; otherwise a list with elements
type, strength, coefs (the gamma matrix or
vector), and (for ordered) cutpoints. Starting in 1.14.1,
assignment_coefs also carries the sub-slots
interactions (the canonicalized and deduplicated list of
pairs, or NULL), delta (the interaction coefficient
matrix for multinomial or vector for ordered, or NULL),
and interaction_strength (the effective scaling factor
used for the delta draws, or NULL when no
interactions). New in 1.14.0; interactions, delta,
and interaction_strength sub-slots new in 1.14.1.
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
fetwfe, whose fusion_structure argument this
mirrors on the estimation side;
vignette("fusion_structure_vignette", package = "fetwfe") for the
cohort-vs-event-study distinction, and
vignette("simulation_vignette", package = "fetwfe") for the full
simulation pipeline.
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5, seed = 123)
# Event-study-sparse truth: treatment effects that share the same time
# since treatment are fused across cohorts (the simulation-side companion
# to fetwfe()'s fusion_structure = "event_study").
coefs_es <- genCoefs(
  G = 5, T = 30, d = 12, density = 0.1, eff_size = 2,
  fusion_structure = "event_study", seed = 123
)
# Covariate-dependent cohort assignment: multinomial-logit DGP
coefs_mn <- genCoefs(
  G = 5, T = 30, d = 12, density = 0.1, eff_size = 2,
  assignment_type = "multinomial", assignment_strength = 1.0,
  seed = 123
)
sim_mn <- simulateData(coefs_mn, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5, seed = 123)
# Covariate-dependent cohort assignment with nonlinear propensity
# (multinomial-logit + a single x1*x2 interaction term in the propensity
# model only; outcome model continues to use plain X):
coefs_int <- genCoefs(
  G = 5, T = 30, d = 12, density = 0.1, eff_size = 2,
  assignment_type = "multinomial",
  assignment_interactions = list(c(1, 2)),
  assignment_interaction_strength = 1.5,
  seed = 123
)
## End(Not run)
This function generates a coefficient vector beta along with a sparse auxiliary vector
theta for simulation studies of the fused extended two-way fixed effects estimator. The
returned beta is formatted to align with the design matrix created by
genRandomData(), and is a valid input for the beta argument of that function. The
vector theta is sparse, with nonzero entries occurring with probability density and
scaled by eff_size. See the simulation studies section of Faletto (2025) for details.
genCoefsCore(
  G = NULL,
  T,
  d,
  density,
  eff_size,
  fusion_structure = c("cohort", "event_study"),
  seed = NULL,
  R = NULL
)
G |
Integer. The number of treated cohorts (treatment is assumed to start in periods 2 to
|
T |
Integer. The total number of time periods. |
d |
Integer. The number of time-invariant covariates. If |
density |
Numeric in (0,1]. The probability that any given entry in the initial
coefficient vector |
eff_size |
Numeric. The magnitude used to scale nonzero entries in |
fusion_structure |
Character. One of |
seed |
(Optional) Integer. Seed for reproducibility. |
R |
Deprecated. The former name for |
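The length of beta is given by
$p = G + (T - 1) + d + dG + d(T - 1) + \text{num\_treats} + d \cdot \text{num\_treats}$, where the number of treatment parameters is defined as
$\text{num\_treats} = TG - G(G + 1)/2$.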
The function operates in two steps:
It first creates a sparse vector theta of length $p$, with nonzero entries
occurring with probability density. Nonzero entries are set to eff_size or -eff_size
(with a 60\
The full coefficient vector beta is then computed by applying an inverse fusion
transform to theta using internal routines:
genBackwardsInvFusionTransformMat() for the fixed-effect blocks and,
for the treatment-effect block, genInvTwoWayFusionTransformMat() when
fusion_structure = "cohort" or genInvEventStudyFusionTransformMat()
when fusion_structure = "event_study".
A list with two elements:
beta: A numeric vector representing the full coefficient vector after the inverse fusion transform.
theta: A numeric vector representing the coefficient vector in the transformed feature
space. theta is a sparse vector, which aligns with an assumption that deviations from the
restrictions encoded in the FETWFE model are sparse. beta is derived from
theta.
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
## Not run:
# Set parameters for the coefficient generation
G <- 3            # Number of treated cohorts
T <- 6            # Total number of time periods
d <- 2            # Number of covariates
density <- 0.1    # Probability that an entry in the initial vector is nonzero
eff_size <- 1.5   # Scaling factor for nonzero coefficients
seed <- 789       # Seed for reproducibility
# Generate coefficients using genCoefsCore()
coefs_core <- genCoefsCore(G = G, T = T, d = d, density = density,
                           eff_size = eff_size, seed = seed)
beta <- coefs_core$beta
theta <- coefs_core$theta
# For diagnostic purposes, compute the expected length of beta.
# The length p is defined internally as:
#   p = G + (T - 1) + d + d*G + d*(T - 1) + num_treats + num_treats*d,
# where num_treats = T * G - (G*(G+1))/2.
num_treats <- T * G - (G * (G + 1)) / 2
p_expected <- G + (T - 1) + d + d * G + d * (T - 1) + num_treats + num_treats * d
cat("Length of beta:", length(beta), "\nExpected length:", p_expected, "\n")
## End(Not run)
This function extracts the true treatment effects from a full coefficient vector
as generated by genCoefs(). It returns the per-cohort CATTs and an
overall ATT. Under the default marginal cohort-assignment DGP, the overall
ATT is the equal-weighted mean of the cohort-specific effects. Under the
covariate-dependent DGPs introduced in 1.14.0, the overall ATT is a
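propensity-weighted mean using cohort weights
$w_g = \mathbb{E}[\pi_g(X)] / \sum_{g'=1}^{G} \mathbb{E}[\pi_{g'}(X)]$, where $\pi_g(X)$ is the propensity of cohort $g$ given covariates $X$,
matching Faletto (2025) Eq. att.estimator.weighted (line 837) at the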
population level. The expected propensities are computed by Monte Carlo
integration over the covariate distribution.
getTes(coefs_obj)
coefs_obj |
An object of class |
The function internally uses auxiliary routines getNumTreats(), getP(),
getFirstInds(), getTreatInds(), and getActualCohortTes() to determine the
correct indices of treatment effect coefficients in beta. The overall treatment effect
is computed as a weighted average of the cohort-specific effects (uniform
weights under the marginal DGP, propensity weights otherwise).
Under non-marginal DGPs, each expected cohort propensity $\mathbb{E}[\pi_g(X)]$ is estimated by
Monte Carlo integration over the X distribution (Gaussian by default) with
M = 10000 draws. The Monte Carlo seed is coefs_obj$seed + 2L, following the
seed-offset convention documented for the seed argument of genCoefs().
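The propensity weights are w_g = E[pi_g(X)] / sum over g' of E[pi_g'(X)], where pi_g(X) is the probability that a unit with covariates X joins treated cohort g. The sketch below estimates them by Monte Carlo over Gaussian covariates, assuming a multinomial-logit model with cohort 0 (never treated) as the reference category. mc_cohort_weights() is a hypothetical name, and the coefficient values and cohort effects are made up for illustration; this is not the package's internal routine.

```r
# Illustrative sketch (hypothetical names; not fetwfe internals):
# propensity-weighted cohort weights under an assumed multinomial-logit
# assignment model, estimated by Monte Carlo over Gaussian covariates.
mc_cohort_weights <- function(gamma, M = 10000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  d <- ncol(gamma)                               # gamma: G x d coefficients
  X <- matrix(rnorm(M * d), nrow = M, ncol = d)  # M draws of X
  eta <- cbind(0, X %*% t(gamma))                # M x (G + 1); cohort 0 first
  probs <- exp(eta) / rowSums(exp(eta))          # softmax, row by row
  e_pi <- colMeans(probs)[-1]                    # E[pi_g(X)], g = 1..G
  e_pi / sum(e_pi)                               # normalize over treated cohorts
}

gamma <- matrix(c(0.5, -0.2, 0, 0.3, -0.4, 0.1), nrow = 3, ncol = 2)
w <- mc_cohort_weights(gamma, M = 10000, seed = 2)

catt <- c(1.0, 2.0, 0.5)          # made-up cohort effects for illustration
att_weighted <- sum(w * catt)     # propensity-weighted overall ATT
att_marginal <- mean(catt)        # uniform 1/G weights (marginal DGP)

# With gamma = 0 every cohort is equally likely, recovering uniform weights.
mc_cohort_weights(matrix(0, 3, 2), M = 100, seed = 1)  # 1/3 each
```

Normalizing over treated cohorts only (dropping the never-treated column) is what makes the weights sum to 1 across the G cohorts, matching the cohort_weights contract described below.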
An object of class "FETWFE_tes", which is a list with the
following elements:
A numeric value representing the overall average treatment
effect on the treated. Under the marginal DGP this is the
equal-weighted mean of the cohort-specific effects; under
covariate-dependent DGPs it is the propensity-weighted mean using
cohort_weights.
A numeric vector of length G containing the
true cohort-specific treatment effects, calculated by averaging the
coefficients corresponding to the treatment dummies for each cohort.
Intrinsic to the coefficient vector beta; does not depend on the assignment
DGP.
An integer vector of length G giving the calendar
time period at which each treated cohort first adopts treatment. In
the simulator's convention cohort g adopts at calendar time
g + 1 (cohort 0 is never-treated).
Numeric vector of length G summing to 1. Under
the marginal DGP this is uniform 1/G. Under
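assignment_type = "multinomial" or "ordered" it is
$\mathbb{E}[\pi_g(X)] / \sum_{g'=1}^{G} \mathbb{E}[\pi_{g'}(X)]$, the normalized expected cohort propensity.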
New in 1.14.0.
Character; the cohort-assignment DGP carried over
from coefs_obj (one of "marginal",
"multinomial", or "ordered"). Determines whether
cohort_weights is uniform (marginal) or propensity-weighted.
New in 1.18.1.
Numeric; the assignment-strength scaling carried
over from coefs_obj (meaningful only when
assignment_type != "marginal"). NULL for
FETWFE_coefs objects saved before 1.14.0. New in 1.18.1.
The generating parameters carried over from
coefs_obj so that print() and summary() on the
returned object are self-describing.
Deprecated alias for G, retained for backward
compatibility; populated with the same value. Use G. Will be
removed in a future release.
Use print() or summary() on the returned object for a
formatted display.
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Compute the true treatment effects:
te_results <- getTes(coefs)
# Overall average treatment effect on the treated:
print(te_results$att_true)
# Cohort-specific treatment effects:
print(te_results$actual_cohort_tes)
# Or use the new print method for a self-describing display:
print(te_results)
# Propensity-weighted truth under covariate-dependent DGP:
coefs_mn <- genCoefs(G = 3, T = 5, d = 2, density = 0.5, eff_size = 2,
                     assignment_type = "multinomial",
                     assignment_strength = 1.0, seed = 42)
te_mn <- getTes(coefs_mn)
te_mn$att_true        # propensity-weighted overall ATT
te_mn$cohort_weights  # length G; sums to 1
## End(Not run)
betwfe fitted object
Same schema as glance.fetwfe() (BETWFE also has regularization).
## S3 method for class 'betwfe'
glance(x, ...)
x |
An object of class |
... |
Unused. |
A one-row data frame with 16 columns.
## Not run:
res <- betwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::glance(res)
## End(Not run)
etwfe fitted object
Like glance.fetwfe() but omits the lambda_star /
lambda_star_model_size columns — ETWFE has no regularization.
## S3 method for class 'etwfe'
glance(x, ...)
x |
An object of class |
... |
Unused. |
A one-row data frame with 11 columns.
## Not run:
res <- etwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::glance(res)
## End(Not run)
fetwfe fitted object
Returns a one-row broom-style summary data frame with model-level
scalars: panel-shape counts (nobs, n_units, n_periods,
n_cohorts, n_covs, n_features), bridge-regression tuning
(lambda_star, lambda_star_model_size), variance components
(sig_eps_sq, sig_eps_c_sq), and inference settings (alpha,
se_type, indep_counts_used).
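To make the shape of the output concrete, the sketch below assembles a one-row, broom-style data frame from a mock list carrying fields documented for fetwfe objects (N, T, G, d, p, lambda_star, and so on). It is a hypothetical illustration, not the package's glance() code: it covers only the 13 columns named above rather than all 16, and it assumes nobs = N * T, which holds for a balanced panel.

```r
# Hypothetical sketch: assemble a broom-style one-row summary from a list
# carrying fields documented for fetwfe objects. nobs = N * T assumes a
# balanced panel; this is not the package's actual glance() method.
glance_sketch <- function(fit) {
  data.frame(
    nobs = fit$N * fit$T,
    n_units = fit$N,
    n_periods = fit$T,
    n_cohorts = fit$G,
    n_covs = fit$d,
    n_features = fit$p,
    lambda_star = fit$lambda_star,
    lambda_star_model_size = fit$lambda_star_model_size,
    sig_eps_sq = fit$sig_eps_sq,
    sig_eps_c_sq = fit$sig_eps_c_sq,
    alpha = fit$alpha,
    se_type = fit$se_type,
    indep_counts_used = fit$indep_counts_used
  )
}

# Mock fit with made-up values (p = 62 matches G = 3, T = 6, d = 2).
mock_fit <- list(N = 120, T = 6, G = 3, d = 2, p = 62, lambda_star = 0.8,
                 lambda_star_model_size = 10, sig_eps_sq = 1,
                 sig_eps_c_sq = 0.5, alpha = 0.05, se_type = "default",
                 indep_counts_used = FALSE)
g <- glance_sketch(mock_fit)
nrow(g)  # 1
```

Keeping every model-level scalar in one row is what lets glance() output from several fits be stacked with rbind() for comparison.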
## S3 method for class 'fetwfe'
glance(x, ...)
x |
An object of class |
... |
Unused. |
A one-row data frame with 16 columns.
## Not run:
res <- fetwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::glance(res)
## End(Not run)
twfeCovs fitted object
Like glance.etwfe() (and with the same schema): omits the lambda_star /
lambda_star_model_size columns, since twfeCovs performs no
regularization.
## S3 method for class 'twfeCovs'
glance(x, ...)
x |
An object of class |
... |
Unused. |
A one-row data frame with 11 columns.
## Not run:
res <- twfeCovsWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::glance(res)
## End(Not run)
Parallel to plot.fetwfe(). BETWFE uses the same bridge-penalty
selection mechanism, so selected is encoded the same way (TRUE
= nonzero, FALSE = zeroed by the bridge penalty).
## S3 method for class 'betwfe'
plot(x, type = c("event_study", "catt"), conf_int = TRUE, alpha = NULL, ...)
x |
A fitted object from |
type |
Character; either |
conf_int |
Logical; if |
alpha |
Numeric; overrides the fit's alpha for CI computation.
|
... |
Currently unused; reserved for future arguments. |
A ggplot object.
plot.fetwfe() for the full documentation; eventStudy();
cohortStudy().
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
res <- betwfeWithSimulatedData(dat)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(res)
}
## End(Not run)
Parallel to plot.fetwfe(). ETWFE does not perform selection, so all
points are uniformly styled (no selected = TRUE / FALSE encoding).
## S3 method for class 'etwfe'
plot(x, type = c("event_study", "catt"), conf_int = TRUE, alpha = NULL, ...)
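| Argument | Description |
|---|---|
| x | A fitted object from etwfe(). |
| type | Character; either "event_study" (the default) or "catt". |
| conf_int | Logical; if TRUE (the default), draw confidence intervals; if FALSE, plot point estimates only. |
| alpha | Numeric; overrides the fit's alpha for CI computation. |
| ... | Currently unused; reserved for future arguments. |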
A ggplot object.
plot.fetwfe() for the full documentation; eventStudy();
cohortStudy().
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
res <- etwfeWithSimulatedData(dat)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(res)
}
## End(Not run)
Returns a ggplot object showing either event-study coefficients
(default; type = "event_study") or per-cohort average treatment
effects (type = "catt") from a fitted estimator. Mirrors the
visualization style of did::ggdid() from the Callaway-Sant'Anna
did package, providing a single-call route from a fitted object to
a publication-ready visualization.
For fetwfe / betwfe (the bridge-penalty estimators) in the CATT
view, points are shape- and color-coded by whether the bridge
penalty left that cohort's ATT nonzero (selected = TRUE) or
zeroed it out (selected = FALSE). For etwfe (no selection),
all points are uniformly styled.
## S3 method for class 'fetwfe'
plot(x, type = c("event_study", "catt"), conf_int = TRUE, alpha = NULL, ...)
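| Argument | Description |
|---|---|
| x | A fitted object from fetwfe(). |
| type | Character; either "event_study" (the default) or "catt". |
| conf_int | Logical; if TRUE (the default), draw confidence intervals; if FALSE, plot point estimates only. |
| alpha | Numeric; overrides the fit's alpha for CI computation. |
| ... | Currently unused; reserved for future arguments. |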
A ggplot object. Users can customize further via standard
ggplot layer-addition syntax (e.g.,
plot(res) + ggplot2::theme_classic()).
cohortStudy() for the per-cohort accessor;
eventStudy() for the event-time accessor;
plot.etwfe(), plot.betwfe() for the parallel methods.
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
dat <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
res <- fetwfeWithSimulatedData(dat)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(res)                   # default: event-study coefficients
  plot(res, type = "catt")    # per-cohort ATTs
  plot(res, conf_int = FALSE) # point estimates only
  plot(res, alpha = 0.1)      # 90% CIs
}
## End(Not run)
plot() is intentionally not provided for twfeCovs() objects (#58):
twfeCovs() estimates one pooled effect per cohort, so there is no
per-(cohort, time) / event-study structure to plot. Use summary() or
tidy.twfeCovs() for the cohort effects. Calling this method always raises
an error.
## S3 method for class 'twfeCovs'
plot(x, ...)
| Argument | Description |
|---|---|
| x | An object of class "twfeCovs". |
| ... | Ignored. |
(none; raises an error).
Generates a random panel data set for simulation studies of the fused extended two-way fixed
effects (FETWFE) estimator by taking an object of class "FETWFE_coefs" (produced by
genCoefs()) and using it to simulate data. The function creates a balanced panel
with N units over T time periods, assigns treatment status across G
treated cohorts (with equal marginal probabilities for treatment and non-treatment), and
constructs a design matrix along with the corresponding outcome. The covariates are
generated according to the specified distribution: by default, covariates are drawn
from a normal distribution; if distribution = "uniform", they are drawn uniformly
from [-sqrt(3), sqrt(3)]. When d = 0 (i.e. no covariates), no
covariate-related columns or interactions are generated. See the simulation studies section of
Faletto (2025) for details.
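The cohort-assignment step can be sketched in base R as below. This is a hypothetical helper (assign_cohorts() is not part of fetwfe), and it reads "equal marginal probabilities" as every one of the G + 1 groups (never-treated plus G cohorts) being equally likely; the package's actual assignment, including any rank-condition guarantee, may differ.

```r
# Hypothetical sketch, not the package's code: assign N units to the
# never-treated group (0) or one of G treated cohorts (1, ..., G), with each
# of the G + 1 groups equally likely.
assign_cohorts <- function(N, G) {
  # sample.int() avoids sample()'s length-one pitfall; shift to 0-based labels.
  assignment <- sample.int(G + 1L, size = N, replace = TRUE) - 1L
  # Counts in the order never-treated, cohort 1, ..., cohort G; factor()
  # keeps empty groups so the vector always has length G + 1.
  counts <- as.vector(table(factor(assignment, levels = 0:G)))
  list(assignment = assignment, counts = counts)
}

set.seed(123)
sim <- assign_cohorts(N = 120, G = 3)
sim$counts  # length G + 1 = 4, summing to N = 120
```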
simulateData(
  coefs_obj,
  N,
  sig_eps_sq,
  sig_eps_c_sq,
  distribution = "gaussian",
  guarantee_rank_condition = FALSE,
  seed = NULL
)
| Argument | Description |
|---|---|
| coefs_obj | An object of class "FETWFE_coefs" (produced by genCoefs()). |
| N | Integer. Number of units in the panel. |
| sig_eps_sq | Numeric. Variance of the idiosyncratic (observation-level) noise. |
| sig_eps_c_sq | Numeric. Variance of the unit-level random effects. Must be non-negative; |
| distribution | Character. Distribution to generate covariates. Defaults to "gaussian"; "uniform" is the alternative (see Details). |
| guarantee_rank_condition | (Optional). Logical. If TRUE, the returned data set is guaranteed to have at least |
| seed | (Optional) Controls the random-number generator for the simulated panel. As of fetwfe 1.24.0 the default is NULL (see Details). |
This function extracts simulation parameters from the FETWFE_coefs object and passes them,
along with additional simulation parameters, to the internal function simulateDataCore().
It validates that all necessary components are returned and assigns the S3 class
"FETWFE_simulated" to the output.
The random draw is controlled by the seed argument, not by
coefs_obj$seed. By default (seed = NULL) simulateData()
draws from the ambient random-number generator (so a preceding
set.seed() is respected and repeated calls return different panels) and
emits a warning noting that this default changed in fetwfe 1.24.0. Pass an
integer seed for a reproducible panel (the same integer always yields
the same panel), or seed = NA to use the ambient generator without the
warning. To vary the panel across Monte Carlo replications, pass a different
seed each replication.
Passing an explicit numeric seed calls set.seed(seed)
internally and leaves the global RNG advanced after the call
returns. This is deliberate — it makes a simulation both reproducible
(the same seed always yields the same panel) and varying (subsequent
draws consume the advanced stream). Use seed = NA (or the default
seed = NULL) to draw from / preserve the ambient stream without
calling set.seed().
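A minimal sketch of the three seed modes described above follows. draw_panel_noise() is a hypothetical stand-in for the simulator that only draws standard normal noise; it mirrors the documented contract but is not the package's implementation.

```r
# Hypothetical stand-in for the simulator's seed handling (draws noise only).
draw_panel_noise <- function(n, seed = NULL) {
  if (is.null(seed)) {
    # Default: use the ambient RNG, but warn that the default changed.
    warning("seed = NULL: drawing from the ambient RNG (default changed in 1.24.0)")
  } else if (!is.na(seed)) {
    # Explicit integer: reproducible, and leaves the global RNG advanced.
    set.seed(seed)
  }
  # seed = NA falls through: ambient RNG, no warning.
  rnorm(n)
}

# Same integer seed, same draws.
a <- draw_panel_noise(5, seed = 123)
b <- draw_panel_noise(5, seed = 123)

# seed = NA and the (warning) default both respect a preceding set.seed().
set.seed(1); u <- draw_panel_noise(5, seed = NA)
set.seed(1); v <- suppressWarnings(draw_panel_noise(5))

# After an explicit seed, the global stream continues where the call left off.
invisible(draw_panel_noise(5, seed = 7)); next_draw <- rnorm(1)
set.seed(7); invisible(rnorm(5)); expected_next <- rnorm(1)
```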
The argument distribution controls the generation of covariates. For
"gaussian", covariates are drawn from rnorm; for "uniform",
they are drawn from runif on the interval [-sqrt(3), sqrt(3)] (which ensures that
the covariates have unit variance regardless of which distribution is chosen).
When d = 0 (i.e. no covariates), the function omits any covariate-related columns
and their interactions.
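The unit-variance claim follows from Var(U) = (b - a)^2 / 12 for U ~ Uniform(a, b). The check below assumes the interval is [-sqrt(3), sqrt(3)] (the symmetric interval with unit variance) and compares sample variances of runif() and rnorm() draws; it illustrates the arithmetic rather than calling the package.

```r
# Uniform(a, b) has variance (b - a)^2 / 12; with half-width sqrt(3) this is
# (2 * sqrt(3))^2 / 12 = 1, matching rnorm()'s default unit variance.
half_width <- sqrt(3)
analytic_var <- (2 * half_width)^2 / 12

set.seed(1)
n <- 1e5
x_unif <- runif(n, min = -half_width, max = half_width)
x_norm <- rnorm(n)
round(c(analytic = analytic_var, uniform = var(x_unif), gaussian = var(x_norm)), 3)
```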
An object of class "FETWFE_simulated", which is a list containing:
A dataframe containing generated data that can be passed to fetwfe().
The design matrix, including the interaction columns.
A numeric vector of length N * T containing the generated responses.
A character vector containing the names of the generated features (if d > 0),
or simply an empty vector (if d = 0).
The name of the time variable in pdata
The name of the unit variable in pdata
The name of the treatment variable in pdata
The name of the response variable in pdata
The coefficient vector used for data generation.
A vector of indices indicating the first treatment effect for each treated cohort.
The number of never-treated units.
A vector of counts (of length G + 1) indicating how many units fall into
the never-treated group and each of the treated cohorts.
Independent cohort assignments (for auxiliary purposes).
The number of columns in the design matrix.
Number of units.
Number of time periods.
Number of treated cohorts.
Deprecated alias for G, retained for backward
compatibility; populated with the same value. Use G. Will be
removed in a future release.
Number of covariates.
The idiosyncratic noise variance.
The unit-level noise variance.
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)
# Simulate data using the coefficients (no seed: draws from the ambient RNG
# and warns; pass an integer seed for a reproducible panel)
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5)
## End(Not run)
Generates a random panel data set for simulation studies of the fused extended two-way fixed
effects (FETWFE) estimator. The function creates a balanced panel with N units over
T time periods, assigns treatment status across G treated cohorts (with equal marginal
probabilities for treatment and non-treatment), and constructs a design matrix along with the
corresponding outcome. When gen_ints = TRUE the full design matrix is returned (including
interactions between covariates and fixed effects and treatment indicators). When
gen_ints = FALSE the design matrix is generated in a simpler format (with no interactions)
as expected by fetwfe(). Moreover, the covariates are generated according to the
specified distribution: by default, covariates are drawn from a normal distribution;
if distribution = "uniform", they are drawn uniformly from [-sqrt(3), sqrt(3)].
When d = 0 (i.e. no covariates), no covariate-related columns or interactions are
generated.
See the simulation studies section of Faletto (2025) for details.
simulateDataCore(
  N,
  T,
  G = NULL,
  d,
  sig_eps_sq,
  sig_eps_c_sq,
  beta,
  seed = NULL,
  gen_ints = FALSE,
  distribution = "gaussian",
  guarantee_rank_condition = FALSE,
  assignment_type = "marginal",
  assignment_coefs = NULL,
  R = NULL
)
| Argument | Description |
|---|---|
| N | Integer. Number of units in the panel. |
| T | Integer. Number of time periods. |
| G | Integer. Number of treated cohorts (with treatment starting in periods 2 to T). |
| d | Integer. Number of time-invariant covariates. |
| sig_eps_sq | Numeric. Variance of the idiosyncratic (observation-level) noise. |
| sig_eps_c_sq | Numeric. Variance of the unit-level random effects. Must be non-negative; |
| beta | Numeric vector. Coefficient vector for data generation. Its required length depends on the value of |
| seed | (Optional) Integer. Seed for reproducibility. |
| gen_ints | Logical. If TRUE, generate the full design matrix, including interactions; if FALSE (the default), generate the simpler design matrix expected by fetwfe() (see Details). |
| distribution | Character. Distribution to generate covariates. Defaults to "gaussian"; "uniform" is the alternative. |
| guarantee_rank_condition | (Optional). Logical. If TRUE, the returned data set is guaranteed to have at least |
| assignment_type | Character. One of the supported assignment types; defaults to "marginal". |
| assignment_coefs | Optional list returned by |
| R | Deprecated. The former name for G. |
When gen_ints = TRUE, the function constructs the design matrix by first generating
base fixed effects and a long-format covariate matrix (via generateBaseEffects()), then
appending interactions between the covariates and cohort/time fixed effects (via
generateFEInts()) and finally treatment indicator columns and treatment-covariate
interactions (via genTreatVarsSim() and genTreatInts()). When
gen_ints = FALSE, the design matrix consists only of the base fixed effects, covariates,
and treatment indicators.
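The difference between the two layouts can be illustrated with base R's model.matrix() on a toy panel. The panel, the column names, and the intercept are illustrative assumptions; the package builds its matrix with its own internal helpers (generateBaseEffects(), generateFEInts(), and so on), so its column order and naming may differ. Treatment columns are omitted for brevity.

```r
# Hypothetical toy panel: 4 units x 3 periods, one time-invariant covariate x.
panel <- data.frame(
  unit   = rep(1:4, each = 3),
  time   = factor(rep(1:3, times = 4)),
  cohort = factor(rep(c("never", "never", "2", "3"), each = 3),
                  levels = c("never", "2", "3")),
  x      = rep(c(-1.2, 0.4, 0.9, -0.1), each = 3)
)

# Base fixed effects plus the covariate (gen_ints = FALSE analogue).
X_base <- model.matrix(~ cohort + time + x, data = panel)

# Adds covariate x cohort and covariate x time interactions
# (gen_ints = TRUE analogue, without the treatment block).
X_int <- model.matrix(~ (cohort + time) * x, data = panel)
colnames(X_int)
```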
The argument distribution controls the generation of covariates. For
"gaussian", covariates are drawn from rnorm; for "uniform",
they are drawn from runif on the interval [-sqrt(3), sqrt(3)].
When d = 0 (i.e. no covariates), the function omits any covariate-related columns
and their interactions.
An object of class "FETWFE_simulated", which is a list containing:
A dataframe containing generated data that can be passed to fetwfe().
The design matrix. When gen_ints = TRUE, it includes the interaction columns;
when gen_ints = FALSE, it has no interactions.
A numeric vector of length N * T containing the generated responses.
A character vector containing the names of the generated features (if d > 0),
or simply an empty vector (if d = 0).
The name of the time variable in pdata
The name of the unit variable in pdata
The name of the treatment variable in pdata
The name of the response variable in pdata
The coefficient vector used for data generation.
A vector of indices indicating the first treatment effect for each treated cohort.
The number of never-treated units.
A vector of counts (of length G + 1) indicating how many units fall into
the never-treated group and each of the treated cohorts.
Independent cohort assignments (for auxiliary purposes).
The number of columns in the design matrix.
Number of units.
Number of time periods.
Number of treated cohorts.
Deprecated alias for G, retained for backward
compatibility; populated with the same value. Use G. Will be
removed in a future release.
Number of covariates.
The idiosyncratic noise variance.
The unit-level noise variance.
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
## Not run:
# Set simulation parameters
N <- 100             # Number of units in the panel
T <- 5               # Number of time periods
G <- 3               # Number of treated cohorts
d <- 2               # Number of time-invariant covariates
sig_eps_sq <- 1      # Variance of observation-level noise
sig_eps_c_sq <- 0.5  # Variance of unit-level random effects

# Generate coefficient vector using genCoefsCore()
# (Here, density controls sparsity and eff_size scales nonzero entries)
coefs_core <- genCoefsCore(G = G, T = T, d = d, density = 0.2, eff_size = 2, seed = 123)

# Now simulate the data. Setting gen_ints = TRUE generates the full design
# matrix with interactions.
sim_data <- simulateDataCore(
  N = N, T = T, G = G, d = d,
  sig_eps_sq = sig_eps_sq, sig_eps_c_sq = sig_eps_c_sq,
  beta = coefs_core$beta,
  seed = 456,
  gen_ints = TRUE,
  distribution = "gaussian"
)

# Examine the returned list:
str(sim_data)
## End(Not run)
Computes simultaneous (family-wise) confidence intervals for a user-specified
family of treatment effects from a fitted FETWFE / ETWFE / BETWFE / twfeCovs
object. The simultaneous critical value c_{1 - alpha} is the (1 - alpha)
quantile of max_k |Z_k|, where Z follows a multivariate normal distribution
with correlation matrix cov2cor(Sigma); it is computed deterministically via
mvtnorm::qmvnorm(). Under Faletto (2025) Theorem (c') tight Gaussianity and
Assumption (Psi-IF), the family of psi-linear effects is asymptotically
multivariate normal with a covariance that is estimable from the package's
existing variance machinery; under the paper's fixed-dim framing no
high-dimensional correction is needed.
The pointwise critical value qnorm(1 - alpha/2) (per-effect coverage) and
the Bonferroni-conservative critical value qnorm(1 - alpha/(2K)) (family-wise
coverage with no correlation assumption) are returned for side-by-side
comparison. The simultaneous critical value always lies between them (at least
the pointwise value because max_k |Z_k| exceeds each single |Z_k|, at most the
Bonferroni value by the union bound), and it tends to sit closer to the
pointwise value the more strongly the effects are correlated, as is typical in
difference-in-differences, where effects share the regression-coefficient
variance piece.
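The three critical values can be compared with a Monte Carlo sketch. It substitutes simulation for mvtnorm::qmvnorm() so that it runs in base R; sim_crit_value() is a hypothetical helper, and the equicorrelated covariance (correlation 0.7, unequal standard errors) is made up for illustration.

```r
# Monte Carlo stand-in for mvtnorm::qmvnorm(): the (1 - alpha) quantile of
# max_k |Z_k| with Z ~ N(0, R), next to the pointwise and Bonferroni values.
sim_crit_value <- function(R, alpha = 0.05, n_draws = 2e5, seed = 1L) {
  K <- nrow(R)
  set.seed(seed)  # fixed seed for repeatability (this mutates the global RNG)
  # Rows of (standard normals %*% chol(R)) have covariance t(U) %*% U = R.
  Z <- matrix(rnorm(n_draws * K), ncol = K) %*% chol(R)
  max_abs <- apply(abs(Z), 1L, max)
  c(pointwise    = qnorm(1 - alpha / 2),
    simultaneous = unname(quantile(max_abs, 1 - alpha)),
    bonferroni   = qnorm(1 - alpha / (2 * K)))
}

# Made-up joint covariance for K = 4 effects.
se <- c(0.3, 0.4, 0.5, 0.6)
R0 <- matrix(0.7, nrow = 4, ncol = 4)
diag(R0) <- 1
Sigma <- diag(se) %*% R0 %*% diag(se)

crit <- sim_crit_value(cov2cor(Sigma), alpha = 0.05)
round(crit, 3)
```

A simultaneous band is then estimate +/- crit[["simultaneous"]] * se for each effect.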
simultaneousCIs(
  result,
  family = c("event_study", "cohort", "all_post_treatment", "custom"),
  alpha = 0.05,
  contrasts = NULL
)
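| Argument | Description |
|---|---|
| result | A fitted object of class "fetwfe", "etwfe", "betwfe", or "twfeCovs". |
| family | Character; one of "event_study" (the default), "cohort", "all_post_treatment", or "custom" (see Details). |
| alpha | Numeric in (0, 1); the family-wise significance level. Defaults to 0.05. |
| contrasts | For family = "custom", the user-supplied contrasts, one row per effect (so K = nrow(contrasts)); see Details. Defaults to NULL. |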
Family resolution and K. "event_study" resolves to one effect per
post-treatment event time e = 0, ..., T - 2 (K = T - 1); "cohort" to
one effect per treated cohort (K = G); "all_post_treatment" to one
effect per (g, t) cell (K = num_treats); "custom" to the
K = nrow(contrasts) user-supplied contrasts.
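The family sizes can be computed as below, assuming num_treats counts every post-adoption (cohort, period) cell (T - s + 1 cells for a cohort adopting in period s) and that the earliest cohort adopts in period 2. family_sizes() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: K for each built-in family, given T periods and the
# adoption periods ('starts') of the treated cohorts.
family_sizes <- function(T, starts) {
  c(event_study        = T - 1,                # e = 0, ..., T - 2
    cohort             = length(starts),       # one effect per cohort (G)
    all_post_treatment = sum(T - starts + 1))  # one per post-adoption (g, t) cell
}

family_sizes(T = 6, starts = c(2, 3, 4))
```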
Joint covariance. The K x K covariance Sigma = Sigma_1 + Sigma_2 is
reconstructed at call time from the fit's stored slots (design matrix,
selected support, theta_hat / beta_hat, cohort_probs_overall,
sig_eps_sq). Sigma_1 is the regression-coefficient piece and Sigma_2
the cohort-probability piece, generalizing the package's per-point variance
machinery (the same machinery eventStudy() uses). By construction
sqrt(diag(Sigma)) equals the package's existing per-point standard errors
for the corresponding effects. The Sigma blocks are not persisted on the
fit; re-derivation is sub-second.
Degenerate (zero-variance) effects. An effect whose entire contribution
to the selected support is zeroed by the bridge penalty – or, in
scattered-cohort panels, an event time with an empty valid-cohort set – has
a standard error of exactly 0 by construction, so its simultaneous and
pointwise CIs collapse to a point at the estimate and it is excluded from the
joint correlation matrix (it adds no family-wise risk; the critical value is
computed over the non-degenerate sub-family). This se = 0 convention is the
simultaneous-CI analog of the NA standard error eventStudy() reports for
the same structurally-degenerate event times; both assign the effect an
estimate of 0.
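A sketch of the degenerate-effect handling on a made-up 3 x 3 covariance in which the middle effect was zeroed: dropping its zero-variance row and column keeps cov2cor() well defined for the remaining sub-family, and the degenerate effect's interval collapses to its point estimate. The critical value is an arbitrary placeholder, not one computed by the package.

```r
# Made-up 3-effect family; effect e1 is degenerate (zero variance).
est   <- c(e0 = 1.10, e1 = 0, e2 = 1.65)
Sigma <- matrix(c(0.09, 0, 0.06,
                  0,    0, 0,
                  0.06, 0, 0.16), nrow = 3, byrow = TRUE)
se <- sqrt(diag(Sigma))

# Keep only non-degenerate effects for the joint correlation matrix.
keep  <- se > 0
R_sub <- cov2cor(Sigma[keep, keep])

crit <- 2.2  # placeholder; in practice computed from R_sub
ci <- cbind(low = est - crit * se, high = est + crit * se)
ci  # the e1 row collapses to the point [0, 0]
```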
Paper grounding. Theorem (c') tight Gaussianity (Faletto 2025,
paper_arxiv.tex:1233) guarantees the joint asymptotic normality; Assumption
(Psi-IF) (assumption equation paper_arxiv.tex:2013; in-prose discussion at
paper line 1268) is the influence-function condition the package's default
cohort-sample-proportions estimator satisfies; the fixed-dim framing follows
the paper's AE point 1(d).
Conservative fallback. When the fit was made with se_type = "conservative", the function
falls back to Bonferroni-corrected pointwise CIs (the Cauchy-Schwarz upper
bound used for the conservative scalar SE does not generalize to a K x K
covariance matrix) and emits a brief message(). The $critical_value
field is set to the Bonferroni value in this branch.
Numerical integration. The critical value is computed via
mvtnorm::qmvnorm(..., algorithm = mvtnorm::GenzBretz()) (mvtnorm's default
quasi-Monte Carlo integrator; sub-second through K up to about 100, so no
K cap is of practical concern for FETWFE families). mvtnorm is an
Imports dependency (as of version 1.16.0, when simultaneous bands became
the default reported confidence interval; see the ci_type argument of
fetwfe()). The function uses it only when K > 1 and
se_type != "conservative" (the K = 1 and conservative paths bypass the
dependency), and retains a defensive stop() with an actionable message if
it is somehow unavailable (e.g., a corrupted install).
Determinism contract. The function is deterministic in its inputs: the
same fit plus the same family, alpha, and contrasts always produces
the same critical value across calls. This is achieved by wrapping the
internal mvtnorm::qmvnorm() call with a save/restore of the caller's
.Random.seed and a fixed internal set.seed(1L) immediately before the
call. The function does NOT mutate the caller's .Random.seed (the
save/restore via on.exit() leaves the caller's RNG state identical pre- and
post-call), matching the convention adopted by
R/fetwfe_core.R::getBetaCV() in PR #181 / v1.13.5. Users do not need to
call set.seed() before simultaneousCIs() to get reproducible results,
and downstream RNG-using code observes no perturbation.
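The save/restore pattern can be sketched as follows. with_local_seed() is a hypothetical helper written for illustration; it is not the package's internal code, only an instance of the same on.exit() technique.

```r
# Hypothetical helper: run f() under a fixed seed, then restore the caller's
# RNG state so downstream code sees no perturbation.
with_local_seed <- function(f, seed = 1L) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)  # put the caller's state back
    } else {
      rm(list = ".Random.seed", envir = genv)         # no prior state: remove ours
    }
  })
  set.seed(seed)
  f()
}

set.seed(99)
before <- .Random.seed
r1 <- with_local_seed(function() runif(1))
r2 <- with_local_seed(function() runif(1))
identical(before, .Random.seed)  # caller's RNG state is unchanged
identical(r1, r2)                # same fixed internal seed, same result
```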
An object of S3 class "simultaneous_cis": a list with
A data frame with columns effect, estimate,
simultaneous_ci_low, simultaneous_ci_high, pointwise_ci_low,
pointwise_ci_high (one row per effect in the family).
Numeric vector of length K: the single-step
max-T multiplicity-adjusted (family-wise) p-value for each effect, the
exact dual of the simultaneous band (a coefficient lies outside the
(1 - alpha) band iff its adjusted p-value is < alpha). Computed via
mvtnorm::pmvnorm() over the same correlation matrix the band uses
(or, under se_type = "conservative", the Bonferroni adjustment
min(1, K * pointwise_p)). NA for degenerate (zero-variance)
effects. (#200)
The simultaneous critical value c_{1 - alpha}
(or, when the fit used se_type = "conservative", the Bonferroni critical value
qnorm(1 - alpha/(2K)) – see Details).
qnorm(1 - alpha/2), for reference.
qnorm(1 - alpha/(2K)), for reference.
The requested family (character).
The significance level used.
The number of effects in the family (integer).
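The single-step max-T adjusted p-values listed above can be approximated by simulation. This is a base-R Monte Carlo stand-in for the mvtnorm::pmvnorm() computation, with a made-up equicorrelated correlation matrix and made-up z-statistics; maxT_adjusted_p() is hypothetical. Each adjusted p-value lies between the pointwise p-value and its Bonferroni adjustment.

```r
# Monte Carlo stand-in: adjusted p-value of effect k is P(max_j |Z_j| >= |t_k|)
# with Z ~ N(0, R).
maxT_adjusted_p <- function(t_stats, R, n_draws = 2e5, seed = 1L) {
  set.seed(seed)  # note: the package restores the caller's RNG state instead
  Z <- matrix(rnorm(n_draws * nrow(R)), ncol = nrow(R)) %*% chol(R)
  max_abs <- apply(abs(Z), 1L, max)
  vapply(abs(t_stats), function(t) mean(max_abs >= t), numeric(1))
}

R0 <- matrix(0.7, nrow = 4, ncol = 4)
diag(R0) <- 1
t_stats <- c(0.5, 2.0, 2.6)  # made-up z-statistics for three of the effects

pointwise_p  <- 2 * pnorm(-abs(t_stats))
adjusted_p   <- maxT_adjusted_p(t_stats, R0)
bonferroni_p <- pmin(1, nrow(R0) * pointwise_p)
round(cbind(t_stats, pointwise_p, adjusted_p, bonferroni_p), 4)
```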
Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv:2312.05985.
Hothorn, T., Bretz, F., & Westfall, P. (2008). Simultaneous Inference in General Parametric Models. Biometrical Journal 50(3), 346-363.
eventStudy() for the per-point event-study estimates and Wald
intervals that family = "event_study" provides simultaneous bands over.
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
sim <- simulateData(coefs, N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
fit <- fetwfeWithSimulatedData(sim)
sci <- simultaneousCIs(fit, family = "event_study", alpha = 0.05)
print(sci)
betwfe fitted object
Like tidy.fetwfe() but for a BETWFE fit. Includes the selected
column reflecting BETWFE's bridge-penalized selection.
## S3 method for class 'betwfe'
tidy(x, conf.int = TRUE, conf.level = 1 - x$alpha, ...)
| Argument | Description |
|---|---|
| x | An object of class "betwfe". |
| conf.int | Logical; include CI columns. |
| conf.level | Numeric in (0, 1); defaults to 1 - x$alpha. |
| ... | Unused. |
A data frame with G + 1 rows.
## Not run:
res <- betwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(res)
## End(Not run)
cohortStudy object
Returns a broom-style tidy data frame for the output of
cohortStudy(). Renames the snake_case columns of catt_df to broom
conventions (se -> std.error, p_value -> p.value,
ci_low / ci_high -> conf.low / conf.high) and adds a term
column ("cohort_<cohort label>") plus a statistic column
(estimate / std.error) so the schema matches tidy.eventStudy()
for downstream bind_rows() consumers. When the input carries a
selected column (fetwfe / betwfe), it is passed through as the
final column.
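The renaming can be sketched on a made-up catt_df-like data frame. The input column names cohort and estimate are assumptions (only se, p_value, ci_low, ci_high, and selected are named above), and tidy_cohorts_sketch() is a hypothetical re-implementation, not the package method.

```r
# Made-up catt_df-like input for two cohorts.
catt <- data.frame(
  cohort   = c("2", "3"),
  estimate = c(1.8, 2.4),
  se       = c(0.40, 0.55),
  p_value  = c(7e-06, 1e-05),
  ci_low   = c(1.02, 1.32),
  ci_high  = c(2.58, 3.48),
  selected = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical sketch of the broom-style renaming described above.
tidy_cohorts_sketch <- function(df) {
  out <- data.frame(
    term      = paste0("cohort_", df$cohort),
    estimate  = df$estimate,
    std.error = df$se,
    statistic = df$estimate / df$se,
    p.value   = df$p_value,
    conf.low  = df$ci_low,
    conf.high = df$ci_high,
    stringsAsFactors = FALSE
  )
  # Pass 'selected' through as the final column when present (fetwfe / betwfe).
  if ("selected" %in% names(df)) out$selected <- df$selected
  out
}

tidied <- tidy_cohorts_sketch(catt)
tidied
```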
## S3 method for class 'cohortStudy'
tidy(x, ...)
| Argument | Description |
|---|---|
| x | A cohortStudy object (the output of cohortStudy()). |
| ... | Unused; present for S3 compatibility. |
Confidence intervals come from the cohort fit's stored bounds (which
encode the alpha passed at fit time); unlike tidy.eventStudy(), this
method does not recompute the CIs at a custom conf.level because the
standard errors in catt_df are already paired with the fit-time
bounds (ci_low / ci_high), so re-emitting those is the
minimum-surprise behavior.
A data frame with one row per treated cohort and columns
term, estimate, std.error, statistic, p.value, conf.low,
conf.high, and (if present in the input) selected.
## Not run:
res <- fetwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(cohortStudy(res))
## End(Not run)
cohortTimeATTs object
Returns a broom-style tidy data frame for the output of
cohortTimeATTs(). Renames the snake_case columns to broom conventions
(se -> std.error, p_value -> p.value, ci_low / ci_high ->
conf.low / conf.high), keeps the time column, and adds a term
column ("cohort_<cohort label>_time_<time>") plus a statistic column
(estimate / std.error) so the schema parallels tidy.cohortStudy() and
tidy.eventStudy() for downstream bind_rows() consumers. When the input
carries a selected column (fetwfe / betwfe), it is passed through as
the final column.
## S3 method for class 'cohortTimeATTs'
tidy(x, ...)
| Argument | Description |
|---|---|
| x | A cohortTimeATTs object (the output of cohortTimeATTs()). |
| ... | Unused; present for S3 compatibility. |
Confidence intervals are the pointwise 1 - alpha Wald bounds
cohortTimeATTs() computed (encoding the alpha passed there); like
tidy.cohortStudy() this method passes them through rather than
recomputing at a custom conf.level.
A data frame with one row per (cohort, time) cell and columns
term, time, estimate, std.error, statistic, p.value,
conf.low, conf.high, and (if present in the input) selected.
## Not run:
res <- fetwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(cohortTimeATTs(res))
## End(Not run)
etwfe fitted object
Like tidy.fetwfe() but for an ETWFE fit. Has no selected column
(ETWFE does no regularized selection).
## S3 method for class 'etwfe'
tidy(x, conf.int = TRUE, conf.level = 1 - x$alpha, ...)
| Argument | Description |
|---|---|
| x | An object of class "etwfe". |
| conf.int | Logical; include CI columns. |
| conf.level | Numeric in (0, 1); defaults to 1 - x$alpha. |
| ... | Unused. |
A data frame with G + 1 rows.
## Not run:
res <- etwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(res)
## End(Not run)
eventStudy object
Returns a broom-style tidy data frame for the output of
eventStudy(). Renames existing columns to broom conventions
(se -> std.error, p_value -> p.value) and adds a term
column ("e<event_time>") plus a statistic column
(estimate / std.error) so the schema matches tidy.<estimator>()
for downstream bind_rows() consumers.
## S3 method for class 'eventStudy'
tidy(x, conf.int = TRUE, conf.level = 0.95, ...)
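| Argument | Description |
|---|---|
| x | An object of class "eventStudy" (the output of eventStudy()). |
| conf.int | Logical; include conf.low / conf.high columns. |
| conf.level | Numeric in (0, 1). Retained for |
| ... | Unused. |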
The eventStudy() output stores its confidence-interval bounds (ci_low /
ci_high), which reflect the fit's ci_type (#197): simultaneous
(family-wise, uniform) by default, or pointwise when the fit used
ci_type = "pointwise". When conf.int = TRUE (the default), conf.low /
conf.high PASS THROUGH those stored bounds rather than recomputing from
estimate +/- z * se — so the tidied event-study CIs agree with
print / summary / plot and with simultaneousCIs() under the default.
When conf.int = FALSE, the CI columns are omitted. (Degenerate event times
carry NA bounds under both ci_type settings.)
A data frame with one row per event-time and columns term,
event_time, n_cohorts, estimate, std.error, statistic,
p.value, and (when conf.int = TRUE) conf.low / conf.high.
## Not run:
res <- fetwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(eventStudy(res))
## End(Not run)
fetwfe fitted object
Returns a broom-style tidy data frame for an object of class "fetwfe".
Row 1 is the overall ATT (term = "ATT"); subsequent rows are the
cohort-specific ATTs (term = "Cohort <adoption-time>"), one per
treated cohort, sorted by ascending cohort label. Standard error,
z-statistic, and p-value reflect the value of se_type used at fit time
(model-based by default, cluster-robust under se_type = "cluster").
Cohorts that the bridge penalty zeroed out (selected = FALSE) carry
NA for std.error / statistic / p.value.
## S3 method for class 'fetwfe'
tidy(x, conf.int = TRUE, conf.level = 1 - x$alpha, ...)
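| Argument | Description |
|---|---|
| x | An object of class "fetwfe". |
| conf.int | Logical. If TRUE (the default), include conf.low / conf.high columns. |
| conf.level | Numeric in (0, 1). Applies only to the overall-ATT row (row 1), whose CI is recomputed at this level; defaults to 1 - x$alpha. |
| ... | Unused; present for S3 compatibility. |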
The cohort-row conf.low / conf.high columns pass through the fit's
stored catt_df bounds, so they reflect the fit's ci_type (#197):
simultaneous (family-wise) by default, or pointwise when the fit used
ci_type = "pointwise". They are NOT recomputed from conf.level (see the
conf.level note). The overall-ATT row (row 1) is a scalar, so its CI is
the pointwise Wald interval at conf.level (pointwise == simultaneous for a
single effect).
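The overall-ATT row's interval is an ordinary pointwise Wald interval at conf.level. A minimal sketch with a hypothetical wald_ci() helper and made-up inputs:

```r
# Pointwise Wald interval for a single estimate:
# estimate +/- qnorm(1 - (1 - conf.level) / 2) * std.error.
wald_ci <- function(estimate, std.error, conf.level = 0.95) {
  stopifnot(conf.level > 0, conf.level < 1)
  z <- qnorm(1 - (1 - conf.level) / 2)
  c(conf.low = estimate - z * std.error, conf.high = estimate + z * std.error)
}

ci90 <- wald_ci(estimate = 1.5, std.error = 0.4, conf.level = 0.90)
ci99 <- wald_ci(estimate = 1.5, std.error = 0.4, conf.level = 0.99)
rbind(ci90, ci99)  # a higher confidence level gives a wider interval
```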
A data frame with G + 1 rows and columns term, estimate,
std.error, statistic, p.value, optionally conf.low /
conf.high, and selected (logical).
## Not run:
res <- fetwfeWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(res)
## End(Not run)
FETWFE_tes simulation truth object
Returns a broom-style tidy data frame for the population-truth
object returned by getTes(). Row 1 is the overall true ATT
(term = "ATT_true"); subsequent rows are the true cohort ATTs
(term = "Cohort <adoption-time>", using the simulator's
convention that cohort g adopts at calendar time g + 1, so
the labels match what tidy.<estimator> uses on a fitted panel
generated from the same FETWFE_coefs). Standard error /
statistic / p-value columns are always NA_real_ — there is no
sampling distribution for a population truth. When
conf.int = TRUE (default, matching the sibling tidy methods),
conf.low / conf.high columns are included and also set to
NA_real_. When conf.int = FALSE, those columns are omitted.
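The row layout can be sketched as below with a hypothetical tidy_truth_sketch() and made-up true effects; only the labeling rule (cohort g adopts at calendar time g + 1) and the all-NA inference columns come from the description above.

```r
# Hypothetical sketch of the population-truth tidy layout.
tidy_truth_sketch <- function(att_true, catt_true, conf.int = TRUE) {
  G <- length(catt_true)
  out <- data.frame(
    term      = c("ATT_true", paste("Cohort", seq_len(G) + 1)),  # g adopts at g + 1
    estimate  = c(att_true, catt_true),
    std.error = NA_real_,  # no sampling distribution for a population truth
    statistic = NA_real_,
    p.value   = NA_real_,
    stringsAsFactors = FALSE
  )
  if (conf.int) {
    out$conf.low  <- NA_real_
    out$conf.high <- NA_real_
  }
  out
}

truth <- tidy_truth_sketch(att_true = 1.2, catt_true = c(1.0, 1.5, 0.8))
truth
```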
## S3 method for class 'FETWFE_tes'
tidy(x, conf.int = TRUE, conf.level = 0.95, ...)
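| Argument | Description |
|---|---|
| x | An object of class "FETWFE_tes" (as returned by getTes()). |
| conf.int | Logical; include conf.low / conf.high columns (always NA_real_). |
| conf.level | Numeric in (0, 1). Accepted for broom-convention parity but unused (no CIs to compute for a population truth); validated regardless. Defaults to 0.95. |
| ... | Unused. |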
A data frame with G + 1 rows and columns term,
estimate, std.error, statistic, p.value, and (when
conf.int = TRUE) conf.low / conf.high.
```r
## Not run:
coefs <- genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2)
broom::tidy(getTes(coefs))
## End(Not run)
```
twfeCovs fitted object
Like tidy.etwfe() but for a TWFE-with-covariates fit. The result has no selected
column (twfeCovs is pure OLS and does no regularized selection).
twfeCovs estimates one pooled effect per cohort, so the returned frame
has the same G + 1 rows (overall ATT in row 1, then one row per cohort)
as the sibling estimators.
```r
## S3 method for class 'twfeCovs'
tidy(x, conf.int = TRUE, conf.level = 1 - x$alpha, ...)
```
x |
An object of class |
conf.int |
Logical; include CI columns. |
conf.level |
Numeric in (0, 1); defaults to |
... |
Unused. |
A data frame with G + 1 rows.
```r
## Not run:
res <- twfeCovsWithSimulatedData(
  simulateData(genCoefs(G = 3, T = 6, d = 2, density = 0.5, eff_size = 2),
               N = 120, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 123)
)
broom::tidy(res)
## End(Not run)
```
WARNING: This function should NOT be used for estimation. It is a biased estimator of treatment effects. It implements two-way fixed effects with covariates and a single pooled treatment effect per cohort, and estimates the overall ATT as well as the CATTs (cohort average treatment effects on the treated units). It is implemented only for the simulation studies in Faletto (2025). This estimator is unbiased only under the assumptions that treatment effects are homogeneous across covariates and are identical within cohorts across all times since treatment.
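To make "a single pooled treatment effect per cohort" concrete, the following R sketch builds the kind of treatment regressors such a design would use. There is one dummy per adoption cohort, switched on in that cohort's treated periods, so all post-adoption periods share one coefficient. `pooled_cohort_dummies()` is a hypothetical helper, not the package's internal design-matrix code, and it assumes an absorbing 0/1 treatment.

```r
# Hypothetical helper (not fetwfe's internal code). Assumes an absorbing 0/1
# treatment, so a unit's cohort is its first treated period.
pooled_cohort_dummies <- function(pdata, time_var, unit_var, treatment) {
  units <- as.character(pdata[[unit_var]])
  is_treated <- pdata[[treatment]] == 1
  # Adoption time of each ever-treated unit, as a plain named vector
  ft <- tapply(pdata[[time_var]][is_treated], units[is_treated], min)
  first_treat <- stats::setNames(as.vector(ft), names(ft))
  cohort <- unname(first_treat[units])  # NA for never-treated units
  cohorts <- sort(unique(cohort[!is.na(cohort)]))
  # One column per cohort: 1 on that cohort's treated rows, 0 elsewhere.
  # A single coefficient per column pools all post-adoption periods.
  dummies <- vapply(cohorts, function(g) {
    as.integer(!is.na(cohort) & cohort == g & is_treated)
  }, integer(nrow(pdata)))
  dummies <- matrix(dummies, nrow = nrow(pdata))
  if (length(cohorts) > 0) colnames(dummies) <- paste0("treat_cohort_", cohorts)
  dummies
}

toy_panel <- data.frame(
  unit    = rep(c("a", "b", "c"), each = 3),
  time    = rep(1:3, times = 3),
  treated = c(0, 1, 1,  0, 0, 1,  0, 0, 0)  # unit "c" is never treated
)
D <- pooled_cohort_dummies(toy_panel, "time", "unit", "treated")
colSums(D)
```

Each column carries one coefficient, so any variation in the effect across time since treatment is averaged into that coefficient. This matches the homogeneity assumption stated in the warning.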
```r
twfeCovs(
  pdata,
  time_var,
  unit_var,
  treatment,
  response,
  covs = c(),
  indep_counts = NA,
  sig_eps_sq = NA,
  sig_eps_c_sq = NA,
  verbose = FALSE,
  alpha = 0.05,
  add_ridge = FALSE,
  allow_no_never_treated = TRUE,
  se_type = "default",
  ci_type = c("simultaneous", "pointwise")
)
```
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Either a character vector containing the names of
the columns for covariates (e.g., |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID
noise assumed to apply to each observation. See Section 2 of Faletto (2025)
for details. It is best to provide this variance if it is known (for example,
if you are using simulated data). If this variance is unknown, this argument
can be omitted, and the variance will be estimated by
REML on the linear mixed-effects model |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID
noise (random effects) assumed to apply to each observation. See Section 2 of
Faletto (2025) for details. It is best to provide this variance if it is
known (for example, if you are using simulated data). If this variance is
unknown, this argument can be omitted, and the variance will be estimated
by REML via |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
ci_type |
Character; one of |
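The `treatment` argument must be an absorbing 0/1 indicator. A minimal R sketch of a pre-flight check is shown below, in the spirit of the `as.integer(castle$cdl > 0)` step in the example. `is_absorbing()` is a hypothetical helper, not a function exported by the package. It only verifies that, within each unit, the indicator takes values in {0, 1} and never falls back from 1 to 0 over time.

```r
# Hypothetical helper (not part of fetwfe): TRUE when, within every unit, the
# 0/1 treatment indicator never switches back from 1 to 0 over time.
is_absorbing <- function(pdata, time_var, unit_var, treatment) {
  # Sort by unit, then time, so each unit's values are in time order
  ord <- order(pdata[[unit_var]], pdata[[time_var]])
  d <- pdata[ord, ]
  ok <- tapply(d[[treatment]], d[[unit_var]], function(w) {
    all(w %in% c(0, 1)) && all(diff(w) >= 0)
  })
  all(ok)
}

panel <- data.frame(
  unit    = rep(c("a", "b"), each = 3),
  time    = rep(1:3, times = 2),
  treated = c(0, 1, 1,  0, 0, 0)
)
is_absorbing(panel, "time", "unit", "treated")  # TRUE
```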
An object of class twfeCovs containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If the Gram matrix is not invertible, this will be NA. |
att_p_value |
A two-sided p-value for the overall ATT against the
null |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact) standard errors for the estimated average treatment effects within each cohort. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias
for |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
y_mean |
Numeric scalar; mean of the original (pre-centering) response.
Stored so downstream methods ( |
response_col_name |
Character scalar; the response column name in
the original |
time_var, unit_var, treatment
|
Character scalars; the corresponding arguments the user passed. |
covs |
Character vector; the original |
calc_ses |
Logical indicating whether standard errors were calculated. |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
alpha |
The alpha level used for confidence intervals. |
ci_type |
Character scalar; the |
internal |
A list containing internal outputs that are typically
not needed for interpretation, packaged here for parity with
|
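`cohort_probs` is described as the estimated probability of each cohort conditional on being treated. The R sketch below shows how such weights can turn cohort estimates into an overall ATT. It assumes the overall ATT is the cohort-probability-weighted average of the CATTs, with the probabilities estimated as the share of treated units in each cohort. The inputs are made up for illustration, and this is not presented as the package's exact computation.

```r
# Illustrative inputs (hypothetical, not from a fitted model)
treated_cohorts <- c(2, 2, 3, 3, 3, 4)  # adoption time of each treated unit
catt_hats <- c("2" = 1.0, "3" = 2.0, "4" = 4.0)

# Share of treated units in each cohort
counts <- table(treated_cohorts)
cohort_probs <- stats::setNames(as.vector(counts) / sum(counts), names(counts))

# Overall ATT as the cohort-probability-weighted average of the CATTs
att_hat <- sum(cohort_probs[names(catt_hats)] * catt_hats)
att_hat
```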
Gregory Faletto
Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545-554.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. Springer.
```r
## Not run:
library(bacondecomp)
data(castle)

# Response: the log homicide rate. Treatment: `cdl` records the share of
# the year the castle-doctrine law was in effect, so `cdl > 0` gives the
# absorbing 0/1 treatment indicator.
castle$l_homicide <- log(castle$homicide)
castle$treated <- as.integer(castle$cdl > 0)

# No `covs` here: twfeCovs is pure OLS (no bridge penalty), and castle's
# smallest adoption cohorts contain a single state, so the design is
# rank-deficient once any covariate is added.
res <- twfeCovs(
  pdata = castle,
  time_var = "year",
  unit_var = "state",
  treatment = "treated",
  response = "l_homicide",
  verbose = TRUE
)

# Print results
print(res, max_cohorts = Inf)
## End(Not run)
```
S3 class for the output of twfeCovs(). Carries the same
styled print / summary / coef surface as the three
sibling estimators, plus tidy / glance (broom) and
simultaneousCIs. plot and augment are intentionally
not provided: twfeCovs() estimates one pooled effect per cohort, so
it has no per-(cohort, time) / event-study basis to plot, and its
coefficient vector is in a reduced basis that augment()'s
fitted-value path does not match (#58). Both raise an informative error.
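The "intentionally not provided, raises an informative error" design can be sketched with a toy S3 class. The class `toyPooled` and its method are hypothetical and are not fetwfe's code. The sketch shows a method that exists only to fail loudly with an explanation, rather than letting dispatch fall through to a default method.

```r
# Hypothetical toy class (not fetwfe's code): an S3 method whose only job is
# to stop with an informative message.
plot.toyPooled <- function(x, ...) {
  stop("plot() is not available for 'toyPooled' objects: they hold one ",
       "pooled effect per cohort, so there is no event-study path to plot.",
       call. = FALSE)
}

fit <- structure(list(catt_hats = c(a = 1, b = 2)), class = "toyPooled")
msg <- tryCatch(plot(fit), error = function(e) conditionMessage(e))
cat(msg, "\n")
```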
This function runs the two-way fixed effects estimator with covariates (twfeCovs()) on
simulated data. It is simply a wrapper for twfeCovs(): it accepts an object of class
"FETWFE_simulated" (produced by simulateData()) and unpacks the necessary
components to pass to twfeCovs(). Its outputs therefore match those of twfeCovs(), and its
remaining inputs match their counterparts in twfeCovs().
```r
twfeCovsWithSimulatedData(
  simulated_obj,
  verbose = FALSE,
  alpha = 0.05,
  add_ridge = FALSE,
  allow_no_never_treated = TRUE,
  se_type = "default",
  ci_type = c("simultaneous", "pointwise")
)
```
simulated_obj |
An object of class |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
allow_no_never_treated |
(Optional.) Logical; if |
se_type |
Character; one of |
ci_type |
Character; one of |
An object of class twfeCovs containing the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
A standard error for the ATT. If |
att_p_value |
A two-sided p-value for the overall ATT against the
null |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
A named vector containing the (asymptotically exact, non-conservative) standard errors for the estimated average treatment effects within each cohort. If the Gram matrix is not invertible, the entries are NA. |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A data frame (with S3 class |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
X_ints |
The design matrix created containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The final number of units that were in the data set used for estimation (after any units may have been removed because they were treated in the first time period). |
T |
The number of time periods in the final data set. |
G |
The final number of treated cohorts that appear in the final data set. |
R |
Deprecated alias
for |
d |
The final number of covariates that appear in the final data set (after any covariates may have been removed because they contained missing values or all contained the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
calc_ses |
Logical indicating whether standard errors were calculated. |
cohort_probs_overall |
A vector of the estimated cohort probabilities on the overall sample (treated and untreated), used in computing the variance of the overall ATT. |
indep_counts_used |
Logical scalar; |
se_type |
Character scalar; the |
alpha |
The alpha level used for confidence intervals. |
ci_type |
Character scalar; the |
y_mean |
Numeric scalar; mean of the original (pre-centering) response.
Stored so downstream methods ( |
response_col_name |
Character scalar; the response column name in
the original |
time_var, unit_var, treatment
|
Character scalars; the corresponding arguments the user passed. |
covs |
Character vector; the original |
internal |
A list containing internal outputs that are typically
not needed for interpretation, packaged here for parity with
|
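`X_final` and `y_final` are described as the design and response after left-multiplication by the square root inverse of each unit's estimated covariance matrix. The R sketch below illustrates that whitening step. It assumes the within-unit covariance implied by IID row-level noise (`sig_eps_sq`) plus a unit-level random effect (`sig_eps_c_sq`), that is, Omega = sig_eps_sq * I_T + sig_eps_c_sq * 1 1'. It uses the symmetric inverse square root from an eigendecomposition. A Cholesky-based factor would also whiten, and this sketch does not claim which factor the package uses.

```r
# Assumed within-unit covariance: IID row-level noise plus a shared
# unit-level random effect, Omega = sig_eps_sq * I + sig_eps_c_sq * 11'.
unit_cov <- function(n_t, sig_eps_sq, sig_eps_c_sq) {
  sig_eps_sq * diag(n_t) + sig_eps_c_sq * matrix(1, n_t, n_t)
}

# Symmetric inverse square root of a positive-definite matrix
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(S)) %*% t(e$vectors)
}

n_t <- 4
Omega <- unit_cov(n_t, sig_eps_sq = 1, sig_eps_c_sq = 0.5)
W <- inv_sqrt(Omega)

# One unit's responses (illustrative values), whitened by W
y_unit <- c(1.2, 0.8, 1.5, 2.0)
y_white <- W %*% y_unit

# W %*% Omega %*% W should be (numerically) the identity
max(abs(W %*% Omega %*% W - diag(n_t)))
```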
The returned object is an S3-classed "twfeCovs" list with
print(), summary(), coef(), tidy(),
glance(), and simultaneousCIs() methods, matching the three
sibling estimators. plot() is intentionally not defined — twfeCovs()
estimates one pooled effect per cohort, so there is no per-(cohort, time) /
event-study structure to plot. augment() is intentionally not defined —
the coefficient vector lives in a reduced cohort-level basis that
augment()'s fitted-value path does not match. Both raise an
informative error (#58).
```r
## Not run:
# Generate coefficients
coefs <- genCoefs(G = 5, T = 30, d = 12, density = 0.1, eff_size = 2, seed = 123)

# Simulate data using the coefficients
sim_data <- simulateData(coefs, N = 120, sig_eps_sq = 5, sig_eps_c_sq = 5, seed = 123)

result <- twfeCovsWithSimulatedData(sim_data)
## End(Not run)
```