class: center, middle, inverse, title-slide

# New DD Estimators
##
### Ian McCarthy | Emory University
### Workshop on Causal Inference with Panel Data

---

<!-- Adjust some CSS code for font size and maintain R code font size -->
<style type="text/css">
.remark-slide-content {
    font-size: 30px;
    padding: 1em 2em 1em 2em;
}
.remark-code {
  font-size: 15px;
}
.remark-inline-code {
  font-size: 20px;
}
</style>

<!-- Set R options for how code chunks are displayed and load packages -->

# Table of contents

1. [The Issue](#problem)
2. [Callaway and Sant'Anna](#cs)
3. [Sun and Abraham](#sa)
4. [de Chaisemartin and D'Haultfoeuille (2020)](#ch)
5. [Even more!](#others)
6. [All together now](#together)

---
class: inverse, center, middle
name: problem

# Revisiting the Issue

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---

# Problem with TWFE

Recall the biggest issues with "standard" TWFE estimates:

- Best case: Variance-weighted ATT
- Biased with heterogeneous effects over time and differential timing
- Differential timing **alone** can introduce bias because already-treated units act as controls for later-treated groups

--

"Heterogeneous" treatment effects should be the baseline

---

# Solution

Only consider "clean" comparisons:

--

- Separate event study for each treatment group vs never-treated
- Callaway and Sant'Anna (2020)
- Sun and Abraham (2020)
- de Chaisemartin and D'Haultfoeuille (2020)
- Cengiz et al. (2019), Gardner (2021), and Borusyak et al.
(2021)

---
class: inverse, center, middle
name: cs

# Callaway and Sant'Anna

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---

# CS Estimator

- "Manually" estimate group-specific treatment effects for each period
- Each estimate is propensity-score weighted
- Aggregate the treatment effect estimates (by time, group, or both)

---

# CS in Practice

.pull-left[
**Stata**<br>
```stata
* Install the estimator, its drdid dependency, and the plotting helper
ssc install csdid
ssc install event_plot
ssc install drdid

* Load the data and build the outcome (share of adults uninsured)
insheet using "https://raw.githubusercontent.com/imccart/202108-cdc-methodsworkshop/master/data/medicaid-expansion/mcaid-expand-data.txt", clear
gen perc_unins=uninsured/adult_pop
egen stategroup=group(state)
drop if expand_ever=="NA"
* csdid expects gvar = 0 for never-treated units
replace expand_year="0" if expand_year=="NA"
destring expand_year, replace

* Group-time ATTs with not-yet-treated controls, then event-study aggregation
csdid perc_unins, ivar(stategroup) time(year) gvar(expand_year) notyet
estat event, estore(cs)
event_plot cs, default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") xlabel(-6(1)4) title("Callaway and Sant'Anna (2020)")) stub_lag(T+#) stub_lead(T-#) together
```
]

.pull-left[
**R**<br>
```r
library(tidyverse)
library(did)
library(DRDID)

mcaid.data <- read_tsv("https://raw.githubusercontent.com/imccart/202108-cdc-methodsworkshop/master/data/medicaid-expansion/mcaid-expand-data.txt")
# expand_year = 0 marks never-treated states for att_gt()
reg.dat <- mcaid.data %>%
  filter(!is.na(expand_ever)) %>%
  mutate(perc_unins=uninsured/adult_pop,
         post = (year>=2014),
         treat=post*expand_ever,
         expand_year=ifelse(is.na(expand_year),0,expand_year)) %>%
  filter(!is.na(perc_unins)) %>%
  group_by(State) %>%
  mutate(stategroup=cur_group_id()) %>%
  ungroup()

# Group-time ATTs (doubly robust), then aggregate to an event study
mod.cs <- att_gt(yname="perc_unins", tname="year", idname="stategroup",
                 gname="expand_year",
                 data=reg.dat, panel=TRUE, est_method="dr",
                 allow_unbalanced_panel=TRUE)
mod.cs.event <- aggte(mod.cs, type="dynamic")
ggdid(mod.cs.event)
```
]

---
class: inverse, center, middle
name: sa

# Sun and Abraham

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---
# Sun and Abraham

Considers an event study with differential treatment timing:

- Problem: lead and lag coefficient estimates are potentially biased due to treatment/control group construction
- Solution: Estimate a fully interacted model

--

`$$y_{it} = \gamma_{i} + \gamma_{t} + \sum_{g} \sum_{\tau \neq -1} \delta_{g \tau} \times \text{1}(i \in C_{g}) \times D_{it}^{\tau} + x_{it} + \epsilon_{it}$$`

---
count: false

# Sun and Abraham

`$$y_{it} = \gamma_{i} + \gamma_{t} + \sum_{g} \sum_{\tau \neq -1} \delta_{g \tau} \times \text{1}(i \in C_{g}) \times D_{it}^{\tau} + x_{it} + \epsilon_{it}$$`

--

- `\(g\)` denotes a treatment group (cohort) and `\(C_{g}\)` the set of individuals in group `\(g\)`
- `\(\tau\)` denotes time periods relative to treatment, with `\(\tau = -1\)` omitted as the reference period
- `\(D_{it}^{\tau}\)` denotes a relative time indicator, equal to 1 when individual `\(i\)` is `\(\tau\)` periods from treatment in period `\(t\)`

---
count: false

# Sun and Abraham

`$$y_{it} = \gamma_{i} + \gamma_{t} + \sum_{g} \sum_{\tau \neq -1} \delta_{g \tau} \times \text{1}(i \in C_{g}) \times D_{it}^{\tau} + x_{it} + \epsilon_{it}$$`

--

- Intuition: Standard regression with different event study specifications for each treatment group
- Aggregate `\(\delta_{g\tau}\)` for standard event study coefficients and overall ATT

---

# Sun and Abraham in Practice

.pull-left[
**Stata**<br>
```stata
ssc install eventstudyinteract
ssc install avar
ssc install event_plot

insheet using "https://raw.githubusercontent.com/imccart/202108-cdc-methodsworkshop/master/data/medicaid-expansion/mcaid-expand-data.txt", clear
gen perc_unins=uninsured/adult_pop
drop if expand_ever=="NA"
egen stategroup=group(state)
replace expand_year="." if expand_year=="NA"
destring expand_year, replace
* Event time is missing for never-treated states, which flags the control cohort
gen event_time=year-expand_year
gen nevertreated=(event_time==.)
* Relative-time indicators; F1event (t = -1) is left out of the regression as the reference
forvalues l = 0/4 {
    gen L`l'event = (event_time==`l')
}
forvalues l = 1/2 {
    gen F`l'event = (event_time==-`l')
}
* Leads of three or more years are binned together
gen F3event=(event_time<=-3)

eventstudyinteract perc_unins F3event F2event L0event L1event L2event L3event L4event, vce(cluster stategroup) absorb(stategroup year) cohort(expand_year) control_cohort(nevertreated)
event_plot e(b_iw)#e(V_iw), default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") xlabel(-3(1)4) title("Sun and Abraham (2020)")) stub_lag(L#event) stub_lead(F#event) plottype(scatter) ciplottype(rcap) together
```
]

.pull-left[
**R**<br>
```r
library(tidyverse)
library(modelsummary)
library(fixest)

mcaid.data <- read_tsv("https://raw.githubusercontent.com/imccart/202108-cdc-methodsworkshop/master/data/medicaid-expansion/mcaid-expand-data.txt")
# Never-treated states get a placeholder cohort year (10000) and time_to_treat = -1;
# leads of more than 3 years are binned at -3
reg.dat <- mcaid.data %>%
  filter(!is.na(expand_ever)) %>%
  mutate(perc_unins=uninsured/adult_pop,
         post = (year>=2014),
         treat=post*expand_ever,
         expand_year = ifelse(expand_ever==FALSE, 10000, expand_year),
         time_to_treat = ifelse(expand_ever==FALSE, -1, year-expand_year),
         time_to_treat = ifelse(time_to_treat < -3, -3, time_to_treat))

mod.sa <- feols(perc_unins~sunab(expand_year, time_to_treat) | State + year,
                cluster=~State,
                data=reg.dat)
iplot(mod.sa,
      xlab = 'Time to treatment',
      main = 'Event study')
```
]

---
class: inverse, center, middle
name: ch

# de Chaisemartin and D'Haultfoeuille (CH)

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---

# CH

- More general than other approaches
- Considers "fuzzy" treatment (i.e., non-discrete treatment)
- Considers fixed effects and first-differencing

--

New paper from Callaway, Goodman-Bacon, and Sant'Anna also looks at DD with continuous treatment

---

# CH Approach

- Essentially a series of 2x2 comparisons
- Aggregates up to overall effects

---

# CH in Practice

.pull-left[
**Stata**<br>
```stata
ssc install did_multiplegt
ssc install event_plot

insheet using "https://raw.githubusercontent.com/imccart/202108-cdc-methodsworkshop/master/data/medicaid-expansion/mcaid-expand-data.txt", clear
gen perc_unins=uninsured/adult_pop
drop if expand_ever=="NA"
egen stategroup=group(state)
replace expand_year="." if expand_year=="NA"
destring expand_year, replace
gen event_time=year-expand_year
gen nevertreated=(event_time==.)
* Treatment indicator: 1 from the expansion year onward
gen treat=(event_time>=0 & event_time!=.)

did_multiplegt perc_unins stategroup year treat, robust_dynamic dynamic(4) placebo(3) breps(100) cluster(stategroup)
event_plot e(estimates)#e(variances), default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") ///
    title("de Chaisemartin and D'Haultfoeuille (2020)") xlabel(-3(1)4)) stub_lag(Effect_#) stub_lead(Placebo_#) together
```
]

.pull-left[
**R** (not the same as in **Stata**)<br>
```r
library(tidyverse)
library(data.table)
library(DIDmultiplegt)

mcaid.data <- read_tsv("https://raw.githubusercontent.com/imccart/202108-cdc-methodsworkshop/master/data/medicaid-expansion/mcaid-expand-data.txt")
# treat = 1 from the expansion year onward, matching the Stata definition
reg.dat <- as.data.table(mcaid.data) %>%
  filter(!is.na(expand_ever)) %>%
  mutate(perc_unins=uninsured/adult_pop,
         treat=case_when(
           expand_ever==FALSE ~ 0,
           expand_ever==TRUE & expand_year>year ~ 0,
           expand_ever==TRUE & expand_year<=year ~ 1))

mod.ch <- did_multiplegt(df=reg.dat, Y="perc_unins", G="State", T="year", D="treat",
                         placebo=3, dynamic=4, brep=50, cluster="State")
```
]

---
class: inverse, center, middle
name: others

# And even more!

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---

# Cengiz et al. (2019)

- "Stacked" event studies
- Estimate event study for every treatment group, using never-treated as controls
- Aggregate to overall average effects

--

.pull-left[
**Stata**<br>
`stackedev`
]

.pull-right[
**R**<br>
`#nothing yet`
]

---

# Gardner (2021)

- "Remove" fixed effects via a first-stage regression using only untreated observations
- Predict FE from first stage and residualize the outcome
- Run standard event study specification on residualized outcome variable

--

.pull-left[
**Stata**<br>
`did2s`
]

.pull-right[
**R**<br>
`did2s`
]

---

# Borusyak et al. (2021)

- Imputation approach
- Estimate regression only for untreated observations
- Predict the untreated outcome for the treated observations and take the difference from the observed outcome
- Aggregate differences to form overall weighted average effect

--

.pull-left[
**Stata**<br>
`did_imputation`
]

.pull-right[
**R**<br>
`did2s`
]

---
class: inverse, center, middle
name: together

# Putting things together

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html>

---

# Seems like lots of "solutions"

- Callaway and Sant'Anna (2020)
- Sun and Abraham (2020)
- de Chaisemartin and D'Haultfoeuille (2020)
- Cengiz et al. (2019), Gardner (2021), and Borusyak et al. (2021)

--

Goodman-Bacon (2021) explores the problems but doesn't really propose a solution (still very important work though!)

---

# Comparison

.pull-left[
**Similarities**<br>

- Focus on clean treatment/control
- Focus on event study framework (not a single overall effect)
- Impose some form of parallel trends assumption
]

.pull-right[
**Differences**<br>

- What are the control units?
- How to include covariates?
]

---

# State of current work

- Careful consideration of treatment timing and control group(s)
- `panelView` package is great here!
- Implement 2 or more approaches
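---

# Imputation by hand (sketch)

The imputation steps listed for Borusyak et al. (2021) can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the `did_imputation` implementation: the panel, the cohort years, the effect size of -2, and every variable name below are made up for the example, and the final average uses equal weights.

```r
# Simulated staggered-adoption panel (all names and numbers are hypothetical)
set.seed(1)
n_units <- 60
years   <- 2010:2019
panel   <- expand.grid(id = 1:n_units, year = years)

# Three cohorts: treated from 2014, treated from 2016, never treated (Inf)
cohort <- rep(c(2014, 2016, Inf), each = n_units / 3)
panel$first_treat <- cohort[panel$id]
panel$treated     <- as.numeric(panel$year >= panel$first_treat)

# Outcome: unit effect + common trend + assumed effect of -2 once treated + noise
unit_fe <- rnorm(n_units)
panel$y <- unit_fe[panel$id] + 0.3 * (panel$year - 2010) -
  2 * panel$treated + rnorm(nrow(panel), sd = 0.5)

# Step 1: estimate unit and year fixed effects on untreated observations only
untreated <- panel[panel$treated == 0, ]
fit0 <- lm(y ~ factor(id) + factor(year), data = untreated)

# Step 2: predict the untreated outcome for the treated observations
treated_obs <- panel[panel$treated == 1, ]
treated_obs$y0_hat <- predict(fit0, newdata = treated_obs)

# Step 3: observation-level differences, then an equal-weight average
treated_obs$tau_hat <- treated_obs$y - treated_obs$y0_hat
att_imputed <- mean(treated_obs$tau_hat)

# Step 4: event-study style averages by years since treatment
treated_obs$event_time <- treated_obs$year - treated_obs$first_treat
att_by_event_time <- tapply(treated_obs$tau_hat, treated_obs$event_time, mean)
```

Because the simulated effect is constant, `att_imputed` and the entries of `att_by_event_time` should land near -2. The first stage only works here because every unit and every year appears among the untreated observations, which the never-treated cohort guarantees in this design.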