class: center, middle, inverse, title-slide # Basics of Panel Data ##
### Ian McCarthy | Emory University ### Workshop on Causal Inference with Panel Data --- <!-- Adjust some CSS code for font size and maintain R code font size --> <style type="text/css"> .remark-slide-content { font-size: 30px; padding: 1em 2em 1em 2em; } .remark-code { font-size: 15px; } .remark-inline-code { font-size: 20px; } </style> <!-- Set R options for how code chunks are displayed and load packages --> # Table of contents 1. [What are Panel Data](#panel) 2. [Estimation](#estimation) 3. [In Practice](#handson) 4. [Semantics](#semantics) --- class: inverse, center, middle name: panel # What are Panel Data? <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Nature of the Data - Repeated observations of the same units over time -- **Notation** - Unit `\(i=1,...,N\)` over several periods `\(t=1,...,T\)`, which we denote `\(y_{it}\)` - Treatment status `\(D_{it}\)` - Regression model, <br> `\(y_{it} = \delta D_{it} + u_{i} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` --- # Benefits of Panel Data - *May* overcome certain forms of omitted variable bias - Allows for unobserved but time-invariant factor, `\(u_{i}\)`, that affects both treatment and outcomes -- **Still assumes** - No time-varying confounders - Past outcomes do not directly affect current outcomes - Past outcomes do not affect treatment (reverse causality) --- # Some textbook settings - Unobserved "ability" when studying schooling and wages - Unobserved "quality" when studying physicians or hospitals --- class: inverse, center, middle name: estimation # Estimating Regressions with Panel Data <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Regression model `\(y_{it} = \delta D_{it} + u_{i} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` --- # Fixed Effects `\(y_{it} = \delta D_{it} + u_{i} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` -- - Allows correlation between `\(u_{i}\)` and `\(D_{it}\)` - Physically estimate `\(u_{i}\)` in some cases via set of dummy variables - More generally, "remove" `\(u_{i}\)` via: - "within" estimator - first-difference estimator --- # Within Estimator `\(y_{it} = \delta D_{it} + u_{i} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` -- - Most common approach (default in most statistical software) - Equivalent to demeaned model,<br> `\(y_{it} - \bar{y}_{i} = \delta (D_{it} - \bar{D}_{i}) + (u_{i} - \bar{u}_{i}) + (\epsilon_{it} - \bar{\epsilon}_{i})\)` - `\(u_{i} - \bar{u}_{i} = 0\)` since `\(u_{i}\)` is time-invariant - Requires *strict exogeneity* assumption (error is uncorrelated with `\(D_{it}\)` for all time periods) --- # First-difference `\(y_{it} = \delta D_{it} + u_{i} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` -- - Instead of subtracting the mean, subtract the prior period values<br> `\(y_{it} - y_{i,t-1} = \delta(D_{it} - D_{i,t-1}) + (u_{i} - u_{i}) + (\epsilon_{it} - \epsilon_{i,t-1})\)` - Requires exogeneity of `\(\epsilon_{it}\)` and `\(D_{it}\)` only for time `\(t\)` and `\(t-1\)` (weaker assumption than within estimator) - Sometimes useful to estimate both FE and FD just as a check --- # Keep in mind... - Discussion only applies to linear case or very specific nonlinear models - Fixed effects can't solve reverse causality - Fixed effects doesn't address unobserved, time-varying confounders - Can't estimate effects on time-invariant variables - May "absorb" a lot of the variation for variables that don't change much over time --- class: inverse, center, middle name: handson # Seeing things in action <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Within Estimator (Default) .pull-left[ **Stata**<br> ```stata ssc install bcuse bcuse wagepan tsset nr year xtreg lwage exper expersq, fe ``` ] .pull-right[ **R**<br> ```r library(readstata13) library(fixest) wagepan <- read.dta13("http://fmwww.bc.edu/ec-p/data/wooldridge/wagepan.dta") feols(lwage~exper + expersq | nr, data=wagepan) ``` ] --- # Within Estimator (Manually Demean) .pull-left[ **Stata**<br> ```stata ssc install bcuse bcuse wagepan foreach x of varlist lwage exper expersq { egen mean_`x'=mean(`x') egen demean_`x'=`x'-mean_`x' } reg demean_lwage demean_exper demean_expersq ``` ] .pull-right[ **R**<br> ```r library(readstata13) wagepan <- read.dta13("http://fmwww.bc.edu/ec-p/data/wooldridge/wagepan.dta") wagepan <- wagepan %>% group_by(nr) %>% mutate(demean_lwage=lwage - mean(lwage), demean_exper=exper - mean(exper), demean_expersq=expersq - mean(expersq)) summary(lm(demean_lwage~demean_exper + demean_expersq, data=wagepan)) ``` ] --- # First differencing .pull-left[ **Stata**<br> ```stata ssc install bcuse bcuse wagepan reg d.lwage d.exper d.expersq, noconstant ``` ] .pull-right[ **R**<br> ```r library(readstata13) wagepan <- read.dta13("http://fmwww.bc.edu/ec-p/data/wooldridge/wagepan.dta") wagepan <- wagepan %>% group_by(nr) %>% arrange(year) %>% mutate(fd_lwage=lwage - lag(lwage), fd_exper=exper - lag(exper), fd_expersq=expersq - lag(expersq)) %>% na.omit() summary(lm(fd_lwage~0 + fd_exper + fd_expersq, data=wagepan)) ``` ]