Download the trade data

Download the data from the website using:

trade <- read.csv("https://rtgodwin.com/data/trade2.csv")

Pooled LS

Ignore the panel structure and estimate the model:

mod <- lm(log(exports) ~ log(gdp) + log(distance), data = trade)
summary(mod)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    5.74166    0.41817   13.73   <2e-16 ***
log(gdp)       0.83199    0.01115   74.63   <2e-16 ***
log(distance) -1.15155    0.04733  -24.33   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2753 on 189 degrees of freedom
Multiple R-squared:  0.9794,	Adjusted R-squared:  0.9792 
F-statistic:  4499 on 2 and 189 DF,  p-value: < 2.2e-16

A 1% increase in GDP of the importing province/territory is associated with a 0.83% increase in Manitoba’s exports to that location. For every 1% increase in distance, exports decline by 1.15%.

Fixed effects estimation (manual)

fe.dummies <- lm(log(exports) ~ log(gdp) + partner - 1, data = trade)
summary(fe.dummies)

Here, we manually put in the dummy variable partner' (the variable takes on 1 of 12 provinces/territories as possible trading partners). R creates 12 dummies. The -1` gets rid of the intercept $\bets_0$ in order to avoid the dummy variable trap.

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    5.74166    0.41817   13.73   <2e-16 ***
log(gdp)       0.83199    0.01115   74.63   <2e-16 ***
log(distance) -1.15155    0.04733  -24.33   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2753 on 189 degrees of freedom
Multiple R-squared:  0.9794,	Adjusted R-squared:  0.9792 
F-statistic:  4499 on 2 and 189 DF,  p-value: < 2.2e-16

IV estimation using ivreg()

To estimate the model:

\[wage = \beta_0 + \beta_1education + \beta_2urban + \beta_3gender + \beta_4ethnicity + \beta_5unemp + \epsilon\]

using IV estimation, and where distance is an instrument for education, we can use:

install.packages("ivreg")
library(ivreg)
iv <- ivreg(wage ~ education + urban + gender + ethnicity + unemp |
    distance + urban + gender + ethnicity + unemp, data = college)
summary(iv)
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -0.65702    1.83641  -0.358   0.7205    
education          0.64710    0.13594   4.760 1.99e-06 ***
urbanyes           0.04614    0.06039   0.764   0.4449    
gendermale         0.07075    0.04997   1.416   0.1569    
ethnicityhispanic -0.12405    0.08871  -1.398   0.1621    
ethnicityother     0.22724    0.09863   2.304   0.0213 *  
unemp              0.13916    0.00912  15.259  < 2e-16 ***

IV estimation using 2SLS approach

In the first stage we get the LS predicted values from a regression of the endogenous variable education on the instrument, and all other $x$ variables:

first.stage <- lm(education ~ urban + gender + ethnicity + unemp + distance,
    data = college)
education.hat <- first.stage$fitted.values

In the second stage we estimate the original population model, but we replace the variable distance with the predicted values from the first stage:

iv <- lm(wage ~ education.hat + urban + gender + ethnicity + unemp,
data = college)
summary(iv)

The results are the same as from the ivreg() package!