1. Download any data set that contains at least 3 variables.


    I use a fake dataset that I created:

    mydata <- read.csv("https://rtgodwin.com/data/monopolist.csv")
    
  2. In R, use LS to estimate a population model of the form: \(\boldsymbol{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}\)

    On the right hand side of your model, only include 2 regressors and the intercept (like in the Cobb-Douglas example).


    I estimate the model using:

    mod <- lm(quantity ~ price + bad.weather, data = mydata)
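
    To see the estimated coefficients (and their standard errors), you can print a summary of the fitted model:

    summary(mod)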
    
  3. Verify that the $x$ variables are orthogonal to the LS residuals.


    I save and extract the residuals from the estimated model:

    residuals <- mod$residuals
    

    The question asks us to show that $X^{\prime} e = 0$. To transpose a vector in R, we can use t(), and %*% multiplies matrices (or vectors):

    t(residuals) %*% mydata$price
    
    >               [,1]
    > [1,] -4.085621e-14
    

    -4.085621e-14 is -0.00000000000004085621. This is not quite zero due to rounding. We can also verify that the other variable is orthogonal as well:

    t(residuals) %*% mydata$bad.weather
    
    >               [,1]
    > [1,] -1.498801e-15
    

    You can also calculate each element of $X^{\prime} e$ as an “inner product”, for example: sum(residuals * mydata$price).

    Many students showed orthogonality in different ways, which is fine, as long as the “other ways” are equivalent to orthogonality. For example, you can check the correlation between the $X$ variables and the residuals and show that it is “close” to zero: when one of the variables has mean zero (here, the residuals), zero correlation and orthogonality are equivalent. So, using cor(residuals, mydata$price) works as well.
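
    If you want to check all of the regressors (including the intercept column) at once, one option is to build the $X$ matrix with model.matrix() and pre-multiply the residual vector by its transpose. This is just a sketch of the same check in matrix form:

    # the design matrix X: intercept, price, and bad.weather
    X <- model.matrix(mod)
    # X'e should be a vector of (near) zeros
    t(X) %*% residuals
    # crossprod(X, residuals) computes the same product
    crossprod(X, residuals)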


  4. Verify that the LS residuals from your estimated model sum to zero.


    Sum the residuals and see if they are “close” to zero (“close” due to rounding):

    sum(residuals)
    
    > [1] -1.991463e-15
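
    If you would rather not judge “close to zero” by eye, you can compare against zero with an explicit numerical tolerance. This is just a sketch; the 1e-10 threshold is arbitrary:

    # TRUE if the sum of residuals is zero up to rounding error
    abs(sum(residuals)) < 1e-10
    # all.equal() makes a similar comparison with a small built-in tolerance
    isTRUE(all.equal(sum(residuals), 0))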
    
  5. Verify that the regression line (it is actually a 2-dimensional “plane”) passes through the sample mean of the data.


    We need to evaluate the fitted model at the sample means of the data. That is, we get an LS prediction by “plugging” the sample means of the $X$ variables into the estimated equation. That predicted value should equal the sample mean of the $y$ variable:

    \[\widehat{\text{quantity}} = b_0 + b_1 \times \overline{\text{price}} + b_2 \times \overline{\text{bad.weather}} = \overline{\text{quantity}}\]

    There are a few ways to accomplish this in R, but here is one:

    mod$coefficients %*% c(1, mean(mydata$price), mean(mydata$bad.weather))
    
            [,1]
    [1,] 14.0794
    
    mean(mydata$quantity)
    
    [1] 14.0794
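
    An equivalent way to get the same prediction (just a sketch of an alternative) is to use predict() with a one-row data frame that holds the sample means of the regressors:

    # predicted quantity at the sample means of price and bad.weather
    predict(mod, newdata = data.frame(price = mean(mydata$price),
                                      bad.weather = mean(mydata$bad.weather)))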
    
  6. Verify that the fitted values and residuals are invariant to a non-singular linear transformation.


    Multiply any variable by a constant, or add any constant to a variable, and check to see that the residuals and predictions remain the same:

    mydata$price.cents <- 100 * mydata$price
    mod2 <- lm(quantity ~ price.cents + bad.weather, data = mydata)
    sum(mod2$residuals - mod$residuals)
    
    [1] -4.527628e-15
    
    sum(mod2$fitted.values - mod$fitted.values)
    
    [1] 1.776357e-15
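
    The same check works for adding a constant. As a sketch (the shift of 10 is arbitrary), shift price by a constant and compare the residuals and fitted values again; only the intercept estimate should change:

    # add an arbitrary constant to price and re-estimate
    mydata$price.shift <- mydata$price + 10
    mod3 <- lm(quantity ~ price.shift + bad.weather, data = mydata)
    # both sums should again be (numerically) zero
    sum(mod3$residuals - mod$residuals)
    sum(mod3$fitted.values - mod$fitted.values)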
    
  7. Use the Frisch-Waugh-Lovell theorem and partial regression to get the LS estimate for just one of the $\beta$.

    Recall that the FWL theorem states that, for the model:

    \[y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \epsilon\]

    the LS estimator for $\beta_2$ (for example) can be obtained by:

    1. Regressing $x_2$ on $x_1$ and the constant, saving the residuals.

    2. Regressing $y$ on $x_1$ and the constant, saving the residuals.

    3. Regressing the residuals from step 2 on the residuals from step 1, without a constant (see the sketch below).
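
    As a sketch of these steps in R (using price as $x_1$ and bad.weather as $x_2$), the coefficient from step 3 should match the coefficient on bad.weather from the full regression in question 2:

    # step 1: regress x2 on x1 and a constant, keep the residuals
    e.x2 <- lm(bad.weather ~ price, data = mydata)$residuals
    # step 2: regress y on x1 and a constant, keep the residuals
    e.y <- lm(quantity ~ price, data = mydata)$residuals
    # step 3: regress the step-2 residuals on the step-1 residuals, no constant
    lm(e.y ~ e.x2 - 1)$coefficients
    # this should equal the coefficient on bad.weather from the full model
    mod$coefficients["bad.weather"]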


For the R code required for this assignment, click on this tutorial.