28  The Zero Conditional Mean Assumption

28.1 Introduction

A crucial assumption in the linear regression model is that \mathbb{E}\left[ \varepsilon_i|x_{i1},\dots,x_{ik} \right]=0. This assumption implies that the error term is uncorrelated with the explanatory variables. A violation of this assumption means our estimates of \beta_{j} are biased: systematically too big or too small, sometimes even of the opposite sign! Recall the class size and test scores example from Chapter 8, where a regression of test scores on class size yielded a positive coefficient estimate on class size even though we expect a negative one. Naturally this is much more serious than having standard errors that are too small.

A common remedy to this problem is to add explanatory variables that we suspect are correlated with both our X variable of interest and the outcome variable Y. For example, we could add the average socioeconomic status of the students to the class size and test scores model.
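
To see why an omitted variable biases the estimate and how adding it helps, here is a minimal R simulation sketch. The data-generating process and all numbers are invented for illustration; in particular, it assumes class size and socioeconomic status (SES) are positively correlated (say, because schools in affluent areas are overcrowded).

```r
# Minimal simulation of omitted variable bias; all numbers are invented.
set.seed(42)
n     <- 1000
ses   <- rnorm(n)                               # average SES (unobserved)
size  <- 25 + 3 * ses + rnorm(n)                # class size, correlated with SES
score <- 60 - 0.5 * size + 5 * ses + rnorm(n)   # true class-size effect: -0.5

coef(lm(score ~ size))        # omitting SES: estimate is biased, even positive!
coef(lm(score ~ size + ses))  # controlling for SES: close to the true -0.5
```

In this simulated world the sign of the estimated class-size effect flips, just as in the Chapter 8 example, and including the confounder restores it.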

In this chapter we will briefly discuss some other solutions to the problem. At the very end of this chapter we will also have a brief discussion on some other model assumptions.

28.2 Experiments and Natural Experiments

Often adding more explanatory variables does not solve the problem. This is usually because there are variables we would like to include but for which we have no data (they are unobserved).

One way to solve this is to run an experiment: if we assign X to individuals at random and observe their outcomes Y, the randomization guarantees that X is uncorrelated with the error term. For example, we could randomly put students into classes of different sizes and observe their test scores afterwards.
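
Continuing the simulated example from above (the numbers remain invented), random assignment of class size breaks its link with SES, so the estimate is unbiased even though SES stays unobserved:

```r
# Same data-generating process, but class size is now assigned at random.
set.seed(42)
n     <- 1000
ses   <- rnorm(n)                               # still affects scores...
size  <- sample(15:35, n, replace = TRUE)       # ...but size is now random
score <- 60 - 0.5 * size + 5 * ses + rnorm(n)

coef(lm(score ~ size))  # close to the true -0.5, with SES still omitted
```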

But often we can’t run an experiment because it is too expensive or unethical. For example, if we want to know the effect of a college degree on future wages, it would be unethical to stop people who would otherwise have gone to college from obtaining a degree just to see how much less income they would earn.

When an experiment is too expensive or unethical, sometimes we can use a “natural experiment”: an institutional feature that generates randomness in a variable. Returning to the class size and test scores example: in Israel, students must attend a particular school based on where they live, and there are strict rules that determine the number of classrooms in a school district:

  • If there are 40 students to be enrolled, there is only 1 classroom.
  • If there are 41 students to be enrolled, they are split into 2 classrooms (one with 20 and one with 21 students).

Whether 40 or 41 students happen to enroll in a given year is effectively random. Therefore, if we compare test scores only between schools with 40 enrolled students (one big classroom) and 41 enrolled students (two small classrooms), we can estimate the causal effect of classroom size on test scores.
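
As a sketch of how such a comparison might look in R, suppose we had a hypothetical data frame schools with one row per school, a column enrolled for the number of enrolled students, and a column score for the school's average test score (all of these names are assumptions, not real data):

```r
# Keep only schools right at the cutoff, where enrollment is as good as random.
near_cutoff <- subset(schools, enrolled %in% c(40, 41))
near_cutoff$small_classes <- near_cutoff$enrolled == 41  # 41 -> two small classes

# The difference in mean scores estimates the causal effect of small classes.
t.test(score ~ small_classes, data = near_cutoff)
```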

Here is another example of a natural experiment. Suppose we want to estimate the effect of attending an elite secondary school (a dummy variable X) on future earnings (Y):

Y_i=\beta_0 + \beta_1 X_i + \varepsilon_i

People who attend these schools are often very able and productive. But able and productive people can find better jobs regardless of where they go to school. Since ability is unobserved, it ends up in the error term, and so the X variable is correlated with the error term.

Ability/productivity is a difficult variable to measure precisely, so we can’t add it to our model. It would also be both prohibitively expensive and unethical to randomly force some people to attend an elite school and bar others from doing so.

So we rely on the natural experiment approach. We can make use of the fact that some elite schools admit any student who achieves a minimum score on an entrance exam. Students who just barely passed and students who just barely failed scored very similarly, and should on average be similar to each other: some were lucky and scored just above the required grade, while others were unlucky and scored just below it. By comparing future earnings only between those just above and just below the passing grade, we can estimate the causal effect.
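
A sketch of this comparison in R, assuming a hypothetical data frame students with entrance-exam scores (exam), future earnings (earnings), and a passing grade of 60 (all of these names and numbers are invented for illustration):

```r
cutoff <- 60                                         # assumed passing grade
window <- subset(students, abs(exam - cutoff) <= 2)  # just above / just below
window$passed <- window$exam >= cutoff               # passed -> attended elite school

# Students this close to the cutoff should be comparable on average,
# so the earnings gap estimates the causal effect of attendance.
t.test(earnings ~ passed, data = window)
```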

28.3 Other Model Assumptions

We end this chapter with a very brief discussion of the other model assumptions and possible remedies for violations.

28.3.1 Non-Linearities or Non-Normal Error Terms

We will not discuss a formal test for these. If, based on an analysis of scatter plots, you suspect a violation of either, a change in the model specification can help. For example (see the R sketch after this list):

  • Taking the natural logarithm of either the Y variable, the X variable, or both.
  • Transforming levels of a variable X_t into either:
    • Changes: X_t-X_{t-1}.
    • Growth rates: \left( X_t-X_{t-1} \right)/X_{t-1}.
  • Adding higher-order terms (such as X^2) to the model.
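
In R, these re-specifications are one-liners. A sketch with placeholder names df, y, and x:

```r
lm(log(y) ~ log(x), data = df)   # log both sides: the slope is an elasticity
lm(y ~ x + I(x^2), data = df)    # add a higher-order term

# For a numeric time-series vector x, changes and growth rates:
dx     <- diff(x)                # X_t - X_{t-1}
growth <- diff(x) / head(x, -1)  # (X_t - X_{t-1}) / X_{t-1}
```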

28.3.2 Perfect Collinearity

We won’t discuss a formal test for this because R automatically “drops” variables that suffer from it, so we will know immediately if it is present in our model. The remedy is simple: we just have to drop the offending variables from the model ourselves.
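
A tiny demonstration with invented data: x2 is an exact linear function of x1, so R reports NA for its coefficient, which is how we would spot the problem in the output of lm():

```r
set.seed(42)
x1 <- rnorm(50)
x2 <- 2 * x1 + 3        # perfectly collinear with x1
y  <- 1 + x1 + rnorm(50)

coef(lm(y ~ x1 + x2))   # coefficient on x2 is NA: drop x2 from the model
```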