With time-series data, serial correlation in the error terms is very common: if e_t is positive, e_{t+1} is often positive as well. Correlation between the errors of adjacent periods is called first-order autocorrelation. When it is present, the default standard errors are no longer reliable, so the t-tests and confidence intervals based on them can be misleading.
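To make the idea concrete, here is a minimal simulation sketch. It assumes the errors follow an AR(1) process, e_t = \rho e_{t-1} + \nu_t, which is one common way for first-order autocorrelation to arise. The seed, the sample size and the value of rho are arbitrary illustrative choices, not values taken from the chapter's data.

```r
# Simulate errors with positive first-order autocorrelation.
# Assumption: an AR(1) process e_t = rho * e_{t-1} + nu_t.
set.seed(1)      # arbitrary seed, for reproducibility only
n   <- 500       # illustrative sample size
rho <- 0.8       # assumed autocorrelation parameter
nu  <- rnorm(n)  # independent shocks

e <- numeric(n)
e[1] <- nu[1]
for (t in 2:n) {
  e[t] <- rho * e[t - 1] + nu[t]
}

# Sample correlation between e_t (periods 2..n) and e_{t-1} (periods 1..n-1)
lag1_cor <- cor(e[-1], e[-n])
lag1_cor
```

With these settings the sample correlation between adjacent errors should come out clearly positive, which is the pattern described above.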
Sometimes changing the regression specification helps remove the problem. For example:
Using differences x_t-x_{t-1} instead of levels x_t.
Using growth rates \frac{x_t-x_{t-1}}{x_{t-1}} instead of levels x_t.
Adding a time trend term to the model.
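The three transformations listed above could be computed along the following lines. This is only a sketch on a made-up toy series; the vector x and the variable names x_diff, x_growth and trend are hypothetical and do not come from the chapter's data.

```r
# Toy series; the values are made up for illustration.
x <- c(100, 104, 110, 108, 115)
n <- length(x)

# First differences x_t - x_{t-1}; the first period has no lag, so it is NA.
x_diff <- c(NA, diff(x))

# Growth rates (x_t - x_{t-1}) / x_{t-1}; again NA in the first period.
x_growth <- c(NA, diff(x) / x[-n])

# A linear time trend 1, 2, ..., n that can be added as an extra regressor,
# for example lm(y ~ x + trend) for some outcome y.
trend <- seq_len(n)
```

Padding with NA keeps each transformed series the same length as the original, so it can be stored as a new column in the same data frame.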
In this chapter we will learn how to formally test for first-order autocorrelation and how to correct the standard errors for it.
27.2 Formal Test for First-Order Autocorrelation
We can formally test for first-order autocorrelation as follows.
Estimate the original model: \mathbb{E}\left[ Y_t|x_{t1},\dots,x_{tk} \right]=\beta_0+\beta_1 x_{t1}+\dots+\beta_k x_{tk} and save the residuals, e_t.
Create a new variable which is the lag of the residuals, e_{t-1}.
Estimate the auxiliary model:
e_t = \gamma_0 + \gamma_1e_{t-1}+ \nu_t
Apply the t-test (significance test) on \gamma_1. Under H_0: \gamma_1 = 0 there is no first-order autocorrelation, and under H_1: \gamma_1 \neq 0 there is first-order autocorrelation.
In this auxiliary regression, \gamma_1 is approximately the correlation coefficient between e_t and e_{t-1}, because the two series have almost the same variance. The logic behind the test is that if the previous period’s residual can predict the current period’s one, then the residuals are not independent across time.
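The link between \gamma_1 and the correlation coefficient can be made precise with the usual formula for a simple-regression slope. In the sketch below, T (the number of periods), r (the sample correlation between e_t and e_{t-1}) and the means \bar{e} and \bar{e}_{-1} (taken over t = 2, \dots, T) are notation introduced here for illustration.

```latex
\hat{\gamma}_1
  = \frac{\sum_{t=2}^{T} (e_{t-1}-\bar{e}_{-1})(e_t-\bar{e})}
         {\sum_{t=2}^{T} (e_{t-1}-\bar{e}_{-1})^2}
  = r \cdot
    \sqrt{\frac{\sum_{t=2}^{T} (e_t-\bar{e})^2}
               {\sum_{t=2}^{T} (e_{t-1}-\bar{e}_{-1})^2}}
```

The two sums under the square root cover almost the same residuals (they differ only in e_1 and e_T), so the ratio is typically close to 1 and \hat{\gamma}_1 is close to r.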
27.3 Testing for First-Order Autocorrelation in R
Let’s see how to do these steps in R. We will use the Dutch GDP and exports data we encountered in Chapter 8.
# Step 1: Estimate the original model and save the residuals:
df <- read.csv("nl-exports-gdp.csv")
m <- lm(gdp ~ exports, data = df)
df$e <- m$residuals
# Step 2: Create a new variable which is the lag of the residuals:
df$lag_e <- c(NA, df$e[1:(nrow(df)-1)])
# Step 3: Estimate the auxiliary model:
aux <- lm(e ~ lag_e, data = df)
# Step 4: Apply an individual significance test on the lagged residual term:
summary(aux)
Call:
lm(formula = e ~ lag_e, data = df)
Residuals:
Min 1Q Median 3Q Max
-24.806 -3.847 1.886 5.140 12.424
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.91581 1.19398 0.767 0.447
lag_e 0.94605 0.02968 31.878 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.773 on 52 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.9513, Adjusted R-squared: 0.9504
F-statistic: 1016 on 1 and 52 DF, p-value: < 2.2e-16
The t-test for the individual significance of the lagged residual has a p-value close to zero. This is very strong evidence for first-order serial correlation.
27.4 Taking Growth Rates
Before learning how to correct the standard errors for serial correlation, let’s first try taking growth rates of both GDP and exports to see if the first-order serial correlation problem goes away. Note that taking growth rates costs us the first observation, because the first period in the data has no lagged value. This is why we need to use the na.omit() function to drop the missing observations.
Call:
lm(formula = e ~ lag_e, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.028405 -0.007534 -0.002025 0.008030 0.032830
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0002411 0.0017667 -0.136 0.892
lag_e 0.2116660 0.1354665 1.562 0.124
Residual standard error: 0.01286 on 51 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.04568, Adjusted R-squared: 0.02697
F-statistic: 2.441 on 1 and 51 DF, p-value: 0.1244
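Now the lagged residual has a p-value of 0.124, which is greater than 0.05, so we cannot reject the null hypothesis of no first-order autocorrelation at the 5% level. There is no longer evidence of first-order serial correlation.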
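For reference, the growth-rate version of the test could be coded roughly as follows. This is a sketch under stated assumptions, not necessarily the code that produced the output above: the helper growth_rate() and the column name gdp_growth are hypothetical, and the data frame is simulated stand-in data that only reuses the column names gdp and exports. With the chapter's data you would start from df <- read.csv("nl-exports-gdp.csv") instead.

```r
# Hypothetical helper: growth rate (x_t - x_{t-1}) / x_{t-1}.
# The first element is NA because the first period has no lagged value.
growth_rate <- function(x) {
  n <- length(x)
  c(NA, diff(x) / x[-n])
}

# Simulated stand-in data (assumption: not the chapter's Dutch data).
set.seed(42)
df <- data.frame(
  gdp     = 100 * cumprod(1 + rnorm(30, mean = 0.02, sd = 0.01)),
  exports = 50 * cumprod(1 + rnorm(30, mean = 0.03, sd = 0.02))
)

# Take growth rates and drop the first row, which now contains NAs.
df$gdp_growth     <- growth_rate(df$gdp)
df$exports_growth <- growth_rate(df$exports)
df <- na.omit(df)

# Repeat the four steps of the test on the growth-rate model.
m <- lm(gdp_growth ~ exports_growth, data = df)
df$e <- residuals(m)
df$lag_e <- c(NA, df$e[-nrow(df)])
aux <- lm(e ~ lag_e, data = df)
summary(aux)
```

One observation is lost when taking growth rates and a second one when lagging the residuals, which matches the drop in degrees of freedom from 52 to 51 between the two outputs shown in this chapter.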
27.5 Correcting for First-Order Autocorrelation in R
If taking growth rates, taking differences, or adding a trend term does not remove the problem, you can correct the standard errors for serial correlation in a similar way to how we corrected for heteroskedasticity. To do this we use the function vcovHAC() from the sandwich package, which corrects for both heteroskedasticity and autocorrelation.
We will now show how to do this in R. Let’s suppose for the moment that our model with growth rates still suffered from serial correlation and we wanted to correct for it.
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0047496 0.0030884 1.5379 0.1301
exports_growth 0.3809872 0.0437884 8.7006 0.00000000001012 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the coefficient estimates are the same as before but the standard errors are slightly different (e.g. 0.0437884 instead of 0.042394 for the slope).
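The correction described in this section can be sketched as follows, assuming the sandwich and lmtest packages are installed. The data are simulated with AR(1) errors purely for illustration; the data frame sim, the seed and all parameter values are assumptions, and only the variable name exports_growth is taken from the output above.

```r
library(sandwich)  # provides vcovHAC()
library(lmtest)    # provides coeftest()

# Simulated growth-rate data with AR(1) errors (assumption, not the Dutch data).
set.seed(123)
n <- 100
exports_growth <- rnorm(n, mean = 0.05, sd = 0.04)
u <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.01))
gdp_growth <- 0.005 + 0.4 * exports_growth + u
sim <- data.frame(gdp_growth, exports_growth)

m <- lm(gdp_growth ~ exports_growth, data = sim)

# Coefficient table with the default standard errors.
default_tab <- coeftest(m)

# Same coefficients, but with heteroskedasticity- and
# autocorrelation-consistent (HAC) standard errors.
hac_tab <- coeftest(m, vcov. = vcovHAC(m))
hac_tab
```

Comparing default_tab with hac_tab shows the same pattern as in the text: the Estimate column is identical, and only the standard errors, t values and p-values change.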