7  SLR Estimation

In this chapter we will learn how to estimate a simple linear regression in R using data. We do this with the lm() function, where lm stands for "linear model".

We will work through two examples:

  1. The advertising and sales example introduced in Chapter 2.
  2. Data on Dutch exports and GDP over time.

7.1 Advertising and Sales Example

To estimate a linear regression with y as the dependent variable and x as the independent variable (with both variables contained in a dataset df), we use the command lm(y ~ x, data = df). Let’s try this out with the advertising and sales data:

df <- read.csv("advertising-sales.csv")
lm(sales ~ advertising, data = df)

Call:
lm(formula = sales ~ advertising, data = df)

Coefficients:
(Intercept)  advertising  
    4.24303      0.04869  

The output shows the command that was run (under Call:) and the sample regression coefficients, b_0=4.24303 and b_1=0.04869. The sample regression line is therefore \hat{y}=4.24303 + 0.04869x.
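As a check on what lm() computes, the sketch below compares coef() with the familiar OLS formulas b_1 = cov(x, y)/var(x) and b_0 = ȳ − b_1 x̄. Because advertising-sales.csv is not reproduced here, it uses simulated data; the data frame df_sim and its numbers are hypothetical. With the real file you could run the same comparison on df.

```r
# Hypothetical simulated data standing in for advertising-sales.csv
set.seed(42)
df_sim <- data.frame(advertising = runif(100, 10, 430))
df_sim$sales <- 4 + 0.05 * df_sim$advertising + rnorm(100, sd = 2)

fit <- lm(sales ~ advertising, data = df_sim)
b <- coef(fit)   # named vector: "(Intercept)", "advertising"

# OLS formulas for simple linear regression
b1 <- cov(df_sim$advertising, df_sim$sales) / var(df_sim$advertising)
b0 <- mean(df_sim$sales) - b1 * mean(df_sim$advertising)

# lm() and the formulas agree up to floating-point tolerance
all.equal(unname(b), c(b0, b1))
```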

Let’s interpret these estimates. First let’s remind ourselves of what units the variables are in:

  • Sales is measured in millions of euros.
  • Advertising is measured in thousands of euros.

For the intercept, b_0, recall that it gives an estimate of \mathbb{E}\left[Y_i|x_i=0\right], the expected value of the Y variable when the X variable equals zero. In this example, when the firm does zero advertising, the model predicts that the firm’s sales will be 4.24303 units. Because the units of sales are in millions, this means the expected sales will be €4.24303m.
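One way to see that the intercept is the prediction at zero advertising is to ask the fitted model for that prediction with predict(). The sketch again uses hypothetical simulated data (df_sim); applied to the real df, the same call should return b_0 (4.24303 as displayed above).

```r
# Hypothetical simulated data standing in for advertising-sales.csv
set.seed(42)
df_sim <- data.frame(advertising = runif(100, 10, 430))
df_sim$sales <- 4 + 0.05 * df_sim$advertising + rnorm(100, sd = 2)
fit <- lm(sales ~ advertising, data = df_sim)

# Predicted sales (in millions of euros) when advertising is zero
pred0 <- predict(fit, newdata = data.frame(advertising = 0))

# The prediction at x = 0 is exactly the intercept b_0
all.equal(unname(pred0), unname(coef(fit)[1]))
```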

To see if this is a reliable estimate, we check if we have observations x_i at or near zero:

summary(df$advertising)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   11.7   123.5   207.3   200.9   281.1   433.6 

There are no observations at zero. The smallest value is 11.7 (€11,700), so predicting at x=0 means extrapolating outside the range of the data, and this estimate is potentially unreliable.
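The check above can be wrapped in a small helper. The function within_range() is hypothetical (not part of base R); it simply asks whether a value lies between the smallest and largest observed values of x. Here it is fed the minimum and maximum reported by summary(df$advertising); with the real data you would pass df$advertising directly.

```r
# Hypothetical helper: is x0 inside the observed range of x?
within_range <- function(x0, x) {
  x0 >= min(x, na.rm = TRUE) && x0 <= max(x, na.rm = TRUE)
}

# Minimum and maximum of advertising, taken from the summary output
adv_range <- c(11.7, 433.6)

within_range(0, adv_range)     # FALSE: predicting at x = 0 is an extrapolation
within_range(200, adv_range)   # TRUE: 200 lies inside the observed range
```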

For the slope, b_1, recall that it gives an estimate of \mathbb{E}\left[Y_i|x_i=x+1\right]-\mathbb{E}\left[Y_i|x_i=x\right], the expected change in the Y variable (in its units) when the X variable increases by 1 unit. Increasing X by 1 unit corresponds to an increase in advertising of €1,000. So when advertising increases by €1,000, sales increase on average by 0.04869\times\text{€1,000,000}=\text{€48,690}.
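The slope can be read off the fitted model as the difference between two predictions one unit apart, and the unit conversion is a single multiplication. The sketch uses hypothetical simulated data (df_sim) for the first part, and the point x = 200 is an arbitrary choice; any x gives the same difference because the line is straight. The second part uses the slope reported above.

```r
# Hypothetical simulated data standing in for advertising-sales.csv
set.seed(42)
df_sim <- data.frame(advertising = runif(100, 10, 430))
df_sim$sales <- 4 + 0.05 * df_sim$advertising + rnorm(100, sd = 2)
fit <- lm(sales ~ advertising, data = df_sim)

# Predicted sales at x = 200 and x = 201 (advertising in thousands of euros)
p <- predict(fit, newdata = data.frame(advertising = c(200, 201)))
diff(p)                      # equals the slope b_1
coef(fit)["advertising"]

# Converting the reported slope to euros: sales are in millions of euros
b1_reported <- 0.04869
b1_reported * 1e6            # 48690 euros per extra 1,000 euros of advertising
```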

7.2 Netherlands Exports and GDP

This advertising-sales dataset is an example of cross-sectional data. Cross-sectional data involve observations from different individuals or firms surveyed at the same point in time. We will now consider an example with time-series data. Time-series data involve observations from the same individual or firm at different points in time.

The example we will consider uses the dataset nl-exports-gdp.csv, which contains two variables measured over 1969-2023:

  1. Netherlands GDP (measured in billions of USD).
  2. Netherlands total exports of goods and services (measured in billions of USD).

We know from how GDP is calculated that if exports increase by $1bn and nothing else changes, then GDP should also increase by $1bn. Let’s check if this is true in the data by estimating the regression model:

GDP_i=\beta_0 + \beta_1 Exports_i + \varepsilon_i
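The expectation of a one-for-one effect comes from the expenditure approach to GDP. Writing consumption, investment, government spending, exports and imports as C, I, G, Exports and M, GDP is their sum with imports subtracted, so holding the other components fixed, any change in exports passes straight through to GDP:

```latex
GDP = C + I + G + Exports - M
\quad\Longrightarrow\quad
\Delta GDP = \Delta Exports
\quad \text{when } \Delta C = \Delta I = \Delta G = \Delta M = 0
```

In this "all else equal" thought experiment, a $1bn rise in exports raises GDP by exactly $1bn.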

df <- read.csv("nl-exports-gdp.csv")
lm(gdp ~ exports, data = df)

Call:
lm(formula = gdp ~ exports, data = df)

Coefficients:
(Intercept)      exports  
   287.6114       0.8224  

The intercept b_0=287.6114 gives an estimate of expected GDP (in billions of USD) when exports are zero. So the model predicts that Dutch GDP would be $287.61bn if the Netherlands exported no goods or services.

Having zero exports is a very strange concept for an open economy like the Netherlands. Let’s see if any observations of the X variable are at or near zero:

summary(df$exports)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  62.98  122.05  257.93  332.33  505.88  786.90 

The smallest value is $62.98bn, very far from zero. Therefore we should not trust the estimate of the intercept.

The slope, b_1=0.8224, tells us that if exports increase by 1 unit ($1bn), GDP increases on average by 0.8224 units ($822.4m). This is somewhat less than the one-for-one increase we expected. The likely reason is that the regression does not hold everything else constant: other things that also affect exports and GDP change at the same time.
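To illustrate how something moving alongside exports can pull the slope below 1, here is a purely hypothetical simulation (not the Dutch data). GDP is built from the expenditure identity, so holding imports fixed a $1bn rise in exports raises GDP by exactly $1bn. The simulation assumes, for illustration only, that imports rise with exports (by 0.6 per unit). Regressing GDP on exports alone then gives a slope well below 1, because the fitted line also picks up the accompanying rise in imports.

```r
set.seed(1)
n <- 55                                        # same number of years as 1969-2023
exports <- runif(n, 60, 790)                   # $bn, hypothetical values
imports <- 0.6 * exports + rnorm(n, sd = 20)   # assumption: imports move with exports
other   <- 300 + rnorm(n, sd = 30)             # stands in for C + I + G, hypothetical
gdp     <- other + exports - imports           # expenditure identity

fit <- lm(gdp ~ exports)
coef(fit)["exports"]   # well below 1 (near 1 - 0.6 = 0.4 by construction)
```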