Tutorial Exercises Week 7
Question 1
Download the two datasets:
Read in both datasets. When reading in the house price dataset you should use the following command:
read.csv("cpb-house-prices.csv", sep = ";", dec = ",")
This is because the dataset uses semicolons to separate the columns instead of commas, and uses commas for decimals.
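To see what the `sep` and `dec` arguments change, the minimal sketch below parses a small inline string laid out like the house price file. The column names and numbers are hypothetical placeholders, not the contents of cpb-house-prices.csv. It also shows that `read.csv2()`, whose defaults are `sep = ";"` and `dec = ","`, gives the same result.

```r
# Hypothetical sample in the same layout as the house price file:
# semicolons separate columns, commas mark decimals.
sample_text <- "region;value_2022;value_2021
Example A;412,5;398,0
Example B;305,25;290,75"

hp <- read.csv(text = sample_text, sep = ";", dec = ",")

# With dec = "," the two value columns are parsed as numbers, not strings.
str(hp)

# read.csv2() uses sep = ";" and dec = "," by default, so it is equivalent.
hp2 <- read.csv2(text = sample_text)
identical(hp, hp2)
```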
Rename the 3 variables to: "municipality", "house_price_2022", "house_price_2021".
The 2nd dataset can be read in with the read.csv() function without any special options. Rename the 2 variables in that dataset to: "municipality", "pop_growth_2018_2023".
Merge the two datasets together by the variable "municipality".
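A minimal sketch of the renaming and merging steps, using small hypothetical data frames in place of the two files. The placeholder column names (`V1`, `X1`, ...) and all values are invented. Renaming with `names()` is positional, so this assumes the columns in the real files appear in the order given in the instructions.

```r
# Hypothetical stand-ins for the two datasets after reading them in.
hp <- data.frame(V1 = c("A", "B", "C"),
                 V2 = c(400, 350, 300),
                 V3 = c(390, 340, 295))
pop <- data.frame(X1 = c("A", "B", "D"),
                  X2 = c(2.1, -0.5, 4.0))

# Rename the columns positionally.
names(hp)  <- c("municipality", "house_price_2022", "house_price_2021")
names(pop) <- c("municipality", "pop_growth_2018_2023")

# By default merge() keeps only municipalities present in both data frames
# (an inner join); all = TRUE would keep unmatched rows, filled with NA.
merged <- merge(hp, pop, by = "municipality")
merged
```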
One municipality from the population growth dataset fails to merge with the house price dataset. Which municipality is this?
Question 2
How many municipalities from the house price dataset fail to merge with the population growth dataset?
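One way to find municipalities that do not match is to compare the key columns directly with `setdiff()` and `%in%`. The sketch below uses hypothetical toy data; with the real datasets the same two lines would be applied to your renamed data frames. Mismatches often come from a name being spelled differently in the two files, so it can help to print the unmatched names rather than only count them.

```r
# Hypothetical toy data standing in for the renamed datasets.
hp  <- data.frame(municipality = c("A", "B", "C", "E"),
                  house_price_2022 = c(400, 350, 300, 280))
pop <- data.frame(municipality = c("A", "B", "D"),
                  pop_growth_2018_2023 = c(2.1, -0.5, 4.0))

# Municipalities in the population data with no match in the house price data.
unmatched_pop <- setdiff(pop$municipality, hp$municipality)
unmatched_pop

# Number of municipalities in the house price data with no match in pop.
n_unmatched_hp <- sum(!(hp$municipality %in% pop$municipality))
n_unmatched_hp
```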
Question 3
Using ggplot, create a scatter plot with population growth on the horizontal axis and the 2022 house price on the vertical axis.
Add the following layer to your plot to get a fitted line through the points:
geom_smooth(method = "lm")
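A minimal sketch of the plot, assuming the ggplot2 package is installed and using an invented five-row data frame in place of the merged dataset. `geom_smooth(method = "lm")` adds a least-squares regression line with a shaded confidence band.

```r
library(ggplot2)

# Hypothetical toy version of the merged dataset; all values are invented.
merged <- data.frame(
  municipality = c("A", "B", "C", "D", "E"),
  pop_growth_2018_2023 = c(-1.0, 0.5, 2.0, 3.5, 5.0),
  house_price_2022 = c(300, 320, 360, 380, 420)
)

p <- ggplot(merged, aes(x = pop_growth_2018_2023, y = house_price_2022)) +
  geom_point() +
  geom_smooth(method = "lm") +  # fitted linear regression line
  labs(x = "Population growth 2018-2023", y = "House price 2022")

print(p)
```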
Choose the answer below which best interprets what we can see in the plot.
- Municipalities with higher population growth on average have higher house prices.
- Municipalities with higher population growth on average have lower house prices.
Question 4
Reshape the original house price dataset from wide format to long format using the municipality as the ID variable. How many rows does the long format dataset have?
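A sketch of the reshape, assuming the reshape2 package, which provides `melt()` as well as the `dcast()` used in Question 5 (data.table has functions with the same names). The three-row data frame is hypothetical. Each municipality contributes one row per year column, so with two year columns the long dataset has twice as many rows as the wide one.

```r
library(reshape2)

# Hypothetical three-municipality version of the wide house price data.
hp <- data.frame(municipality = c("A", "B", "C"),
                 house_price_2022 = c(400, 350, 300),
                 house_price_2021 = c(390, 340, 295))

# Every non-ID column becomes a (variable, value) pair for each municipality.
hp_long <- melt(hp, id.vars = "municipality")
head(hp_long)
nrow(hp_long)
```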
Question 5
If you reshaped the dataset correctly in the previous question, the first 4 rows should look like this:
  municipality         variable  value
1  Bloemendaal house_price_2022 1118.9
2     Blaricum house_price_2022 1099.1
3  Laren (NH.) house_price_2022 1030.1
4    Wassenaar house_price_2022  970.8
Suppose the long format dataset is called df1_long. Which of the following commands will return the dataset back to its original format (apart from the order of the observations)?
dcast(df1_long, municipality ~ variable)
dcast(df1_long, variable ~ municipality)
dcast(df1_long, value ~ municipality)
dcast(df1_long, municipality ~ value)
Question 6
Download the dataset municipality-province.csv.
This dataset contains two variables: the municipality and the province in which each municipality is located.
Read in the dataset and rename the variables to "municipality" and "province".
Merge the municipality-province.csv dataset with your previously merged house price and population growth dataset.
Calculate the average of the variable house_price_2022 by province.
Which province has the highest average?
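A sketch of these last steps, using hypothetical toy data in place of your merged dataset and municipality-province.csv (the province names `P1` and `P2` are placeholders). `aggregate()` with a formula computes the mean per group, and `which.max()` picks out the row with the highest average.

```r
# Hypothetical stand-ins for the merged dataset and the province file.
merged <- data.frame(
  municipality = c("A", "B", "C", "D"),
  house_price_2022 = c(400, 350, 300, NA),
  pop_growth_2018_2023 = c(2.1, -0.5, 1.0, 4.0)
)
prov <- data.frame(municipality = c("A", "B", "C", "D"),
                   province = c("P1", "P1", "P2", "P2"))

full <- merge(merged, prov, by = "municipality")

# Mean 2022 house price per province. The formula interface of aggregate()
# drops rows with NA in the variables used (na.action = na.omit by default).
avg_by_prov <- aggregate(house_price_2022 ~ province, data = full, FUN = mean)

# Row of the province with the highest average.
top <- avg_by_prov[which.max(avg_by_prov$house_price_2022), ]
top
```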