install.packages("WDI")
Tutorial Exercises Week 3
Question 1
Install the R package "WDI". This is the World Bank Development Indicators R package. It is a way to access the data on https://data.worldbank.org/ conveniently within R.
Search for the name of the indicator that gives the annual inflation in consumer prices (in %). Do this by using the WDIsearch() function to search for "inflation".
Once you have the name of the indicator, use the WDI() function to get the annual inflation rate in the Netherlands. Use the following four arguments in the WDI() function:
- indicator = [the indicator you found in the first step (surrounded by quotes)]
- country = "NLD"
- start = 1960
- end = 2022
What was the inflation rate in the Netherlands in 2022 according to these data?
Write your answer with 3 digits after the decimal.
Solution
We can install the package with the install.packages() function:
install.packages("WDI")
We then load the package and search for inflation indicators:
library(WDI)
WDIsearch("inflation")
indicator name
7442 FP.CPI.TOTL.ZG Inflation, consumer prices (annual %)
7444 FP.FPI.TOTL.ZG Inflation, food prices (annual %)
7446 FP.WPI.TOTL.ZG Inflation, wholesale prices (annual %)
11393 NY.GDP.DEFL.87.ZG Inflation, GDP deflator (annual %)
11394 NY.GDP.DEFL.KD.ZG Inflation, GDP deflator (annual %)
11395 NY.GDP.DEFL.KD.ZG.AD Inflation, GDP deflator: linked series (annual %)
We can see that the inflation in consumer prices indicator is called "FP.CPI.TOTL.ZG". We then use the WDI() function to download the values for the "FP.CPI.TOTL.ZG" indicator for the Netherlands between 1960-2022:
df <- WDI(indicator = "FP.CPI.TOTL.ZG", country = "NLD",
          start = 1960, end = 2022)
We then subset the data for year 2022 to see the value of inflation in that year:
df[df$year == 2022, ]
country iso2c iso3c year FP.CPI.TOTL.ZG
1 Netherlands NL NLD 2022 10.00121
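To report the answer with 3 digits after the decimal, one option is to round or format the value. The following is a minimal sketch that types in the 2022 value from the output above by hand; the variable name inflation_2022 is only illustrative and not part of the exercise.

```r
# Value copied from the printed output; in practice you would extract it from df
inflation_2022 <- 10.00121

round(inflation_2022, 3)          # numeric value rounded to 3 decimals
sprintf("%.3f", inflation_2022)   # character string with exactly 3 decimals
```

round() keeps the result numeric, while sprintf() returns text, which can be handy when trailing zeros should stay visible.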
Question 2
What was the average annual inflation rate in consumer prices (in %) over the period 2000-2019?
Write your answer with 3 digits after the decimal.
Solution
To do this we can subset the data to years at least as large as 2000 and at most 2019:
df[df$year >= 2000 & df$year <= 2019, ]
country iso2c iso3c year FP.CPI.TOTL.ZG
4 Netherlands NL NLD 2019 2.6336991
5 Netherlands NL NLD 2018 1.7034979
6 Netherlands NL NLD 2017 1.3814587
7 Netherlands NL NLD 2016 0.3166667
8 Netherlands NL NLD 2015 0.6002481
9 Netherlands NL NLD 2014 0.9760351
10 Netherlands NL NLD 2013 2.5068985
11 Netherlands NL NLD 2012 2.4555477
12 Netherlands NL NLD 2011 2.3410702
13 Netherlands NL NLD 2010 1.2753057
14 Netherlands NL NLD 2009 1.1897769
15 Netherlands NL NLD 2008 2.4865020
16 Netherlands NL NLD 2007 1.6138586
17 Netherlands NL NLD 2006 1.1015011
18 Netherlands NL NLD 2005 1.6881302
19 Netherlands NL NLD 2004 1.2636474
20 Netherlands NL NLD 2003 2.0919984
21 Netherlands NL NLD 2002 3.2875310
22 Netherlands NL NLD 2001 4.1558413
23 Netherlands NL NLD 2000 2.3605223
To get the average of inflation in those years we can take the mean of the inflation indicator in the subsetted data:
mean(df[df$year >= 2000 & df$year <= 2019, ]$FP.CPI.TOTL.ZG)
[1] 1.871487
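The same subset-then-summarise pattern can be tried out on a small example without downloading anything. The sketch below uses a made-up data frame (the names toy, keep and toy_mean and all values are hypothetical) and stores the row condition in a variable so it can be inspected before it is used.

```r
# Toy data frame with made-up values (not World Bank data)
toy <- data.frame(year  = c(1999, 2000, 2001, 2002),
                  value = c(1.0, 2.0, 3.0, 4.0))

# Logical vector: TRUE for the rows we want to keep
keep <- toy$year >= 2000 & toy$year <= 2001
keep

# Mean of the value column over the kept rows only
toy_mean <- mean(toy$value[keep])
toy_mean
```

Printing keep first is a quick way to check that the condition selects the intended years before taking the mean.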
Question 3
What was the median annual inflation rate in consumer prices (in %) in years that end in a zero between 1960-2020 (so the years 1960, 1970, 1980, 1990, 2000, 2010, 2020)?
Write your answer with 3 digits after the decimal.
Solution
To get the years that end in a zero we can create a sequence in steps of 10:
seq(1960, 2020, by = 10)
[1] 1960 1970 1980 1990 2000 2010 2020
We can subset the data using the %in% operator with those years. For each year in df$year, we check if it matches any of the values in seq(1960, 2020, by = 10):
df[df$year %in% seq(1960, 2020, by = 10), ]
country iso2c iso3c year FP.CPI.TOTL.ZG
3 Netherlands NL NLD 2020 1.272460
13 Netherlands NL NLD 2010 1.275306
23 Netherlands NL NLD 2000 2.360522
33 Netherlands NL NLD 1990 2.454092
43 Netherlands NL NLD 1980 6.513455
53 Netherlands NL NLD 1970 3.668931
63 Netherlands NL NLD 1960 2.323944
To get the median inflation in those years, we use the median() function:
median(df[df$year %in% seq(1960, 2020, by = 10), ]$FP.CPI.TOTL.ZG)
[1] 2.360522
An alternative way we could have done this question is by using the modulo operator. The modulo operator gives the remainder when we divide two numbers. For example, if we divide 5 by 2, it’s 2 with a remainder of 1:
5 %% 2
[1] 1
If we divide 19 by 10, we get 1 with a remainder of 9:
19 %% 10
[1] 9
If we divide 20 by 10, we get 2 with a remainder of 0:
20 %% 10
[1] 0
The years that end in a zero all have a remainder of 0 when we divide by 10, so we could use this to find the rows of the data with years that end in a zero:
df[df$year %% 10 == 0, ]
country iso2c iso3c year FP.CPI.TOTL.ZG
3 Netherlands NL NLD 2020 1.272460
13 Netherlands NL NLD 2010 1.275306
23 Netherlands NL NLD 2000 2.360522
33 Netherlands NL NLD 1990 2.454092
43 Netherlands NL NLD 1980 6.513455
53 Netherlands NL NLD 1970 3.668931
63 Netherlands NL NLD 1960 2.323944
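As a quick check that the two approaches select the same years, the sketch below compares them on a plain vector of years rather than on the downloaded data. The names years, by_sequence and by_modulo are illustrative only.

```r
# Years covered by the data, most recent first (the order WDI() returns them in)
years <- 2022:1960

by_sequence <- years[years %in% seq(1960, 2020, by = 10)]
by_modulo   <- years[years %% 10 == 0]

by_modulo
identical(by_sequence, by_modulo)
```

The two selections agree here because the data only cover 1960-2022. On a longer series the modulo condition would also pick up years such as 1950 or 2030, whereas the sequence stays limited to the years listed in it.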
Question 4
The WDI() function returns a dataframe with the most-recent year first. You want to sort the data so that the years are ascending.
Which of the following commands achieves that goal?
df <- df[order(df$year), ]
df <- df[order(df$year, decreasing = TRUE), ]
df <- df[sort(df$year), ]
df <- df[sort(df$year, ascending = TRUE), ]
Also sort your data by year for the remaining questions.
Solution
We use the order() function to sort data. The sort() function sorts the values provided to it and doesn’t return the indices of the ordered items, which is what we need to sort the data. We want to sort ascending, which corresponds to decreasing = FALSE. This is the default in the order() function, so we can use the following to sort the data:
df <- df[order(df$year), ]
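The difference between order() and sort() is easiest to see on a tiny example. The sketch below uses a made-up three-row data frame (toy, sorted and wrong are hypothetical names, and the values are not World Bank data).

```r
# Toy data frame with made-up values, most recent year first
toy <- data.frame(year = c(2022, 2020, 2021), value = c(30, 10, 20))

order(toy$year)   # row positions that put the years in ascending order
sort(toy$year)    # the sorted years themselves, not row positions

# Indexing with the positions from order() rearranges the rows
sorted <- toy[order(toy$year), ]
sorted

# Indexing with the output of sort() asks for rows 2020, 2021 and 2022,
# which do not exist in a 3-row data frame, so the result is rows of NA
wrong <- toy[sort(toy$year), ]
wrong
```

This is why order() belongs inside the square brackets: row indexing needs positions, and sort() returns values.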
Question 5
Create a vector called inflation_change which is the change in inflation from the previous year:
- The first element should be the difference in the inflation rate between 1961 and 1960 (inflation 1961 minus inflation 1960).
- The second element should be the difference in the inflation rate between 1962 and 1961.
- …
- The last element should be the difference in the inflation rate between 2022 and 2021.
This vector should have 62 elements. We don’t have a change for 1960 because we don’t observe the inflation rate in 1959.
What is the value of the 5th element in your vector? Report your answer with 3 digits after the decimal.
Solution
The first element should be the difference between 1961 and 1960. This is:
df$FP.CPI.TOTL.ZG[2] - df$FP.CPI.TOTL.ZG[1]
[1] -1.016304
because 1960 is the 1st element and 1961 is the 2nd.
The second element should be the difference between 1962 and 1961. This is:
df$FP.CPI.TOTL.ZG[3] - df$FP.CPI.TOTL.ZG[2]
[1] 1.123483
because 1961 is the 2nd element and 1962 is the 3rd. The last element should be the difference between 2022 and 2021. This is:
df$FP.CPI.TOTL.ZG[63] - df$FP.CPI.TOTL.ZG[62]
[1] 7.325488
because 2021 is the 62nd element and 2022 is the 63rd.
Seeing the pattern here, we can get all the elements together using:
inflation_change <- df$FP.CPI.TOTL.ZG[2:63] - df$FP.CPI.TOTL.ZG[1:62]
We can check that this has 62 elements:
length(inflation_change)
[1] 62
We need to provide the 5th element to answer the question. This is:
inflation_change[5]
[1] -1.91573
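Base R also has a diff() function that computes differences between successive elements, which could serve as an alternative to the manual indexing. The sketch below compares the two on a short made-up series (x, manual and built_in are illustrative names), assuming the values are already sorted ascending by year.

```r
# Toy inflation series with made-up values, sorted ascending by year
x <- c(2.0, 1.5, 3.0, 2.5)
n <- length(x)

manual   <- x[2:n] - x[1:(n - 1)]   # same pattern as the manual solution
built_in <- diff(x)                 # base R: differences between successive elements

manual
all.equal(manual, built_in)
```

Either way the result has one element fewer than the input, matching the 62 changes obtained from 63 years of data.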
Question 6
You want to add this inflation_change vector as a variable to your dataframe which is sorted ascending by year.
However, because your vector has only 62 elements (because we do not know the value of inflation in 1959), we cannot add it directly. Instead, we replace the value for the change of inflation in 1960 with a missing value indicator NA.
Which of the following commands will add the inflation_change variable correctly in your data?
df$inflation_change <- c(inflation_change, NA)
df$inflation_change <- c(NA, inflation_change)
df$inflation_change <- inflation_change
Solution
We don’t know what the inflation change is for 1960 because we don’t know what inflation was in 1959. Therefore we need to set the value for this year to missing if we add this vector to our dataframe. We add an NA to the start (the value in 1960) and then include the change for all the other years:
df$inflation_change <- c(NA, inflation_change)
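To see why the NA has to go at the start, it can help to try the alignment on a tiny made-up data frame. In the sketch below the names toy, change and direct are hypothetical, and the values are not World Bank data; the failed direct assignment is wrapped in tryCatch() so the script keeps running.

```r
# Toy data frame with made-up values, sorted ascending by year
toy <- data.frame(year = c(1960, 1961, 1962), value = c(2.0, 1.5, 3.0))
change <- diff(toy$value)   # 2 changes for 3 years

# Adding the shorter vector directly fails; capture the error instead of stopping
direct <- tryCatch({ toy$change <- change; "ok" },
                   error = function(e) "error")
direct

# Padding with NA at the start lines each change up with its own year
toy$change <- c(NA, change)
toy
```

With the NA first, the row for 1961 holds the change from 1960 to 1961. Putting the NA last would shift every change one year too early.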