Calculate the following using R:
\[ \frac{2^4 + 10}{8\times \sqrt{4}} \]
(2^4 + 10) / (8 * sqrt(4))
[1] 1.625
Calculate \(\log_2\left(64\right)\) using R.
log(64, base = 2)
[1] 6
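As a quick cross-check, the same value can be obtained in a couple of other ways. The sketch below is only illustrative; the variable names are made up for this example, and the comparisons use `all.equal()` because floating-point results need not be bit-for-bit identical.

```r
# Three equivalent ways to compute the base-2 logarithm of 64
via_base_arg <- log(64, base = 2)   # general log() with an explicit base
via_log2     <- log2(64)            # dedicated base-2 function
via_identity <- log(64) / log(2)    # change-of-base identity

# Compare with a tolerance rather than ==, since these are floating-point values
isTRUE(all.equal(via_base_arg, via_log2))
isTRUE(all.equal(via_identity, 6))
```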
If we create the following vector in R, what class will it be?
c(1, 2, "c", "d")
Note: You do not need to supply your code for this question.
x <- c(1, 2, "c", "d")
x
[1] "1" "2" "c" "d"
class(x)
[1] "character"
# All elements of a vector in R must have the same class. Here all elements
# are coerced to be character.
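The coercion rule can be explored with a few more small vectors. This is a minimal sketch; `x_num` is a name introduced only for this illustration.

```r
# Mixing types in c() coerces every element to the most flexible type present:
# logical < numeric < character
class(c(TRUE, 2))      # "numeric": TRUE is converted to 1
class(c(1, 2, "c"))    # "character": the numbers are converted to strings

# Converting back to numeric turns non-numeric strings into NA.
# R would normally warn here; the warning is suppressed to keep the output clean.
x <- c(1, 2, "c", "d")
x_num <- suppressWarnings(as.numeric(x))
x_num                  # 1 2 NA NA
```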
Write an R command to generate a numeric vector containing the following sequence:
\[ (10, 10, 20, 20, 30, 30, 40, 40, \dots, 470, 470, 480, 480, 490, 490, 500, 500) \]
rep(seq(from = 10, to = 500, by = 10), each = 2)
[1] 10 10 20 20 30 30 40 40 50 50 60 60 70 70 80 80 90 90
[19] 100 100 110 110 120 120 130 130 140 140 150 150 160 160 170 170 180 180
[37] 190 190 200 200 210 210 220 220 230 230 240 240 250 250 260 260 270 270
[55] 280 280 290 290 300 300 310 310 320 320 330 330 340 340 350 350 360 360
[73] 370 370 380 380 390 390 400 400 410 410 420 420 430 430 440 440 450 450
[91] 460 460 470 470 480 480 490 490 500 500
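The `each` argument is what produces the pairwise repetition. A shorter sequence makes the difference from `times` easier to see; the object names below are chosen only for this sketch.

```r
s <- seq(from = 10, to = 50, by = 10)

# each = 2 repeats every element in place
paired <- rep(s, each = 2)    # 10 10 20 20 30 30 40 40 50 50

# times = 2 repeats the whole vector instead
cycled <- rep(s, times = 2)   # 10 20 30 40 50 10 20 30 40 50

# The full sequence has 50 distinct values, each appearing twice
n_full <- length(rep(seq(from = 10, to = 500, by = 10), each = 2))
n_full                        # 100
```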
The logical vectors a and b have equal length. Which of the following options is always the same as !(a | b), regardless of the contents of a and b?
!a & !b
!a | !b
a | b
Note: You do not need to supply your code for this question.
# Make two vectors covering every possibility:
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
# The target output is:
!(a | b)
[1] FALSE FALSE FALSE TRUE
# Try the different options:
# The first one matches:
!a & !b
[1] FALSE FALSE FALSE TRUE
# The remaining ones don't match:
!a | !b
[1] FALSE TRUE TRUE TRUE
a | b
[1] TRUE TRUE TRUE FALSE
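Rather than comparing the printed vectors by eye, the comparison can be done programmatically with `identical()`. This is a small sketch of that idea; `target` is a name introduced here for illustration.

```r
# Two vectors covering every combination of TRUE and FALSE
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

target <- !(a | b)

# Compare each candidate against the target in one step
identical(target, !a & !b)   # TRUE: this is De Morgan's law
identical(target, !a | !b)   # FALSE
identical(target, a | b)     # FALSE
```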
Download the dataset ceosal.csv. The dataset contains information on chief executive officers (CEOs) at different companies. The variable descriptions are:
salary: CEO compensation in 1990 (in dollars).
age: CEO age.
college: \(=1\) if the CEO attended college and \(=0\) otherwise.
grad: \(=1\) if the CEO attended post-graduate education and \(=0\) otherwise.
comten: Years the CEO worked with the company.
ceoten: Years as CEO with the company.
profits: Profits of the company in 1990 (in dollars).
When reading the dataset into R, assign it to df.
How many observations are in the dataset?
Also assign your answer to answer6 in your code.
# First we need to change directory to the folder containing the ceosal.csv
# file. Ideally you would do this using an R project.
df <- read.csv("ceosal.csv")
answer6 <- nrow(df)
answer6
[1] 177
What is the median of the variable comten?
Also assign your answer to answer7 in your code.
answer7 <- median(df$comten)
answer7
[1] 23
What is the mean salary of the CEOs who attended post-graduate education?
Also assign your answer to answer8 in your code.
answer8 <- mean(df$salary[df$grad == 1])
answer8
[1] 864212.8
How many people in the dataset attended college but didn’t attend post-graduate education?
Also assign your answer to answer9 in your code.
answer9 <- sum(df$college == 1 & df$grad == 0)
answer9
[1] 78
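The last two answers both rely on logical vectors: one to select rows before averaging, the other to count matching rows. The sketch below shows the same pattern on a tiny data frame. The data are hypothetical and are not taken from ceosal.csv; `toy`, `mean_grad` and `n_college_only` are names made up for this example.

```r
# Hypothetical toy data (NOT the real ceosal.csv values)
toy <- data.frame(
  salary  = c(100, 200, 300, 400),
  college = c(1, 1, 0, 1),
  grad    = c(1, 0, 0, 0)
)

# A logical vector inside [ ] keeps only the elements where it is TRUE
mean_grad <- mean(toy$salary[toy$grad == 1])
mean_grad        # 100 (only the first row has grad == 1)

# sum() treats TRUE as 1 and FALSE as 0, so it counts the matching rows
n_college_only <- sum(toy$college == 1 & toy$grad == 0)
n_college_only   # 2 (rows 2 and 4)
```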
In the dataset, what is the longest time (in years) that someone worked at a company before becoming CEO?
Also assign your answer to answer10 in your code.
answer10 <- max(df$comten - df$ceoten)
answer10
[1] 39
Download the dataset euro-dollar-2022.csv. The dataset contains the closing Euro-Dollar exchange rate (variable “Price”) each day throughout 2022, together with the opening, highest and lowest exchange rate. Furthermore, it includes the volume traded and the percentage daily change in the closing price.
In this question you will need to clean this dataset to answer the questions that follow.
When reading the dataset into R, assign it to df.
Perform the following cleaning tasks on your data frame df:
Convert the Date variable to a date.
Sort the data by Date ascending (the earliest date in the data should be first, the most recent date last).
Convert Price, Open, High and Low to numeric.
Convert Vol to numeric. For example, "33.87K" should be 33870. Hint: First use the gsub() function to remove the K. Then convert the variable to numeric format. Finally multiply it by 1,000.
Convert Change to numeric. Tip: Use gsub("\\%", "", x) to remove a percentage symbol from x.
If you did all the steps correctly, you should have 260 observations. The average of the high variable should be 0.9561. The average of the vol variable should be 79,920. If only some of these match your cleaned dataset, you will still be able to answer some of the questions correctly.
# First perform all the cleaning tasks:
df <- read.csv("euro-dollar-2022.csv")
# Format the date:
head(df$Date)
[1] "12/31/22" "12/30/22" "12/29/22" "12/28/22" "12/27/22" "12/26/22"
# Format is MM/DD/YY (year without century):
df$Date <- as.Date(df$Date, format = "%m/%d/%y")
# Order by date:
df <- df[order(df$Date), ]
# Drop observations with missing data:
df <- na.omit(df)
# Convert prices data to numeric:
df$Price <- as.numeric(df$Price)
df$Open <- as.numeric(df$Open)
df$High <- as.numeric(df$High)
df$Low <- as.numeric(df$Low)
# Convert volume to numeric:
df$Vol <- 1000 * as.numeric(gsub("K", "", df$Vol))
# Convert Change to numeric:
df$Change <- as.numeric(gsub("\\%", "", df$Change))
# Convert variable names to lower case:
names(df) <- tolower(names(df))
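To see what the individual conversions do, it can help to run them on a few standalone strings first. The inputs below are hypothetical examples in the same style as the raw file, not values copied from it, and all object names are made up for this sketch. The percentage sign is removed here with `fixed = TRUE`, an alternative to the escaped pattern used above that matches the character literally.

```r
# Hypothetical raw strings in the same style as the file
vol_raw    <- c("33.87K", "80.19K")
change_raw <- c("-2.10%", "0.35%")

# Strip the "K" suffix, convert to numeric, then scale to units
vol <- 1000 * as.numeric(gsub("K", "", vol_raw))
vol                # 33870 80190

# fixed = TRUE treats "%" as a literal character rather than a regex
change <- as.numeric(gsub("%", "", change_raw, fixed = TRUE))
change             # -2.10 0.35

# %m/%d/%y reads month/day/two-digit year; "22" is interpreted as 2022
one_date <- as.Date("12/31/22", format = "%m/%d/%y")
format(one_date)   # "2022-12-31"
```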
Create a variable called hml which is the high variable minus the low variable. What is the mean of this variable?
Also assign your answer to answer11 in your code.
df$hml <- df$high - df$low
answer11 <- mean(df$hml)
answer11
[1] 0.009504231
What is the median of the vol variable?
Also assign your answer to answer12 in your code.
answer12 <- median(df$vol)
answer12
[1] 80185
What was the largest negative daily price change in the data?
Write the percentage change without the % symbol.
Also assign your answer to answer13 in your code.
answer13 <- min(df$change)
answer13
[1] -2.1
On which date was the largest volume traded?
Also assign your answer to answer14 in your code.
answer14 <- df$date[df$vol == max(df$vol)]
answer14
[1] "2022-06-16"
# Alternatively, sort the data by volume descending and get the first date:
df[order(df$vol, decreasing = TRUE), ]$date[1]
[1] "2022-06-16"
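A third option is `which.max()`, which returns the position of the first maximum. The sketch below uses a small hypothetical data frame rather than the real exchange-rate data; `toy` and `busiest` are illustrative names.

```r
# Hypothetical toy data (NOT the real euro-dollar values)
toy <- data.frame(
  date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05")),
  vol  = c(50000, 90000, 70000)
)

# which.max() gives the index of the largest value. If there are ties it
# returns only the first one, whereas vol == max(vol) keeps all of them.
busiest <- toy$date[which.max(toy$vol)]
format(busiest)   # "2022-01-04"
```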
The following 3 questions will involve working with the following mathematical function defined over all real numbers \(x\):
\[f(x) = -x^2 + 2x - 5\]
Plot the function between the \(x\) values \(-3\) and \(+5\). Choose the answer below which best describes the shape of this function:
Note: you do not need to save your answer in your R script for this question.
f <- function(x) {
  y <- -x^2 + 2*x - 5
  return(y)
}
library(ggplot2)
x <- seq(-3, 5, length.out = 2000)
y <- f(x)
df <- data.frame(x, y)
ggplot(df, aes(x, y)) + geom_line()
# We can see that it has an inverted U shape.
Use R to find the value of \(x\) that maximizes this function.
Also assign your answer to answer16 in your code.
f_max <- optimize(f, c(-100, 100), maximum = TRUE)
answer16 <- f_max$maximum
answer16
[1] 1
What value does the function take at its maximum?
Also assign your answer to answer17 in your code.
answer17 <- f_max$objective
answer17
[1] -4
# or alternatively:
f(f_max$maximum)
[1] -4
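Because the function is a quadratic, the numerical result can be checked against the closed-form vertex: for \(ax^2 + bx + c\) with \(a < 0\), the maximum is at \(x = -b/(2a)\). The sketch below makes that comparison. The names `coef_a`, `coef_b` and `x_star` are introduced only for this example, and the tolerance is an assumption chosen to be looser than the default precision of `optimize()`.

```r
f <- function(x) -x^2 + 2 * x - 5

# Closed-form vertex of a*x^2 + b*x + c, here with a = -1 and b = 2
coef_a <- -1
coef_b <- 2
x_star <- -coef_b / (2 * coef_a)
x_star      # 1
f(x_star)   # -4

# optimize() is a numerical search, so its answer is only approximate;
# compare with a tolerance instead of testing for exact equality
f_max <- optimize(f, c(-100, 100), maximum = TRUE)
abs(f_max$maximum - x_star) < 1e-3
abs(f_max$objective - f(x_star)) < 1e-3
```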
Download the two datasets:
gdp-per-capita-growths.csv, with the variables country, year and gdp_pc_growth. The variable gdp_pc_growth is the growth rate of the country’s per capita gross domestic product (GDP) in that year.
lending-rates.csv, with the variables country, year and lending_rate. The variable lending_rate is the lending interest rate in that country in that year.
Assign the first dataset gdp-per-capita-growths.csv to df1 when reading it into R.
Assign the second dataset lending-rates.csv to df2 when reading it into R.
Using the dataset gdp-per-capita-growths.csv, calculate the average per capita GDP growth rate across countries by year.
In what year in the data was the average GDP per capita growth rate the smallest?
Also assign your answer to answer18 in your code.
df1 <- read.csv("gdp-per-capita-growths.csv")
df2 <- read.csv("lending-rates.csv")

tmp <- aggregate(gdp_pc_growth ~ year, data = df1, FUN = mean)
tmp <- tmp[order(tmp$gdp_pc_growth), ]
answer18 <- tmp$year[1]
answer18
[1] 2009
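The aggregate-then-find-the-minimum pattern is easier to follow on a handful of rows. The sketch below uses hypothetical numbers, not the real GDP data, and picks the year with `which.min()` instead of sorting; `toy`, `by_year` and `worst_year` are names made up for this illustration.

```r
# Hypothetical toy data (NOT the real GDP growth figures)
toy <- data.frame(
  country       = c("A", "B", "A", "B"),
  year          = c(2008, 2008, 2009, 2009),
  gdp_pc_growth = c(2, 1, -3, -1)
)

# One row per year, holding the mean growth rate across countries
by_year <- aggregate(gdp_pc_growth ~ year, data = toy, FUN = mean)
by_year   # 2008: 1.5, 2009: -2

# which.min() gives the row of the smallest mean; take that row's year
worst_year <- by_year$year[which.min(by_year$gdp_pc_growth)]
worst_year   # 2009
```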
Merge the datasets gdp-per-capita-growths.csv and lending-rates.csv together by the variables "country" and "year", dropping observations without a match. Your merged dataset should have 2,572 observations and 4 variables.
Report the mean growth rate in GDP per capita in the merged dataset.
Also assign your answer to answer19 in your code.
# Merge by country and year:
df <- merge(df1, df2, by = c("country", "year"))
# Check that dimensions match the question description:
dim(df)
[1] 2572 4
answer19 <- mean(df$gdp_pc_growth)
answer19
[1] 2.456217
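By default `merge()` keeps only rows whose key values appear in both data frames, which is why unmatched observations are dropped. The sketch below contrasts this with `all = TRUE` on two small hypothetical data frames; none of these values come from the real files, and the object names are illustrative.

```r
# Hypothetical toy data (NOT the real datasets)
left <- data.frame(
  country       = c("A", "A", "B"),
  year          = c(2000, 2001, 2000),
  gdp_pc_growth = c(1.0, 2.0, 3.0)
)
right <- data.frame(
  country      = c("A", "B", "C"),
  year         = c(2000, 2000, 2000),
  lending_rate = c(5, 6, 7)
)

# Default: keep only country-year pairs present in both data frames
inner <- merge(left, right, by = c("country", "year"))
nrow(inner)   # 2: (A, 2000) and (B, 2000)

# all = TRUE keeps unmatched rows too, filling the gaps with NA
outer <- merge(left, right, by = c("country", "year"), all = TRUE)
nrow(outer)   # 4: adds (A, 2001) and (C, 2000)
```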
Create a subset of the merged data from Question 19 which only includes data on the Netherlands. This is when the country variable equals "Netherlands".
Reshape the Netherlands data to long format and use this long-format data to create a line plot of per capita GDP growth and the lending rate over time in the Netherlands. Choose the answer below which best describes what the plot shows in 2009:
Hint: Your long-format data for the Netherlands should have 4 variables: "country", "year", "variable" and "value", where:
"country" is "Netherlands" everywhere.
"year" takes on the values 2000-2013 repeated twice.
"variable" contains the two variable names ("gdp_pc_growth" and "lending_rate") repeated for each year.
"value" contains the values of those variables in those years.
The first 3 rows of your reshaped data should look like:
country year variable value
Netherlands 2000 gdp_pc_growth 3.4535383
Netherlands 2001 gdp_pc_growth 1.5574581
Netherlands 2002 gdp_pc_growth -0.4203677
Note: You do not need to save your answer in your R script for this question.
library(reshape2)
df_nl <- df[df$country == "Netherlands", ]
df_nl_long <- melt(df_nl, id.vars = c("country", "year"))
library(ggplot2)
ggplot(df_nl_long, aes(year, value, color = variable)) +
geom_line()
# Answer: GDP per capita growth fell sharply, and the lending rate also decreased.
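To make the wide-to-long step less of a black box, the same layout can be built by hand in base R. This is only a sketch of what the reshaping produces, not a replacement for the solution above. The data are hypothetical and use a placeholder country, and `wide`, `measure_vars` and `long` are names introduced for this example.

```r
# Hypothetical wide-format data (placeholder country, made-up values)
wide <- data.frame(
  country       = "X",
  year          = 2000:2002,
  gdp_pc_growth = c(3.0, 1.5, -0.5),
  lending_rate  = c(5.0, 4.5, 4.0)
)

measure_vars <- c("gdp_pc_growth", "lending_rate")

# Stack the measure columns on top of each other, repeating the id columns
# once per measure and recording which column each value came from
long <- data.frame(
  country  = rep(wide$country, times = length(measure_vars)),
  year     = rep(wide$year, times = length(measure_vars)),
  variable = rep(measure_vars, each = nrow(wide)),
  value    = unlist(wide[measure_vars], use.names = FALSE)
)

dim(long)   # 6 4: two rows per year, four variables
```

Each year now appears once per measure, which is the shape `ggplot()` needs in order to map `variable` to the line color.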