Programming and Quantitative Skills Sample Exam

Short-Answer Questions (5 points)

Question 1 (1 point)

Calculate the following using R:

\[ \frac{2^4 + 10}{8\times \sqrt{4}} \]

(2^4 + 10) / (8 * sqrt(4))
[1] 1.625
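
As a sanity check on the parentheses, the numerator and denominator can be computed separately. This is only an illustrative sketch; the names numerator and denominator are not part of the exam.

```r
# Evaluate the two parts of the fraction separately
numerator <- 2^4 + 10        # 16 + 10 = 26
denominator <- 8 * sqrt(4)   # 8 * 2 = 16
numerator / denominator      # 1.625

# Without parentheses, operator precedence changes the result:
# 2^4 + (10 / 8) * sqrt(4) = 16 + 2.5
2^4 + 10 / 8 * sqrt(4)       # 18.5
```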

Question 2 (1 point)

Calculate \(\log_2\left(64\right)\) using R.

log(64, base = 2)
[1] 6
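
Base R also provides log2() as a shortcut, and raising 2 to the result should recover 64. A minimal cross-check (the name res is only for illustration):

```r
# log2(x) is equivalent to log(x, base = 2)
res <- log2(64)
res

# Undo the logarithm: 2 raised to the result gives back 64
2^res
```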

Question 3 (1 point)

If we create the following vector in R, what class will it be?

c(1, 2, "c", "d")
  • Numeric.
  • Character.
  • Both numeric and character.
  • Neither numeric nor character.

Note: You do not need to supply your code for this question.

x <- c(1, 2, "c", "d")
x
[1] "1" "2" "c" "d"
class(x)
[1] "character"
# All elements of an atomic vector in R must have the same type. Here the
# numbers are coerced to character so that every element is a string.
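
The coercion above follows a fixed order in R: logical values are converted to numeric, and numeric values to character, whenever types are mixed. The short sketch below illustrates this and shows what happens when converting back; the name x_num is only for illustration.

```r
# Mixing logical and numeric: TRUE is coerced to 1
class(c(1, TRUE))     # "numeric"

# Mixing numeric and character: the numbers become strings
class(c(1, "a"))      # "character"

# Converting back to numeric: entries that are not numbers become NA.
# as.numeric() issues a warning here, which is suppressed for clarity.
x <- c(1, 2, "c", "d")
x_num <- suppressWarnings(as.numeric(x))
x_num
```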

Question 4 (1 point)

Write an R command to generate a numeric vector containing the following sequence:

\[ (10, 10, 20, 20, 30, 30, 40, 40, \dots, 470, 470, 480, 480, 490, 490, 500, 500) \]

rep(seq(from = 10, to = 500, by = 10), each = 2)
  [1]  10  10  20  20  30  30  40  40  50  50  60  60  70  70  80  80  90  90
 [19] 100 100 110 110 120 120 130 130 140 140 150 150 160 160 170 170 180 180
 [37] 190 190 200 200 210 210 220 220 230 230 240 240 250 250 260 260 270 270
 [55] 280 280 290 290 300 300 310 310 320 320 330 330 340 340 350 350 360 360
 [73] 370 370 380 380 390 390 400 400 410 410 420 420 430 430 440 440 450 450
 [91] 460 460 470 470 480 480 490 490 500 500
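
One way to check the result without printing all of it is to inspect its length and its two ends. The sketch below also shows an equivalent construction; the names v and v2 are only for illustration.

```r
v <- rep(seq(from = 10, to = 500, by = 10), each = 2)

# 50 distinct values, each repeated twice
length(v)
head(v, 4)
tail(v, 4)

# Equivalent construction: build 1, 1, 2, 2, ..., 50, 50 and multiply by 10
v2 <- 10 * rep(1:50, each = 2)
all(v == v2)
```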

Question 5 (1 point)

The logical vectors a and b have equal length. Which of the following options is always the same as !(a | b), regardless of the contents of a and b?

  • !a & !b
  • !a | !b
  • a | b

Note: You do not need to supply your code for this question.

# Make two vectors covering every possibility:
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# The target output is:
!(a | b)
[1] FALSE FALSE FALSE  TRUE
# Try the different options:

# The first one matches:
!a & !b
[1] FALSE FALSE FALSE  TRUE
# The remaining ones don't match:
!a | !b
[1] FALSE  TRUE  TRUE  TRUE
a | b
[1]  TRUE  TRUE  TRUE FALSE
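
The matching option is an instance of De Morgan's laws. Instead of comparing the printed output by eye, the comparison can be done with identical(), as in this sketch using the same two vectors:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# Compare each option with the target !(a | b)
identical(!(a | b), !a & !b)   # TRUE
identical(!(a | b), !a | !b)   # FALSE
identical(!(a | b), a | b)     # FALSE

# The companion law: negating an AND gives an OR of the negations
identical(!(a & b), !a | !b)   # TRUE
```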

Data Analysis (5 points)

Download the dataset ceosal.csv. The dataset contains information on chief executive officers (CEOs) at different companies. The variable descriptions are:

  • salary: CEO compensation in 1990 (in dollars).
  • age: CEO age.
  • college: \(=1\) if the CEO attended college and \(=0\) otherwise.
  • grad: \(=1\) if the CEO attended post-graduate education and \(=0\) otherwise.
  • comten: Years the CEO worked with the company.
  • ceoten: Years as CEO with the company.
  • profits: Profits of the company in 1990 (in dollars).

When reading the dataset into R, assign it to df.

Question 6 (1 point)

How many observations are in the dataset?

Also assign your answer to answer6 in your code.

# First we need to change directory to the folder containing the ceosal.csv
# file. Ideally you would do this using an R project.

df <- read.csv("ceosal.csv")
answer6 <- nrow(df)
answer6
[1] 177

Question 7 (1 point)

What is the median of the variable comten?

Also assign your answer to answer7 in your code.

answer7 <- median(df$comten)
answer7
[1] 23

Question 8 (1 point)

What is the mean salary of the CEOs who attended post-graduate education?

Also assign your answer to answer8 in your code.

answer8 <- mean(df$salary[df$grad == 1])
answer8
[1] 864212.8

Question 9 (1 point)

How many people in the dataset attended college but didn’t attend post-graduate education?

Also assign your answer to answer9 in your code.

answer9 <- sum(df$college == 1 & df$grad == 0)
answer9
[1] 78

Question 10 (1 point)

In the dataset, what is the longest time (in years) that someone worked at a company before becoming CEO?

Also assign your answer to answer10 in your code.

answer10 <- max(df$comten - df$ceoten)
answer10
[1] 39
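
The patterns used in Questions 8 to 10 (logical subsetting, counting with sum(), and element-wise arithmetic) can be tried on a small made-up data frame. The data frame toy and all of its values are hypothetical and are not taken from ceosal.csv.

```r
# Hypothetical data with the same variable names as ceosal.csv
toy <- data.frame(
  salary  = c(500000, 800000, 650000, 900000),
  college = c(1, 1, 0, 1),
  grad    = c(0, 1, 0, 1),
  comten  = c(10, 25, 8, 30),
  ceoten  = c(2, 5, 8, 12)
)

# Logical subsetting: keep only the salaries where grad equals 1
mean(toy$salary[toy$grad == 1])          # 850000

# TRUE counts as 1, so sum() counts the rows meeting both conditions
sum(toy$college == 1 & toy$grad == 0)    # 1

# Element-wise difference gives years at the company before becoming CEO
max(toy$comten - toy$ceoten)             # 20
```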

Data Cleaning (4 points)

Download the dataset euro-dollar-2022.csv. The dataset contains the closing Euro-Dollar exchange rate (variable “Price”) each day throughout 2022, together with the opening, highest and lowest exchange rate. Furthermore, it includes the volume traded and the percentage daily change in the closing price.

In this question you will need to clean this dataset to answer the questions that follow.

When reading the dataset into R, assign it to df.

You should do the following cleaning tasks to your dataframe df:

  1. Format the Date variable to a date.
  2. Sort the data by Date ascending (the earliest date in the data should be first, the most recent date last).
  3. Drop rows with any missing data.
  4. Convert Price, Open, High and Low to numeric.
  5. Convert Vol to numeric. For example, "33.87K" should be 33870. Hint: First use the gsub() function to remove the K. Then convert the variable to numeric format. Finally multiply it by 1,000.
  6. Convert Change to numeric. Tip: Use gsub("\\%", "", x) to remove a percentage symbol from x.
  7. Convert all variable names to lower case.

If you did all the steps correctly, you should have 260 observations. The average of the high variable should be 0.9561. The average of the vol variable should be 79,920. If only some of these match your cleaned dataset, you will still be able to answer some of the questions correctly.

# First perform all the cleaning tasks:

df <- read.csv("euro-dollar-2022.csv")

# Format the date:
head(df$Date)
[1] "12/31/22" "12/30/22" "12/29/22" "12/28/22" "12/27/22" "12/26/22"
# Format is MM/DD/YY (year without century):
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# Order by date:
df <- df[order(df$Date), ]

# Drop observations with missing data:
df <- na.omit(df)

# Convert prices data to numeric:
df$Price <- as.numeric(df$Price)
df$Open <- as.numeric(df$Open)
df$High <- as.numeric(df$High)
df$Low <- as.numeric(df$Low)

# Convert volume to numeric:
df$Vol <- 1000 * as.numeric(gsub("K", "", df$Vol))

# Convert Change to numeric:
df$Change <- as.numeric(gsub("\\%", "", df$Change))

# Convert variable names to lower case:
names(df) <- tolower(names(df))
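
To see what each cleaning step does, the same steps can be run on a few made-up rows. The data frame raw and its values are hypothetical and only mimic the layout described above. The percentage sign is removed here with fixed = TRUE, which treats the pattern as a literal string rather than a regular expression.

```r
# Hypothetical raw rows: dates as text, one missing price, text volumes
raw <- data.frame(
  Date   = c("01/04/22", "01/03/22", "01/05/22"),
  Price  = c("0.8860", "0.8850", NA),
  Vol    = c("33.87K", "41.20K", "29.10K"),
  Change = c("0.11%", "-0.35%", "0.20%")
)

raw$Date <- as.Date(raw$Date, format = "%m/%d/%y")   # parse MM/DD/YY
raw <- raw[order(raw$Date), ]                        # earliest date first
raw <- na.omit(raw)                                  # drops the row with NA
raw$Price <- as.numeric(raw$Price)
raw$Vol <- 1000 * as.numeric(gsub("K", "", raw$Vol)) # "33.87K" -> 33870
raw$Change <- as.numeric(gsub("%", "", raw$Change, fixed = TRUE))
names(raw) <- tolower(names(raw))

# Inspect the resulting column types
str(raw)
```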

Question 11 (1 point)

Create a variable called hml which is the high variable minus the low variable. What is the mean of this variable?

Also assign your answer to answer11 in your code.

df$hml <- df$high - df$low
answer11 <- mean(df$hml)
answer11
[1] 0.009504231

Question 12 (1 point)

What is the median of the vol variable?

Also assign your answer to answer12 in your code.

answer12 <- median(df$vol)
answer12
[1] 80185

Question 13 (1 point)

What was the largest negative daily price change in the data?

Write the percentage change without the % symbol.

Also assign your answer to answer13 in your code.

answer13 <- min(df$change)
answer13
[1] -2.1

Question 14 (1 point)

On which date was the largest volume traded?

Also assign your answer to answer14 in your code.

answer14 <- df$date[df$vol == max(df$vol)]
answer14
[1] "2022-06-16"
# Alternatively, sort the data by volume descending and get the first date:
df[order(df$vol, decreasing = TRUE), ]$date[1]
[1] "2022-06-16"
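
Another option is which.max(), which returns the position of the largest value. A sketch on made-up data (the data frame toy and its values are hypothetical):

```r
# Hypothetical dates and volumes
toy <- data.frame(
  date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05")),
  vol  = c(41200, 33870, 52000)
)

# which.max() gives the index of the first maximum; use it to pick the date
toy$date[which.max(toy$vol)]
```

Note that which.max() returns only the first position if the maximum occurs more than once, whereas the comparison with max() returns every matching date.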

Optimization (3 points)

The next 3 questions involve the following mathematical function, defined over all real numbers \(x\):

\[f(x) = -x^2 + 2x - 5\]

Question 15 (1 point)

Plot the function between the \(x\) values \(-3\) and \(+5\). Choose the answer below which best describes the shape of this function:

  • Straight line
  • Flat
  • U shape
  • Inverted U shape (upside-down U)

Note: You do not need to save your answer in your R script for this question.

f <- function(x) {
  y <- -x^2 + 2*x - 5
  return(y)
}
library(ggplot2)
x <- seq(-3, 5, length.out = 2000)
y <- f(x)
df <- data.frame(x, y)
ggplot(df, aes(x, y)) + geom_line()

# We can see that it has an inverted U shape.
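
The shape can also be checked numerically without a plot. In the sketch below, the function is evaluated on a coarse, evenly spaced grid (the grid spacing of 2 is an arbitrary choice for illustration): the values rise and then fall, and the second differences are all negative, which is what an inverted U looks like.

```r
f <- function(x) -x^2 + 2 * x - 5

# Evaluate f on a coarse grid between -3 and 5
x <- seq(-3, 5, by = 2)
data.frame(x, y = f(x))

# Second differences on an evenly spaced grid: all negative means the
# curve bends downwards everywhere on the grid
diff(f(x), differences = 2)
```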

Question 16 (1 point)

Use R to find the value of \(x\) that maximizes this function.

Also assign your answer to answer16 in your code.

f_max <- optimize(f, c(-100, 100), maximum = TRUE)
answer16 <- f_max$maximum
answer16
[1] 1

Question 17 (1 point)

What value does the function take at its maximum?

Also assign your answer to answer17 in your code.

answer17 <- f_max$objective
answer17
[1] -4
# or alternatively:
f(f_max$maximum)
[1] -4
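
The numerical answers to Questions 16 and 17 can be confirmed by hand. Setting the first derivative to zero gives the candidate point, the second derivative confirms that it is a maximum, and substituting back gives the value of the function there:

\[ f'(x) = -2x + 2 = 0 \quad \Longrightarrow \quad x = 1 \]

\[ f''(x) = -2 < 0 \]

\[ f(1) = -(1)^2 + 2(1) - 5 = -4 \]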

Aggregating, Merging and Reshaping (3 points)

Download the two datasets:

  1. gdp-per-capita-growths.csv - this dataset contains 3 variables: country, year and gdp_pc_growth. The variable gdp_pc_growth is the growth rate of the country’s per capita gross domestic product (GDP) in that year.
  2. lending-rates.csv - this dataset contains 3 variables: country, year and lending_rate. The variable lending_rate is the lending interest rate in that country in that year.

Assign the first dataset gdp-per-capita-growths.csv to df1 when reading it into R.

Assign the second dataset lending-rates.csv to df2 when reading it into R.

Question 18 (1 point)

Using the dataset gdp-per-capita-growths.csv, calculate the average per capita GDP growth rate across countries by year.

In what year in the data was the average GDP per capita growth rate the smallest?

Also assign your answer to answer18 in your code.

df1 <- read.csv("gdp-per-capita-growths.csv")
df2 <- read.csv("lending-rates.csv")

tmp <- aggregate(gdp_pc_growth ~ year, data = df1, FUN = mean)
tmp <- tmp[order(tmp$gdp_pc_growth), ]
answer18 <- tmp$year[1]
answer18
[1] 2009
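
The aggregation step can be tried on a small made-up dataset. The data frame toy and its values are hypothetical; which.min() is used here as an alternative to sorting.

```r
# Hypothetical growth rates for two countries in two years
toy <- data.frame(
  country       = c("A", "B", "A", "B"),
  year          = c(2008, 2008, 2009, 2009),
  gdp_pc_growth = c(2.0, 1.0, -3.0, -1.0)
)

# One row per year, with the mean growth rate across countries
by_year <- aggregate(gdp_pc_growth ~ year, data = toy, FUN = mean)
by_year

# Year with the smallest average growth rate
by_year$year[which.min(by_year$gdp_pc_growth)]
```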

Question 19 (1 point)

Merge the datasets gdp-per-capita-growths.csv and lending-rates.csv together by the variables "country" and "year", dropping observations without a match. Your merged dataset should have 2,572 observations and 4 variables.

Report the mean growth rate in GDP per capita in the merged dataset.

Also assign your answer to answer19 in your code.

# Merge by country and year:
df <- merge(df1, df2, by = c("country", "year"))

# Check that dimensions match the question description:
dim(df)
[1] 2572    4
answer19 <- mean(df$gdp_pc_growth)
answer19
[1] 2.456217
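
By default merge() keeps only the rows that have a match in both datasets, which is why no extra argument is needed to drop unmatched observations. The sketch below shows the difference on made-up data; the data frames g and l and their values are hypothetical.

```r
# Hypothetical datasets that overlap only partly
g <- data.frame(
  country       = c("A", "A", "B"),
  year          = c(2000, 2001, 2000),
  gdp_pc_growth = c(1.5, 2.0, 0.5)
)
l <- data.frame(
  country      = c("A", "B", "B"),
  year         = c(2000, 2000, 2001),
  lending_rate = c(5.0, 7.0, 7.5)
)

# Default: keep only country-year pairs present in both datasets
inner <- merge(g, l, by = c("country", "year"))
nrow(inner)   # 2

# all = TRUE keeps unmatched rows too, filling the gaps with NA
outer <- merge(g, l, by = c("country", "year"), all = TRUE)
nrow(outer)   # 4
```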

Question 20 (1 point)

Create a subset of the merged data from Question 19 which only includes data on the Netherlands. This is when the country variable equals "Netherlands".

Reshape the Netherlands data to long format and use this long-format data to create a line plot of per capita GDP growth and the lending rate over time in the Netherlands. Choose the answer below which best describes what the plot shows in 2009:

  • GDP per capita growth fell sharply, but the lending rate increased.
  • GDP per capita growth rose sharply, but the lending rate decreased.
  • GDP per capita growth fell sharply, and the lending rate also decreased.
  • GDP per capita growth rose sharply, and the lending rate also increased.

Hint: Your long-format data for the Netherlands should have 4 variables: "country", "year", "variable" and "value", where:

  • "country" is "Netherlands" everywhere.
  • "year" takes on the values 2000-2013 repeated twice.
  • "variable" contains the two variable names ("gdp_pc_growth" and "lending_rate") repeated for each year.
  • "value" contains the values of those variables in those years.

The first 3 rows of your reshaped data should look like:

    country year      variable      value
Netherlands 2000 gdp_pc_growth  3.4535383
Netherlands 2001 gdp_pc_growth  1.5574581
Netherlands 2002 gdp_pc_growth -0.4203677

Note: You do not need to save your answer in your R script for this question.

library(reshape2)

df_nl <- df[df$country == "Netherlands", ]
df_nl_long <- melt(df_nl, id.vars = c("country", "year"))

library(ggplot2)
ggplot(df_nl_long, aes(year, value, color = variable)) +
  geom_line()

# Answer: GDP per capita growth fell sharply, and the lending rate also decreased.
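
If the reshape2 package is not available, the same long format can be built by hand in base R by stacking the two value columns. This is only a sketch of an alternative, not the method used above, and the values in wide are made up for illustration.

```r
# Hypothetical wide-format data: one row per year, one column per variable
wide <- data.frame(
  country       = "Netherlands",
  year          = 2000:2002,
  gdp_pc_growth = c(3.0, 1.5, -0.5),
  lending_rate  = c(4.5, 5.0, 4.0)
)

# Stack the two value columns and record which variable each value came from
long <- data.frame(
  country  = rep(wide$country, times = 2),
  year     = rep(wide$year, times = 2),
  variable = rep(c("gdp_pc_growth", "lending_rate"), each = nrow(wide)),
  value    = c(wide$gdp_pc_growth, wide$lending_rate)
)
long
```

The result has one row per year and variable, with the same four columns described in the hint.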