Programming for E&BI Sample Exam

Short-Answer Questions (5 points)

Question 1 (1 point)

Write an R command that calculates the following:

\frac{2^4 + 10}{8\times \sqrt{4}}

Provide both the numerical answer and the R command.

Question 2 (1 point)

Write an R command that calculates \log_2\left(64\right)

Provide both the numerical answer and the R command.
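
As a reminder of the syntax involved in Questions 1 and 2, the short sketch below evaluates a different expression and a different logarithm. The numbers are illustrative only and are not the exam expressions.

```r
# Illustrative values only; these are not the expressions asked for above.
# Parentheses control the order of operations; sqrt() takes the square root.
x <- (3^2 + 1) / (2 * sqrt(25))   # (9 + 1) / (2 * 5)
print(x)

# log2() is the base-2 logarithm; log(x, base = 2) is an equivalent spelling.
y <- log2(8)
z <- log(8, base = 2)
print(y)
```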

Question 3 (1 point)

If we create the following vector in R, what class will it be?

c(1, 2, "c", "d")
  • Numeric.
  • Character.
  • Both numeric and character.
  • Neither numeric nor character.

Note: You do not need to supply your code for this question.

Question 4 (1 point)

Write an R command that generates a numeric vector containing the following sequence:

(10, 10, 20, 20, 30, 30, 40, 40, \dots, 470, 470, 480, 480, 490, 490, 500, 500)
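
One possible approach combines seq() with the each argument of rep(). The sketch below builds a much shorter, made-up sequence to show the pattern; the start, end and step values are assumptions chosen for illustration.

```r
# Illustrative shorter sequence: (2, 2, 4, 4, 6, 6).
# seq() generates the distinct values; rep(..., each = 2) repeats each one twice.
v <- rep(seq(from = 2, to = 6, by = 2), each = 2)
print(v)
```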

Question 5 (1 point)

The logical vectors a and b have equal length. Which of the following options is always the same as !(a | b), regardless of the contents of a and b?

  • !a & !b
  • !a | !b
  • a | b

Note: You do not need to supply your code for this question.
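
Although no code is required, an identity between logical expressions can be checked by trying every combination of TRUE and FALSE. The sketch below does this for the three options; the helper same() is a hypothetical name introduced here for illustration.

```r
# All four combinations of TRUE/FALSE for a pair of logical values.
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# Hypothetical helper: TRUE only if two logical vectors agree element by element.
same <- function(x, y) all(x == y)

target <- !(a | b)

# Compare each candidate against the target expression.
same(target, !a & !b)
same(target, !a | !b)
same(target, a | b)
```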

Data Analysis (5 points)

Download the dataset ceosal.csv. The dataset contains information on chief executive officers (CEOs) at different companies. The variable descriptions are:

  • salary: CEO compensation in 1990 (in dollars).
  • age: CEO age.
  • college: =1 if the CEO attended college and =0 otherwise.
  • grad: =1 if the CEO attended post-graduate education and =0 otherwise.
  • comten: Years the CEO worked with the company.
  • ceoten: Years as CEO with the company.
  • profits: Profits of the company in 1990 (in dollars).

When reading the dataset into R, assign it to df.

Question 6 (1 point)

How many observations are in the dataset?

Provide both the numerical answer and the R command required to obtain the answer (if the dataframe is assigned to df).

Question 7 (1 point)

What is the median of the variable comten?

Provide both the numerical answer and the R command required to obtain the answer (if the dataframe is assigned to df).

Question 8 (1 point)

What is the mean salary of the CEOs who attended post-graduate education?

Provide both the numerical answer and the R command required to obtain the answer (if the dataframe is assigned to df).

Question 9 (1 point)

How many people in the dataset attended college but didn’t attend post-graduate education?

Provide both the numerical answer and the R command required to obtain the answer (if the dataframe is assigned to df).

Question 10 (1 point)

In the dataset, what is the longest time (in years) that someone worked at a company before they became CEO?

Provide both the numerical answer and the R command required to obtain the answer (if the dataframe is assigned to df).
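
The kinds of commands used in Questions 6 to 10 can be sketched on a small made-up data frame. The values below are hypothetical and are not taken from ceosal.csv, and the data frame is named toy rather than df so that it is not confused with the real data. The last line assumes that time at the company before becoming CEO can be read as comten minus ceoten.

```r
# Hypothetical toy data; these values are made up and are NOT from ceosal.csv.
toy <- data.frame(
  salary  = c(500, 800, 650, 900),
  college = c(1, 1, 0, 1),
  grad    = c(0, 1, 0, 1),
  comten  = c(10, 25, 5, 30),
  ceoten  = c(2, 10, 5, 8)
)

n_obs      <- nrow(toy)                              # number of rows (observations)
med_comten <- median(toy$comten)                     # median of one column
mean_grad  <- mean(toy$salary[toy$grad == 1])        # mean within a subgroup
n_col_only <- sum(toy$college == 1 & toy$grad == 0)  # count rows meeting two conditions
max_before <- max(toy$comten - toy$ceoten)           # largest gap between two columns (assumed reading)
```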

Data Cleaning (4 points)

Download the dataset euro-dollar-2022.csv. The dataset contains the closing Euro-Dollar exchange rate (variable Price) each day throughout 2022, together with the opening, highest and lowest exchange rate. Furthermore, it includes the volume traded and the percentage daily change in the closing price.

Download the following template script and use it to clean the data and answer the questions that follow.

When reading the dataset into R, assign it to df.

Perform the following cleaning tasks on your dataframe df:

  1. Format the Date variable to a date.
  2. Sort the data by Date ascending (the earliest date in the data should be first, the most recent date last).
  3. Drop rows with any missing data.
  4. Convert Price, Open, High and Low to numeric.
  5. Convert Vol to numeric. For example, "33.87K" should be 33870. Hint: First use the gsub() function to remove the K. Then convert the variable to numeric format. Finally multiply it by 1,000.
  6. Convert Change to numeric. Tip: Use gsub("\\%", "", x) to remove a percentage symbol from x.
  7. Convert all variable names to lower case.

If you did all the steps correctly, you should have 260 observations. The average of the high variable should be 0.9561. The average of the vol variable should be 79920. If only some of these match your cleaned dataset, you will still be able to answer some of the questions correctly.
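
The seven steps can be sketched on a few made-up rows. Everything in the toy data below is an assumption: the values, the date format ("%m/%d/%Y") and the way missing data is coded may all differ in euro-dollar-2022.csv, so check the actual file before reusing any of it. Only a subset of the columns is shown.

```r
# Hypothetical raw rows; values and the date format are assumptions,
# not taken from euro-dollar-2022.csv.
raw <- data.frame(
  Date   = c("01/04/2022", "01/03/2022", "01/05/2022"),
  Price  = c("1.1285", "1.1297", NA),
  Vol    = c("33.87K", "40.10K", "25.00K"),
  Change = c("-0.11%", "0.35%", "0.20%"),
  stringsAsFactors = FALSE
)

toy <- raw
toy$Date   <- as.Date(toy$Date, format = "%m/%d/%Y")     # 1. parse dates (assumed format)
toy        <- toy[order(toy$Date), ]                     # 2. sort by date, ascending
toy        <- na.omit(toy)                               # 3. drop rows with any NA
toy$Price  <- as.numeric(toy$Price)                      # 4. text to numeric
toy$Vol    <- as.numeric(gsub("K", "", toy$Vol)) * 1000  # 5. "33.87K" becomes 33870
toy$Change <- as.numeric(gsub("\\%", "", toy$Change))    # 6. strip the "%" symbol
names(toy) <- tolower(names(toy))                        # 7. lower-case variable names
```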

Question 11 (1 point)

Create a variable called hml which is the high variable minus the low variable.

Write an R command that computes the mean of hml.

Assign the output of this command to the variable a11 in your script and write its numerical value in the box below.

Question 12 (1 point)

Write an R command that computes the median of the vol variable.

Assign the output of this command to the variable a12 in your script and write its numerical value in the box below.

Question 13 (1 point)

Write an R command that computes the largest negative daily price change in the data (that is, the most negative value of the change variable).

Assign the output of this command to the variable a13 in your script and write its numerical value (without the % symbol) in the box below.

Question 14 (1 point)

Write an R command that finds the date on which the largest volume was traded.

Assign the output of this command to the variable a14 in your script and write the day, month and year in the boxes below.

Optimization (3 points)

The next 3 questions involve the following mathematical function, defined over all real numbers x:

f(x) = -x^2 + 2x - 5

Question 15 (1 point)

Plot the function between the x values -3 and +5. Choose the answer below which best describes the shape of this function:

  • Straight line
  • Flat
  • U shape
  • Inverted U shape (upside-down U)

Note: You do not need to assign your answer to this question to any variable in your R script.

Question 16 (1 point)

Write an R command that finds the value of x that maximizes this function.

Write the numerical value of the output of this command in the box below.

Question 17 (1 point)

Write an R command that finds the maximum value of the function f(x).

Write the numerical value of the output of this command in the box below.
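
One way to approach Questions 15 to 17 is with curve() for plotting and optimize() for a one-dimensional search. The sketch below applies them to a different, hypothetical function g, not to the exam's f, and the search interval is an assumption chosen for illustration.

```r
# Hypothetical example function (not the exam's f): g(x) = -(x - 2)^2 + 3.
g <- function(x) -(x - 2)^2 + 3

# Plot over an interval to inspect the shape.
curve(g, from = -3, to = 5)

# optimize() searches within an interval; maximum = TRUE asks for a maximum
# instead of the default minimum. The result is a numerical approximation.
res <- optimize(g, interval = c(-3, 5), maximum = TRUE)
res$maximum     # x at which g is largest
res$objective   # value of g at that x
```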

Aggregating, Merging and Reshaping (3 points)

Download the two datasets:

  1. gdp-per-capita-growths.csv - this dataset contains 3 variables: country, year and gdp_pc_growth. The variable gdp_pc_growth is the growth rate of the country’s per capita gross domestic product (GDP) in that year.
  2. lending-rates.csv - this dataset contains 3 variables: country, year and lending_rate. The variable lending_rate is the lending interest rate in that country in that year.

Assign the dataset gdp-per-capita-growths.csv to df1 and lending-rates.csv to df2 in your script.

Question 18 (1 point)

Using the dataset gdp-per-capita-growths.csv, calculate the average per capita GDP growth rate across countries by year.

Write an R command that finds the year in the data when the average GDP per capita growth rate was the smallest.

Write the numerical value of your answer in the box below.

Question 19 (1 point)

Use R to merge the datasets gdp-per-capita-growths.csv and lending-rates.csv together by the variables "country" and "year", dropping observations without a match. Your merged dataset should have 2,572 observations and 4 variables.

Write an R command that computes the mean growth rate in GDP per capita in the merged dataset.

Write the numerical value of your answer in the box below.

Question 20 (1 point)

Create a subset of the merged data from Question 19 which only includes data on the Netherlands. This is when the country variable equals "Netherlands".

Reshape the Netherlands data to long format and use this long-format data to create a line plot of per capita GDP growth and the lending rate over time in the Netherlands. Choose the answer below which best describes what the plot shows in 2009:

  • GDP per capita growth fell sharply, but the lending rate increased.
  • GDP per capita growth rose sharply, but the lending rate decreased.
  • GDP per capita growth fell sharply, and the lending rate also decreased.
  • GDP per capita growth rose sharply, and the lending rate also increased.

Hint: Your long-format data for the Netherlands should have 4 variables: "country", "year", "variable" and "value", where:

  • "country" is "Netherlands" everywhere.
  • "year" takes on the values 2000-2013 repeated twice.
  • "variable" contains the two variable names ("gdp_pc_growth" and "lending_rate") repeated for each year.
  • "value" contains the values of those variables in those years.

The first 3 rows of your reshaped data should look like:

    country year      variable      value
Netherlands 2000 gdp_pc_growth  3.4535383
Netherlands 2001 gdp_pc_growth  1.5574581
Netherlands 2002 gdp_pc_growth -0.4203677
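
The aggregate, merge and reshape steps in Questions 18 to 20 can be sketched in base R on made-up data. The data frames growth and rates, the countries "A" and "B" and all values below are hypothetical. The long format is built here by stacking the two value columns by hand, which is only one of several possible ways to reshape.

```r
# Hypothetical toy data; values are made up, not taken from the exam files.
growth <- data.frame(
  country       = c("A", "A", "B", "B"),
  year          = c(2000, 2001, 2000, 2001),
  gdp_pc_growth = c(2.0, -1.0, 4.0, 1.0)
)
rates <- data.frame(
  country      = c("A", "A", "B"),
  year         = c(2000, 2001, 2000),
  lending_rate = c(5.0, 4.5, 7.0)
)

# Average growth by year, then the year with the smallest average.
by_year  <- aggregate(gdp_pc_growth ~ year, data = growth, FUN = mean)
min_year <- by_year$year[which.min(by_year$gdp_pc_growth)]

# merge() keeps only rows that match in both data frames by default.
m <- merge(growth, rates, by = c("country", "year"))

# Wide to long for one country, stacking the two value columns.
a <- m[m$country == "A", ]
long <- data.frame(
  country  = rep(a$country, times = 2),
  year     = rep(a$year, times = 2),
  variable = rep(c("gdp_pc_growth", "lending_rate"), each = nrow(a)),
  value    = c(a$gdp_pc_growth, a$lending_rate)
)

# One line per variable over time (solid for growth, dashed for the lending rate).
plot(long$year, long$value, type = "n", xlab = "year", ylab = "value")
for (v in unique(long$variable)) {
  s <- long[long$variable == v, ]
  lines(s$year, s$value, lty = if (v == "gdp_pc_growth") 1 else 2)
}
```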