Programming and Quantitative Skills: Extra Practice Questions
Short-Answer Questions (5 points)
Question 1 (1 point)
Calculate \(e^2\) using R:
Question 2 (1 point)
Calculate \(\sqrt[4]{256}\) using R.
Question 3 (1 point)
If we create the following vector in R, what class will it be?
c(FALSE, TRUE, "c", "d")
- Logical.
- Character.
- Both logical and character.
- Neither logical nor character.
Note: You do not need to supply your code for this question.
Question 4 (1 point)
Write an R command to generate a numeric vector containing the following sequence:
\[ (0, 2, 0, 4, 0, 6, 0, 8, \dots, 0, 98, 0, 100) \]
Question 5 (1 point)
Consider the two vectors below: \[
\begin{split}
a &= (1, 4, 8, 2, 5, 6) \\
b &= (2, 4, 6, 8, 10)
\end{split}
\] Write an R command that returns, for each element in \(a\), TRUE if that element is contained anywhere in \(b\), and FALSE otherwise. The output should be a logical vector with 6 elements (the number of elements in \(a\)). For example, the first element should be FALSE because 1 is not contained anywhere in \(b\), whereas the second element should be TRUE because 4 is contained somewhere in \(b\).
Data Analysis (5 points)
Download the dataset ceosal.csv. The dataset contains information on chief executive officers (CEOs) at different companies. The variable descriptions are:
- salary: CEO compensation in 1990 (in dollars).
- age: CEO age.
- college: \(=1\) if the CEO attended college and \(=0\) otherwise.
- grad: \(=1\) if the CEO attended post-graduate education and \(=0\) otherwise.
- comten: Years the CEO worked with the company.
- ceoten: Years as CEO with the company.
- profits: Profits of the company in 1990 (in dollars).
Question 6 (1 point)
How many variables are in the dataset?
Question 7 (1 point)
What is the maximum of the variable comten?
Question 8 (1 point)
What is the mean salary of the CEOs aged 50 to 59?
Question 9 (1 point)
How many CEOs in the dataset didn’t work for their company before becoming the CEO?
Question 10 (1 point)
How old is the best-paid CEO in the dataset?
Data Cleaning (4 points)
Download the dataset euro-dollar-2022.csv. The dataset contains the closing Euro-Dollar exchange rate (variable “Price”) each day throughout 2022, together with the opening, highest and lowest exchange rate. Furthermore, it includes the volume traded and the percentage daily change in the closing price.
In this section you will need to clean this dataset to answer the questions that follow.
You should do the following cleaning tasks:
- Format the Date variable as a date.
- Sort the data by Date ascending (the earliest date in the data should be first, the most recent date last).
- Drop rows with any missing data.
- Convert Price, Open, High and Low to numeric.
- Convert Vol to numeric. For example, "33.87K" should be 33870. Hint: First use the gsub() function to remove the K. Then convert the variable to numeric format. Finally multiply it by 1,000.
- Convert Change to numeric. Tip: Use gsub("\\%", "", x) to remove a percentage symbol from x.
- Convert all variable names to lower case.
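The Vol and Change hints can be tried on a small made-up vector before applying them to the real data. The sketch below uses invented values (not taken from euro-dollar-2022.csv) and the hypothetical names vol_raw and change_raw; it assumes every volume ends in K.

```r
# Toy values for illustration only; not taken from the dataset.
vol_raw    <- c("33.87K", "1.20K", "105.40K")
change_raw <- c("0.25%", "-1.10%", "0.00%")

# Remove the K, convert to numeric, then scale by 1,000.
vol <- as.numeric(gsub("K", "", vol_raw)) * 1000

# Remove the percentage symbol, then convert to numeric.
change <- as.numeric(gsub("\\%", "", change_raw))

print(vol)
print(change)
```

If the real column contains other suffixes or blank strings, as.numeric() will return NA with a warning for those entries, so it is worth checking for new missing values after the conversion.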
If you did all the steps correctly, you should have 260 observations. The average of the high variable should be 0.9561. The average of the vol variable should be 79,920. If only some of these match your cleaned dataset, you will still be able to answer some of the questions correctly.
Question 11 (1 point)
Create a variable called open_vs_close which is the open variable minus the price variable. What is the maximum of this variable?
Question 12 (1 point)
Use the function wday() from the lubridate package to get the numeric day of the week from the date variable as follows: wday(df$date, week_start = 1). The function returns numbers for each day of the week as follows:
- 1 means Monday.
- 2 means Tuesday.
- 3 means Wednesday.
- 4 means Thursday.
- 5 means Friday.
On what day of the week was the lowest volume traded in the dataset?
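As a quick sanity check on the Monday = 1 numbering, the following sketch uses base R's format() with "%u" (the ISO weekday number) rather than lubridate's wday(), so it runs without any extra package. This is a swapped-in alternative for illustration, not the method the question asks for, and the two dates are arbitrary examples.

```r
# Two arbitrary dates in 2022: a Monday and a Friday.
dates <- as.Date(c("2022-01-03", "2022-01-07"))

# "%u" gives the ISO weekday number, where Monday is 1 and Sunday is 7.
day_num <- as.integer(format(dates, "%u"))

print(day_num)
```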
Question 13 (1 point)
What was the largest positive daily price change in the data?
Write the percentage change without the % symbol.
Question 14 (1 point)
What proportion of days did the exchange rate exceed 1.00?
Optimization (3 points)
The next 3 questions use the following mathematical function, defined over all real numbers \(x\):
\[ f(x) = \begin{cases} 3 &\text{ if } x < -1 \\ 8 &\text{ if } x > 4 \\ x^2-2x &\text{ otherwise} \\ \end{cases} \]
Question 15 (1 point)
Plot the function between the \(x\) values \(-3\) and \(+5\). Choose the answer below which best describes the shape of this function:
- Flat everywhere
- U shape
- Inverted U shape (upside-down U)
- Flat over some ranges of \(x\), and U-shaped in other ranges of \(x\).
- Flat over some ranges of \(x\), and inverted U-shaped in other ranges of \(x\).
Note: you do not need to save your answer in your R script for this question.
Question 16 (1 point)
Use R to find the value of \(x\) that minimizes this function. Search over the interval \([-1, 4]\).
Question 17 (1 point)
What value does the function take at the minimum?
Aggregating, Merging and Reshaping (3 points)
Download the two datasets:
- gdp-per-capita-growths.csv - this dataset contains 3 variables: country, year and gdp_pc_growth. The variable gdp_pc_growth is the growth rate of the country’s per capita gross domestic product (GDP) in that year.
- lending-rates.csv - this dataset contains 3 variables: country, year and lending_rate. The variable lending_rate is the lending interest rate in that country in that year.
Question 18 (1 point)
Using the dataset gdp-per-capita-growths.csv, calculate the minimum per capita GDP growth rate each country experienced over the years recorded in the dataset.
Which country had the largest minimum growth rate out of all countries?
Question 19 (1 point)
Merge the datasets gdp-per-capita-growths.csv and lending-rates.csv together by the variables "country" and "year", keeping all observations in the gdp-per-capita-growths.csv dataset, but only merging observations from the lending-rates.csv dataset with a match.
Your merged dataset should have 4,087 observations and 4 variables.
How many observations of the variable lending_rate are missing in the merged dataset?
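The kind of merge described here (keep every row of the first dataset, attach rows from the second only where they match) can be sketched on toy data with base R's merge() and all.x = TRUE. The data frames left and right, their columns x and y, and all values are hypothetical and only illustrate the mechanics.

```r
# Hypothetical toy data frames; values are invented for illustration.
left  <- data.frame(country = c("A", "A", "B"),
                    year    = c(2000, 2001, 2000),
                    x       = c(1.5, 2.0, -0.3))
right <- data.frame(country = c("A", "C"),
                    year    = c(2000, 2000),
                    y       = c(7.0, 9.0))

# all.x = TRUE keeps every row of `left`. Rows of `right` without a
# match in `left` are dropped, and unmatched `left` rows get NA for y.
merged <- merge(left, right, by = c("country", "year"), all.x = TRUE)

print(merged)
print(sum(is.na(merged$y)))
```

In this toy case the merged data keeps the 3 rows of left and gains one column from right; country C never appears because it has no match in left.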
Question 20 (1 point)
Perform the following steps with the lending-rates.csv dataset:
- Create a subset of the dataset which only includes data on the years 2000 and 2010.
- Using this subset, reshape the data to wide format with rows representing countries and columns for the country, the lending rate in 2000 and the lending rate in 2010.
- Drop any rows with missing observations.
Your resulting dataset should have 108 rows and 3 variables. The first three rows should look like:
country   2000        2010
Albania   22.10250    12.82233
Algeria   10.00000    8.00000
Angola    103.16017   22.54357
What is the mean change in the lending rate from 2000 to 2010 in the dataset? That is, find the mean of the lending rate in 2010 minus the lending rate in 2000.
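The subset, reshape and drop steps can be sketched on toy data with base R's reshape(). The data frame long, its column rate, and all values are hypothetical. Note that base reshape() names the wide columns rate.2000 and rate.2010, which differs from the 2000 and 2010 headers shown in the expected output above, so other tools may have been intended.

```r
# Hypothetical long-format data; values are invented for illustration.
long <- data.frame(country = c("A", "A", "A", "B", "B", "C"),
                   year    = c(2000, 2005, 2010, 2000, 2010, 2000),
                   rate    = c(10, 9, 8, 20, 15, 5))

# Keep only the two years of interest.
sub <- long[long$year %in% c(2000, 2010), ]

# One row per country, one column per year.
wide <- reshape(sub, idvar = "country", timevar = "year",
                direction = "wide")

# Country C has no 2010 value, so its row contains NA and is dropped.
wide <- na.omit(wide)

# Change from 2000 to 2010 for each remaining country.
wide$change <- wide$rate.2010 - wide$rate.2000

print(wide)
```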