In this chapter we will learn how to do some operations on vectors in R.
Suppose we have a vector a with 5 elements and we want to isolate its 3rd element. We can do this with what is called indexing. To get the 3rd element of a vector a, we write a[3]. Let’s see this with an example:
a <- c(1, 2, 4, 3, 2)
a[3]
[1] 4
We can also extract multiple elements of the vector using a vector of indices inside the []. For example, suppose we wanted to get the 1st, 3rd and 4th elements of a. We would put the vector c(1, 3, 4) inside the square brackets:
a[c(1, 3, 4)]
[1] 1 4 3
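As a small additional sketch (the names reordered and repeated are only for illustration), the index vector also controls the order of the result, and the same index can be used more than once:

```r
a <- c(1, 2, 4, 3, 2)

# The result follows the order of the indices, not the order in a:
# here the 4th element comes first, then the 1st
reordered <- a[c(4, 1)]
reordered

# An index can appear more than once, which repeats that element
repeated <- a[c(3, 3)]
repeated
```

Here reordered holds 3 and then 1, and repeated holds the 3rd element, 4, twice.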
We can also extract elements of a vector using a logical vector. Doing this will extract the elements where the logical vector is TRUE. The logical vector should have the same length as the vector we are indexing (R recycles a shorter one instead of giving an error, which is rarely what we want). Like above, if we want the 1st, 3rd and 4th elements of a, we can use a vector with TRUE in the 1st, 3rd and 4th positions and FALSE everywhere else:
a[c(TRUE, FALSE, TRUE, TRUE, FALSE)]
[1] 1 4 3
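In practice we rarely type the TRUEs and FALSEs by hand. A minimal sketch, assuming we want the elements of a that are larger than 2 (the names is_large and large_values are made up for this example): a comparison produces the logical vector for us, and we can then use it as the index.

```r
a <- c(1, 2, 4, 3, 2)

# A comparison returns a logical vector with the same length as a
is_large <- a > 2
is_large

# Indexing with it keeps only the elements where it is TRUE
large_values <- a[is_large]
large_values
```

The same thing can be written in one step as a[a > 2].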
Suppose I want everything in a vector except one element. For example, suppose I want to see the entire vector a except the 2nd element. We can do this using -2 in the brackets:
a[-2]
[1] 1 4 3 2
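The same idea extends to excluding several elements at once. As a short sketch (the name without_2_and_5 is just for illustration), we can put a minus sign in front of a vector of indices:

```r
a <- c(1, 2, 4, 3, 2)

# Drop the 2nd and 5th elements by negating a vector of indices
without_2_and_5 <- a[-c(2, 5)]
without_2_and_5
```

This leaves the 1st, 3rd and 4th elements of a.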
Often it is useful to create a sequence of numbers. For a simple sequence like 1, 2, 3, …, 10, we can just do:
1:10
[1] 1 2 3 4 5 6 7 8 9 10
We can also make the sequence go backwards by reversing the numbers:
10:1
[1] 10 9 8 7 6 5 4 3 2 1
For sequences that don’t jump in 1s we can use the seq() function. Suppose we wanted to have a sequence from 10 to 100 with steps of 10. We do that with:
seq(from = 10, to = 100, by = 10)
[1] 10 20 30 40 50 60 70 80 90 100
Instead of specifying the step length with by, we can specify the length of the sequence. Suppose I wanted a sequence going from 0 to 1 in equal steps with 5 numbers in total. I can do that using the length.out argument:
seq(from = 0, to = 1, length.out = 5)
[1] 0.00 0.25 0.50 0.75 1.00
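To see how by and length.out relate, here is a small sketch (the names by_length and by_step are only for illustration). With 5 numbers from 0 to 1 there are 4 equal steps, so each step is (1 - 0) / (5 - 1) = 0.25, and both calls should describe the same sequence:

```r
# Five equally spaced numbers from 0 to 1
by_length <- seq(from = 0, to = 1, length.out = 5)

# The same sequence written with an explicit step of 0.25
by_step <- seq(from = 0, to = 1, by = 0.25)

# all.equal() compares numbers while allowing for tiny rounding differences
all.equal(by_length, by_step)
```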
If I wanted to create a vector which is 1 repeated 5 times, I could do:
c(1, 1, 1, 1, 1)
[1] 1 1 1 1 1
But this would get very annoying to type and I could easily make a mistake if I wanted to make many more 1s. If we want to repeat a number many times, we can use the rep() function. For example, if we want to make 100 1s, we would do:
rep(1, times = 100)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
At the start of Chapter 3 we briefly mentioned that the [1] you see at the start of the output means that the number next to it is the first element. Because this example has many numbers that run onto multiple lines, we can see a [38] at the start of line 2. This means that the first 1 on the 2nd line is the 38th element of the vector. The [75] on the 3rd line means the first 1 on that line is the 75th element.
The rep() function can also be combined with vectors. Suppose I wanted to repeat 1, 2, 3 four times:
rep(1:3, times = 4)
[1] 1 2 3 1 2 3 1 2 3 1 2 3
And if I instead wanted to repeat each of 1, 2, 3 four times, I would use the each argument:
rep(1:3, each = 4)
[1] 1 1 1 1 2 2 2 2 3 3 3 3
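The two can also be combined. As a sketch (the names both and uneven are made up for this example), each is applied first and the result is then repeated times times. The times argument can also be a vector giving a separate count for every element:

```r
# Repeat each number twice, then repeat that whole pattern twice
both <- rep(1:3, each = 2, times = 2)
both

# One count per element: three 1s, two 2s and one 3
uneven <- rep(1:3, times = c(3, 2, 1))
uneven
```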
We can get summary statistics for vectors using functions. Let’s look at some common ones using a simple vector with the sequence 1 to 10:
a <- 1:10
a
[1] 1 2 3 4 5 6 7 8 9 10
Get the number of elements of a:
length(a)
[1] 10
Get the minimum value in a:
min(a)
[1] 1
Get the maximum value in a:
max(a)
[1] 10
Get the average of all elements in a:
mean(a)
[1] 5.5
Get the median of all elements in a:
median(a)
[1] 5.5
Note on the median: The median is normally found by sorting the elements of the vector and taking the one in the middle. Because a has an even number of elements (10), the median is instead the average of the two middle values after sorting. Because a is already sorted, these middle values are 5 and 6, so the median is (5+6)/2 = 5.5.
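We can check this reasoning by hand. The following is only a sketch of the even-length case described in the note (the names sorted, n, middle and manual_median are made up for this example), not how median() is implemented internally:

```r
a <- 1:10

# Sort the values and count them
sorted <- sort(a)
n <- length(sorted)

# With an even number of elements, the two middle positions are n/2 and n/2 + 1
middle <- sorted[c(n / 2, n / 2 + 1)]
middle

# The median is the average of those two values
manual_median <- mean(middle)
manual_median == median(a)
```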
Get the sum of all elements in a:
sum(a)
[1] 55
A useful way to quickly summarize a numeric vector is with the summary() function, which gives the minimum, first quartile, median, mean, third quartile and maximum:
summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    3.25    5.50    5.50    7.75   10.00
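If we only need the quartiles, or the interquartile range itself, base R also has the quantile() and IQR() functions. A short sketch (the names quartiles and iqr are only for illustration), using the same vector 1 to 10:

```r
a <- 1:10

# The 1st and 3rd quartiles, matching the values shown by summary(a)
quartiles <- quantile(a, probs = c(0.25, 0.75))
quartiles

# The interquartile range is the 3rd quartile minus the 1st quartile
iqr <- IQR(a)
iqr
```

With R's default settings, the quartiles here are 3.25 and 7.75, so the interquartile range is 7.75 - 3.25 = 4.5.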
Another useful way of summarizing data is to tabulate it: to count the number of occurrences of each value. We can do that with the table() function:
a <- c(1, 3, 2, 4, 4, 2, 4)
table(a)
a
1 2 3 4
1 2 1 3
The output here means that 1 appeared once, 2 appeared twice, 3 appeared once and 4 appeared three times.
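The result of table() can be stored and indexed like a vector. As a sketch (the names counts, count_of_4 and most_common are made up for this example), note that the distinct values become names, which are character strings, so we look up a count by putting the value in quotes:

```r
a <- c(1, 3, 2, 4, 4, 2, 4)
counts <- table(a)

# The distinct values, stored as character strings
names(counts)

# How many times the value 4 appears
count_of_4 <- counts[["4"]]
count_of_4

# The value that appears most often (returned as a character string)
most_common <- names(counts)[which.max(counts)]
most_common
```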