Jacob Jameson
Summer 2021
One-dimensional (1d) data structures in R come in two types:
(Atomic) vectors and lists. These are the most common and basic data structures in R, and most everyday R code is built on them.
A vector is a collection of elements, most commonly of mode character, logical, integer or numeric.
You can create an empty vector with vector(). (By default the mode is logical.)
vector() # an empty 'logical' (the default) vector
logical(0)
vector(mode="character", length = 5) # a vector of mode 'character' with 5 elements
[1] "" "" "" "" ""
Rather than calling vector() with a mode argument, it is more common to use the direct constructors such as character(), numeric(), etc.
character(5) # the same thing, but using the constructor directly
[1] "" "" "" "" ""
numeric(5) # a numeric vector with 5 elements
[1] 0 0 0 0 0
logical(5) # a logical vector with 5 elements
[1] FALSE FALSE FALSE FALSE FALSE
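A common reason to create a vector of a known length is to fill it in afterwards. The following is a minimal sketch of that pattern; the name squares and the loop are illustrative assumptions, not part of the original notes.

```r
squares = numeric(5)              # five zeros to start with
for (i in seq_along(squares)) {
  squares[i] = i^2                # overwrite each slot in turn
}
squares                           # 1 4 9 16 25
```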
You can also create vectors by directly specifying their content. R will then guess the appropriate mode of storage for the vector. For instance:
x = c(1, 2, 3)
x
[1] 1 2 3
will create a vector x of mode numeric. These are the most common kind, and are treated as double precision real numbers.
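To see how a vector is actually stored, typeof() can be compared with class(). This is a small sketch; the vector i and the L suffix are added here for illustration only.

```r
x = c(1, 2, 3)      # plain numbers are stored as doubles
typeof(x)           # "double"
class(x)            # "numeric"

i = c(1L, 2L, 3L)   # the L suffix asks for integer storage
typeof(i)           # "integer"
```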
You can create vectors as a sequence of numbers.
1:10
[1] 1 2 3 4 5 6 7 8 9 10
seq(10)
[1] 1 2 3 4 5 6 7 8 9 10
seq(from = 1, to = 10, by = 0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
[16] 8.5 9.0 9.5 10.0
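seq() can also be given a target length instead of a step size. The sketch below uses the length.out argument and the helper seq_len(), both from base R; the values are chosen arbitrarily.

```r
# five evenly spaced values from 0 to 1
s = seq(from = 0, to = 1, length.out = 5)
s                     # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) counts from 1 to n and is empty when n is 0
seq_len(3)            # 1 2 3
length(seq_len(0))    # 0
```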
Using TRUE and FALSE will create a vector of mode logical:
y = c(TRUE, TRUE, FALSE, FALSE)
y
[1] TRUE TRUE FALSE FALSE
While using quoted text will create a vector of mode character:
z = c("Andy", "Ben", "Charlie")
z
[1] "Andy" "Ben" "Charlie"
The functions length(), class() and str() provide useful information about your vectors and R objects in general.
length(z)
[1] 3
class(z)
[1] "character"
str(z)
chr [1:3] "Andy" "Ben" "Charlie"
The function c() (for combine) can also be used to add elements to a vector.
z = c(z, "Doug")
z
[1] "Andy" "Ben" "Charlie" "Doug"
z = c("Eric", z)
z
[1] "Eric" "Andy" "Ben" "Charlie" "Doug"
We can access elements by their index (R counts from 1):
z[3]
[1] "Ben"
z[2:4]
[1] "Andy" "Ben" "Charlie"
z[c(1,3)]
[1] "Eric" "Ben"
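Two further indexing forms are worth knowing. This is a short sketch using the same z vector, rebuilt here so the block runs on its own: negative indices drop elements, and an index past the end returns NA.

```r
z = c("Eric", "Andy", "Ben", "Charlie", "Doug")

z[-1]          # everything except the first element
z[-c(1, 2)]    # drop the first two elements
z[10]          # NA: there is no tenth element
```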
A logical vector contains only the special values TRUE and FALSE. A logical vector can also be used as an index: elements in TRUE positions are kept and those in FALSE positions are dropped.
c(TRUE, TRUE, FALSE, FALSE, TRUE)
[1] TRUE TRUE FALSE FALSE TRUE
z
[1] "Eric" "Andy" "Ben" "Charlie" "Doug"
z[c(TRUE, TRUE, FALSE, FALSE, TRUE)]
[1] "Eric" "Andy" "Doug"
Logical vectors can be created using relational operators, e.g. <, >, ==, !=, %in%.
x = c(1, 2, 3, 11, 12, 13)
x < 10
[1] TRUE TRUE TRUE FALSE FALSE FALSE
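The other operators work the same way, one element at a time. A brief sketch on the same x, with comparison values picked arbitrarily; combining conditions with & and counting with sum() are additions for illustration.

```r
x = c(1, 2, 3, 11, 12, 13)

x != 3                 # TRUE TRUE FALSE TRUE TRUE TRUE
x %in% c(2, 11, 99)    # FALSE TRUE FALSE TRUE FALSE FALSE
x > 1 & x < 12         # FALSE TRUE TRUE TRUE FALSE FALSE
sum(x < 10)            # 3, the number of TRUE values
```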
Exercise
Select all elements in a vector whose value < 10
Change those values to 0
Select all elements in a vector whose value < 10
x[x < 10]
[1] 1 2 3
Change those values to 0
x[x < 10] = 0
x
[1] 0 0 0 11 12 13
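If the original vector should stay untouched, one alternative is ifelse(), which returns a new vector. This is a sketch of that option rather than the exercise's intended answer; x_new is a hypothetical name.

```r
x = c(1, 2, 3, 11, 12, 13)

# 0 where the test is TRUE, the old value otherwise
x_new = ifelse(x < 10, 0, x)
x_new    # 0 0 0 11 12 13
x        # unchanged: 1 2 3 11 12 13
```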
R will create a resulting vector with a mode that can most easily accommodate all the elements it contains. This conversion between modes of storage is called “coercion”. When R converts the mode of storage based on its content, it is referred to as “implicit coercion”. For instance, can you guess what the following do (without running them first)?
c(4, "ch")
c(TRUE, 5)
c(FALSE, 100)
c(TRUE, "ch")
c(4, "ch")
[1] "4" "ch"
c(TRUE, 5)
[1] 1 5
c(FALSE, 100)
[1] 0 100
c(TRUE, "ch")
[1] "TRUE" "ch"
The coercion order is character > numeric > logical: when modes are mixed, the result takes the leftmost mode present.
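Implicit coercion also happens in arithmetic, where logical values are treated as 0 and 1. A short sketch, with an arbitrary example vector named passed:

```r
passed = c(TRUE, FALSE, TRUE, TRUE)

sum(passed)               # 3, the number of TRUE values
mean(passed)              # 0.75, the proportion of TRUE values
class(c(TRUE, 5L))        # "integer": logical gives way to integer
class(c(1, TRUE, "a"))    # "character": character wins over the others
```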
You can also coerce vectors explicitly using the as.*() functions, such as as.numeric() and as.character().
as.numeric(c("1", "2", "3"))
[1] 1 2 3
as.character(1:2)
[1] "1" "2"
as.numeric(c("a"))
[1] NA
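When a value cannot be converted, the result is NA and R issues a warning. The sketch below shows one way to locate failed conversions, plus two other as.* functions from base R; the input strings are made up for illustration.

```r
raw = c("10", "twenty", "30")

# suppressWarnings() hides the "NAs introduced by coercion" warning
nums = suppressWarnings(as.numeric(raw))
nums                 # 10 NA 30
is.na(nums)          # FALSE TRUE FALSE
raw[is.na(nums)]     # "twenty", the entry that failed to convert

as.integer(3.9)      # 3: the fractional part is dropped, not rounded
as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
```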
In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can be of any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.
A list is a special type of vector. Each element can be a different type.
Create lists using list() or coerce other objects using as.list().
x = list(1, "a", TRUE)
x
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
The content of elements of a list can be retrieved by using double square brackets.
x[[1]]
[1] 1
- What is the class of x[1]?
- What about x[[3]]?
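One way to check the two questions above is to ask R directly. The block rebuilds x so it runs on its own; the comments state what class() reports.

```r
x = list(1, "a", TRUE)

class(x[1])     # "list": single brackets keep the list wrapper
class(x[[1]])   # "numeric": double brackets return the element itself
class(x[[3]])   # "logical"
```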
my_pie = list(type="key lime", diameter=7,
is.vegetarian=TRUE)
my_pie
$type
[1] "key lime"
$diameter
[1] 7
$is.vegetarian
[1] TRUE
names(my_pie)
[1] "type" "diameter" "is.vegetarian"
A list does not print to the console like a vector. Instead, each element of the list starts on a new line.
Elements are indexed by double brackets [[]]. Single brackets [] will still return another list. If the elements of a list are named, they can be referenced by the $ notation:
my_pie$type
[1] "key lime"
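The three access forms can be compared side by side on the same list. This is a sketch only; the price field is a made-up addition used to show that $ can also create elements.

```r
my_pie = list(type = "key lime", diameter = 7, is.vegetarian = TRUE)

my_pie[["diameter"]]    # 7, the same result as my_pie$diameter
my_pie["diameter"]      # a list of length 1 that keeps the name
my_pie$diameter = 9     # $ can also modify an existing element
my_pie$price = 12.5     # or add a new one (hypothetical field)
names(my_pie)           # "type" "diameter" "is.vegetarian" "price"
```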
A data frame stores columns of equal length, and each column can be of a different type:
dat = data.frame(id = letters[1:5], x = 1:5, y = 16:20)
dat
id x y
1 a 1 16
2 b 2 17
3 c 3 18
4 d 4 19
5 e 5 20
See that a data frame is actually a special list:
is.list(dat)
[1] TRUE
class(dat)
[1] "data.frame"
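Because a data frame is a list of columns, the list tools shown earlier apply to it as well. A minimal sketch, rebuilding dat so the block runs on its own:

```r
dat = data.frame(id = letters[1:5], x = 1:5, y = 16:20)

length(dat)     # 3: the number of columns, i.e. list elements
names(dat)      # "id" "x" "y"
dat$x           # 1 2 3 4 5
dat[["y"]]      # 16 17 18 19 20
```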