
Advanced R

Chapter 2.2: Data structures

Daniel Horn & Sheila Görz

Summer Semester 2022

Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 1 / 85


Our plan for today

1 Attributes

2 Vectors

3 Factors

4 Matrices

5 Arrays

6 Lists

7 data.frames

8 Further data structures



Introduction

In the last chapter, we learned about the six basic data types in R. Now, we
are taking a look at data structures. In particular, we are interested in:
how to generate the various data structures,
what they are used for,
what their characteristics are,
how to subset them.
Before we turn to the data structures themselves, however, we first have
to take a look at how R stores metadata for its objects → attributes.



Attributes



Attributes What are attributes?

Attributes - definition, placement, access I

Attributes store arbitrary metadata of an R object.

Imagine attributes as a named list with distinct names (more on lists
and their names later; for now, suffice it to say that distinct names
are not a requirement for ’normal’ lists).
Internally, attributes are stored as a pairlist.
Attributes can be placed using the function attr():
x <- 1:5
attr(x, "description") <- "vector of the numbers 1 to 5"
attr(x, "length") <- length(x)
x
## [1] 1 2 3 4 5
## attr(,"description")
## [1] "vector of the numbers 1 to 5"
## attr(,"length")
## [1] 5

Attributes - definition, placement, access II

The function attr() can also be used to access single attributes:
attr(x, "description")
## [1] "vector of the numbers 1 to 5"
attr(x, "length")
## [1] 5

However, attr() cannot access multiple attributes at once:
attr(x, c("description", "length"))
## Error in attr(x, c("description", "length")): exactly one attribute ’which’ must be given

Attributes - definition, placement, access III

To retrieve all attributes at once as a named list, the function
attributes() can be used:
attributes(x)
## $description
## [1] "vector of the numbers 1 to 5"
##
## $length
## [1] 5

Using attributes(), it is also possible to set multiple attributes at
once:
x <- 1:5
attributes(x) <- list(description = "vector of the numbers 1 to 5",
                      length = length(x))

Attributes - definition, placement, access IV

However, assigning via attributes() replaces all existing attributes:
x
## [1] 1 2 3 4 5
## attr(,"description")
## [1] "vector of the numbers 1 to 5"
## attr(,"length")
## [1] 5
attributes(x) <- list(type = typeof(x))
x
## [1] 1 2 3 4 5
## attr(,"type")
## [1] "integer"

Adding further attributes to a set of already existing attributes is therefore
usually done individually using attr().
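A minimal sketch of adding attributes without losing the existing ones. The attribute names "type" and "source" are made up for illustration. attr() appends one attribute at a time; alternatively, the current attributes() list can be merged with the new entries before assigning, so that the replacement keeps the old ones.

```r
x <- 1:5
attr(x, "description") <- "vector of the numbers 1 to 5"

# attr() adds one attribute and keeps all existing ones
attr(x, "type") <- typeof(x)

# Several at once: merge the existing attribute list with the new entries,
# so that the replacement via attributes()<- does not discard anything
attributes(x) <- c(attributes(x), list(length = length(x), source = "example"))

names(attributes(x))
```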



Attributes Special attributes

Special attributes I

There are a couple of special attributes in R:

names: The individual elements of an object can be named:
x <- 1:6
names(x) <- c("one", "two", "three", "four", "five", "six")
x
## one two three four five six
## 1 2 3 4 5 6

comment: An object can carry a comment. This comment is not
displayed when printing the object:
comment(x) <- "my comment"
x
## one two three four five six
## 1 2 3 4 5 6

Special attributes II

dim: An object can have multiple dimensions:
dim(x) <- c(2, 3)
x
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6

When the dim attribute is set, the names and dimnames attributes are dropped:
attributes(x)
## $comment
## [1] "my comment"
##
## $dim
## [1] 2 3
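The dropping of names and dimnames can be checked directly. A small sketch; the element and dimension names are made up for illustration:

```r
# A named vector: setting dim removes the names attribute
x <- c(a = 1, b = 2, c = 3, d = 4)
dim(x) <- c(2, 2)
names(x)       # NULL

# A matrix with dimnames: changing dim removes the dimnames as well
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
dim(m) <- c(4, 1)
dimnames(m)    # NULL
```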

Special attributes III

dimnames: Every dimension of an object can be named:
dimnames(x) <- list(c("row 1", "row 2"),
                    c("column 1", "column 2", "column 3"))
x
## column 1 column 2 column 3
## row 1 1 3 5
## row 2 2 4 6

class: An object can be assigned to a specific ’class’. This is important for
S3 programming (see the chapter on object-oriented programming):
class(x) <- "my matrix"
x
## column 1 column 2 column 3
## row 1 1 3 5
## row 2 2 4 6
## attr(,"class")
## [1] "my matrix"

Special attributes IV

Additionally, there are three further special attributes: levels for
factors, row.names for data.frames and tsp for time series objects.
These attributes can be accessed with the same function that is
used to set them:
dim(x)
## [1] 2 3
names(x)
## NULL
dimnames(x)
## [[1]]
## [1] "row 1" "row 2"
##
## [[2]]
## [1] "column 1" "column 2" "column 3"
class(x)
## [1] "my matrix"
comment(x)
## [1] "my comment"



Attributes Attributes in the case of object modification

Attributes in the case of object modification I

When an R object with attributes is modified, some attributes are preserved
for certain modifications while others are not. There are some general rules
for this. However, there are also exceptions to all of these rules, e.g. see the
section on Copying of attributes in the R Language Definition.
Element-wise modifications of an object usually preserve attributes:
x <- 1:6
attr(x, "description") <- "vector of the numbers 1 to 6"
x + 1
## [1] 2 3 4 5 6 7
## attr(,"description")
## [1] "vector of the numbers 1 to 6"

A possible exception to this is the class attribute.

Attributes in the case of object modification II

Functions that change the output type or length usually lead to the loss
of attributes:
attributes(sum(x))
## NULL

This includes subsetting and coercion in particular!

However, there are exceptions to this rule as well. A names
attribute is subset along with the data:
c(one = 1, two = 2)[1]
## one
## 1
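A short sketch contrasting the two cases named above: subsetting keeps names but drops other attributes, and coercion drops even the names. The attribute text "three named numbers" is illustrative.

```r
x <- c(one = 1, two = 2, three = 3)
attr(x, "description") <- "three named numbers"

y <- x[1:2]
names(y)                 # "one" "two": names are subset along with the data
attr(y, "description")   # NULL: other attributes are lost

z <- as.character(x)
attributes(z)            # NULL: coercion drops even the names
```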

Attributes in the case of object modification III

When using binary operators, usually the attributes of the longer
object are preserved:
y <- 1:2
attr(y, "description") <- "vector of the numbers 1 to 2"
y + x
## [1] 2 4 4 6 6 8
## attr(,"description")
## [1] "vector of the numbers 1 to 6"

If both objects are of equal length, the attributes of both are copied; if both
carry the same attribute, the one of the first object takes precedence:
y <- 2:7
attr(y, "description") <- "vector of the numbers 2 to 7"
y + x
## [1] 3 5 7 9 11 13
## attr(,"description")
## [1] "vector of the numbers 2 to 7"



Vectors



Vectors General remarks

What is a vector?

Recalling Paradigms 1 & 2: Everything in R is an object, and the six
basic data types are vectors.
In the chapter on data types, we have already seen that single
numbers or words are internally vectors.
But what about matrices, lists and data.frames? Are these vectors
as well?
They are indeed. However, when talking about vectors, a distinction
has to be made between ’atomic vectors’ and ’lists’. We will deal with
atomic vectors first.

What is a vector made of?

Every vector (atomic vector or list) has three components:
Type: identifiable via typeof(),
Length: identifiable via length(),
Attributes: identifiable via attributes().

The type of an atomic vector is one of the basic data types, i.e. double,
integer, character, logical, complex or raw. Atomic vectors can
only contain values of a single data type. Combining multiple data
types in a single vector leads to (often undesired) coercion, see Paradigm 3.

The length of a vector is a non-negative integer. Vectors of length 0 are allowed
and sensible.
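A minimal sketch of the three components, for an atomic vector and for a list, plus a length-0 vector arising from a filter that matches nothing. The object names v, l and empty are illustrative.

```r
v <- c(a = 1.5, b = 2.5)
typeof(v)        # "double"
length(v)        # 2
attributes(v)    # $names: "a" "b"

l <- list(1, "a", TRUE)
typeof(l)        # "list"
is.vector(l)     # TRUE: lists are vectors, too

# Length-0 vectors arise naturally, e.g. from a filter that matches nothing
empty <- v[v > 10]
length(empty)    # 0
sum(empty)       # 0
```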

Generating a vector I

As seen in the chapter on data types, atomic vectors can be generated
using type(n), where type denotes one of the six basic data types and
n the desired length, e.g. double(5).
Using the function vector() leads to the same result:
vector(mode = "double", length = 5)
## [1] 0 0 0 0 0
vector(mode = "integer", length = 5)
## [1] 0 0 0 0 0
vector(mode = "character", length = 5)
## [1] "" "" "" "" ""
vector(mode = "logical", length = 5)
## [1] FALSE FALSE FALSE FALSE FALSE
vector(mode = "complex", length = 5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
vector(mode = "raw", length = 5)
## [1] 00 00 00 00 00

Explicit values can be concatenated into a vector using the function
c(). Note that:

Generating a vector II

The names attribute can be set directly when generating the vector:
c(one = 1, two = 2, three = 3)
## one two three
## 1 2 3
Nested vectors are flattened to a single level:
c(c(1, 2), c(3, 4))
## [1] 1 2 3 4

To repeat single or multiple values, rep() can be used.
To generate a number sequence (of doubles or integers), seq()
can be used.
It is also possible to generate an ascending or descending integer
sequence using the colon operator (’:’).
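A few calls illustrating rep(), seq() and the colon operator; the concrete values are arbitrary examples.

```r
rep(c(1, 2), times = 3)     # 1 2 1 2 1 2: repeat the whole vector
rep(c(1, 2), each = 2)      # 1 1 2 2: repeat each element
seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
seq(2, 10, length.out = 5)  # 2 4 6 8 10
5:1                         # 5 4 3 2 1: descending integer sequence
typeof(1:3)                 # "integer"
```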



Vectors Vectorization and recycling

Vectorization and recycling I

Many essential functions in R are vectorized, i.e.:
They operate element-wise on one or more vectors.
Unlike in many other programming languages, no loops are needed
in this case. Loops should be avoided whenever vectorization is
possible.
1:3 + 4:6
## [1] 5 7 9

When a binary function is applied to input vectors of unequal length, the
shorter vector is repeated to match the length of the longer
vector → recycling.
1:2 + 3:6
## [1] 4 6 6 8

Vectorization and recycling II

If the longer vector’s length is not a multiple of the shorter vector’s
length, the shorter vector is not replicated completely and a
warning is issued:
1:2 + 1:5
## Warning in 1:2 + 1:5: longer object length is not a multiple of shorter object length
## [1] 2 4 4 6 6

Paradigm 4
Many functions in (base) R are vectorized.
For vector inputs of different lengths, the shorter vector is recycled to
the length of the longer vector.
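A small sketch comparing an explicit loop with the vectorized call it replaces, followed by a recycling example. The names res_loop and res_vec are illustrative.

```r
x <- 1:5
y <- 6:10

# Loop version: element by element
res_loop <- numeric(length(x))
for (i in seq_along(x)) {
  res_loop[i] <- x[i] + y[i]
}

# Vectorized version: one call, no loop
res_vec <- x + y

all(res_loop == res_vec)   # TRUE

# Recycling: c(1, -1) is repeated three times to match length 6
1:6 * c(1, -1)             # 1 -2 3 -4 5 -6
```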

Vectorization and recycling III

This result does not surprise us:
c(TRUE, FALSE) & TRUE
## [1] TRUE FALSE

Then, what is the result of this call?
c(TRUE, FALSE) && TRUE
## [1] TRUE

&& and || are examples of non-vectorized functions.
In R versions before 4.2.0, R only considers the first element of each vector
in this case, as shown above, and does not even issue a warning!
Since R 4.2.0, such a call triggers a warning, and since R 4.3.0 it is an error.
Vectors Subsetting

Subsetting - all the different ways


It is possible to subset single or multiple elements in R using square
brackets ’[’, ’]’. There are six different ways to subset vectors in total.
What are they?
vec <- c(one = 1, two = 2, three = 3)

1 Positive integers:
vec[c(2, 1)]
## two one
## 2 1
2 Negative integers:
vec[-c(2, 1)]
## three
## 3
3 Logical values:
vec[c(TRUE, FALSE, TRUE)]
## one three
## 1 3
4 Nothing (blank):
vec[]
## one two three
## 1 2 3
5 Zero:
vec[0]
## named numeric(0)
6 Using the names attribute:
vec[c("three", "one")]
## three one
## 3 1

Subsetting - Properties I

Positive and negative integers can’t be mixed:
vec[c(2, -1)]
## Error in vec[c(2, -1)]: only 0’s may be mixed with negative subscripts

Subsetting with an NA returns an NA:
vec[c(1:2, NA)]
## one two <NA>
## 1 2 NA
vec[c(FALSE, TRUE, NA)]
## two <NA>
## 2 NA
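If an NA in a logical index should simply mean "do not select", which() can be used, since it returns only the positions of the TRUE values. A minimal sketch; idx is an illustrative name:

```r
vec <- c(one = 1, two = 2, three = 3)
idx <- c(FALSE, TRUE, NA)

vec[idx]          # two <NA>: the NA in the index yields an NA
which(idx)        # 2: which() skips the NA
vec[which(idx)]   # two: only the TRUE position is selected
```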

Subsetting - Properties II
What is the result of this call and how can it be explained?
vec[c(TRUE, NA)]
## one <NA> three
## 1 NA 3

When subsetting with logical values, the logical vector is recycled to
match the length of the vector being subsetted.
What are the results of these calls and how can they be explained?
vec[NA]
## <NA> <NA> <NA>
## NA NA NA
vec[NA_integer_]
## <NA>
## NA

It’s the same phenomenon as above: NA is a logical value, so it is recycled to
the length of the vector being subsetted, whereas this does not happen
when subsetting with integers.
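The difference can be made visible by comparing the type of NA and the lengths of the results; NA_character_ is added here for comparison:

```r
vec <- c(one = 1, two = 2, three = 3)

typeof(NA)                  # "logical"
length(vec[NA])             # 3: the logical NA is recycled to length 3
length(vec[NA_integer_])    # 1: an integer index is not recycled
length(vec[NA_character_])  # 1: neither is a character index
```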
Subsetting - Properties III


What happens here?
vec["on"]
## <NA>
## NA

When subsetting with the names attribute, the names have to be
specified exactly. There is no ’partial matching’.
What about here?
vec2 <- c(a = 1, a = 2)
vec2["a"]
## a
## 1

In the case of identical names, only the first element with the
respective name is returned.

Subsetting - Properties IV

Two final riddles:


What happens here and why?
vec[4]
## <NA>
## NA

Subsetting with a positive index beyond the length of the vector always
leads to an NA.
And what happens here and why?
vec[-4]
## one two three
## 1 2 3

When dropping an index that does not exist, nothing can be dropped, so the
whole vector is returned.

Subsetting - Summary I

The six different subsetting possibilities have the following properties:

Positive and negative integers
Positive or negative integers denote the indices of the elements that are to
be returned or dropped, respectively.
An index vector can be of arbitrary length.
Indices can occur repeatedly.
Positive and negative indices can’t be mixed.
Zeros can be added to positive and negative indices alike; they have
no effect, however.
Non-integers are always truncated towards zero. As such, x[3.9] is the same
as x[3] and x[-4.7] the same as x[-4].
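A quick check of these properties on a made-up vector x. Note that x[-4.7] behaves like x[-4], not like x[-5], which is why "truncated towards zero" is the accurate description:

```r
x <- c(10, 20, 30, 40, 50)

x[3.9]          # 30: same as x[3]
x[-4.7]         # 10 20 30 50: same as x[-4] (floor would give -5)
x[c(0, 2, 0)]   # 20: zeros are ignored
x[c(1, 1, 2)]   # 10 10 20: indices may repeat
```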

Subsetting - Summary II

Logical values
A logical index vector specifies all positions to be subsetted.
A logical index vector should have the same length as the vector to be
subsetted.
If it is shorter, it is recycled accordingly.
If it is longer than the vector being subsetted, every additional TRUE (or NA)
yields an NA, while additional FALSE values are ignored.
Nothing
A blank index returns all elements.
Zero
Subsetting with zero returns a vector of length 0.
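A small sketch of logical index vectors that are shorter or longer than the subsetted vector (x is an illustrative name):

```r
x <- c(10, 20, 30)

x[c(TRUE, FALSE)]               # 10 30: index recycled to TRUE FALSE TRUE
x[c(TRUE, FALSE, TRUE, TRUE)]   # 10 30 NA: extra TRUE beyond length 3
x[c(TRUE, FALSE, TRUE, FALSE)]  # 10 30: extra FALSE is ignored
```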

Subsetting - Summary III

names attribute
Elements with a specific name can be subsetted.
Just like with integers, elements can be selected repeatedly.
The index vector can be of arbitrary length.
Names must be specified exactly.
In the case of identical names, only the first element with the respective
name is returned:
x <- c(a = 1, a = 2)
x[c("a", "a")]
## a a
## 1 1



Factors

Factors - What’s that?

Factors were introduced to represent character vectors in a
more memory-efficient way.
However, since character vectors are stored as described in the
chapter on data types (each distinct string is stored only once), this
advantage has largely disappeared.
Internally, factors consist of an integer vector with a levels
attribute and the class attribute factor.
The levels attribute contains all possible (or at least all occurring)
values a variable can take. If not specified otherwise, levels are
sorted according to the Unicode Collation Algorithm (see Chapter 2.1).
Even though the data is nominal/ordinal, it is encoded using integers that
specify the index of the ’original’ value in the levels attribute.
When working with ordinal data, the ordered class should be
preferred.
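A minimal sketch of the internal representation and of an ordered factor. The values "low", "medium" and "high" are made-up example data:

```r
f <- factor(c("low", "high", "low", "medium"))

typeof(f)        # "integer"
class(f)         # "factor"
levels(f)        # "high" "low" "medium": sorted alphabetically
as.integer(f)    # 2 1 2 3: positions in levels(f)

# Ordered factor with an explicit, meaningful level order
o <- factor(c("low", "high", "low", "medium"),
            levels = c("low", "medium", "high"), ordered = TRUE)
class(o)         # "ordered" "factor"
o < "high"       # TRUE FALSE TRUE TRUE: comparisons follow the level order
```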
Working with factors I


Generating a factor with the function factor():
stat1 <- factor(strsplit("statistics", split = "")[[1]])
stat1
## [1] s t a t i s t i c s
## Levels: a c i s t
Specifying all possible levels manually:
stat2 <- factor(strsplit("statistics", split = "")[[1]], levels = letters)
stat2
## [1] s t a t i s t i c s
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

Contrary to the names attribute, no level may occur twice.
Always make sure that all possible levels are actually specified:
stat3 <- factor(strsplit("Statistics", split = "")[[1]], levels = letters)
stat3
## [1] <NA> t a t i s t i c s
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

Working with factors II

Unused levels can be dropped using the function droplevels():
droplevels(stat2)
## [1] s t a t i s t i c s
## Levels: a c i s t

Factors can be concatenated using c(). Before R 4.1.0, however, the result was
not what was intended (only the integer codes were concatenated). Since R 4.1.0,
c() combines factors into a factor whose levels are the union of all levels:
data1 <- factor(strsplit("datascience", split = "")[[1]],
                levels = letters)
c(stat2, data1)
## [1] s t a t i s t i c s d a t a s c i e n c e
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

In older R versions, there was no ’elegant’ way of combining factors in base R;
the usual workaround goes via character vectors:
factor(c(as.character(stat2), as.character(data1)), levels = letters)
## [1] s t a t i s t i c s d a t a s c i e n c e
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

Working with factors III

Since factors are ’only’ vectors, they can be subsetted accordingly. In
that case, the factor class and all levels are preserved:
stat1[1:2]
## [1] s t
## Levels: a c i s t

If unused levels are to be dropped while subsetting,
the drop argument must be set to TRUE:
stat1[1:2, drop = TRUE]
## [1] s t
## Levels: s t

Working with factors IV

Beware when trying to convert the original values of a factor into
another data type:
In this case, everything works out:
x <- factor(c("One", "Two"))
as.character(x)
## [1] "One" "Two"
Here, caution is advised. This is not how it’s done, since as.integer()
returns the internal codes rather than the original values:
x <- factor(c(0, 1))
as.integer(x)
## [1] 1 2

It’s always ’safer’ to convert the values into a character vector first:
as.integer(as.character(x))
## [1] 0 1
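The help page ?factor recommends as.numeric(levels(f))[f] for recovering numeric values; it converts only the levels instead of every element. A sketch with made-up values:

```r
x <- factor(c(10, 5, 10, 20))

levels(x)                  # "5" "10" "20"
as.integer(x)              # 2 1 2 3: the internal codes, not the values
as.numeric(levels(x))[x]   # 10 5 10 20: levels converted once, indexed by the codes
```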

And the moral of the story...?

Factors are annoying:
Simple functions (e.g. c() before R 4.1.0) don’t work at all or not as intended.
Dealing with the levels attribute can be a struggle.
One should always be aware that factors are integers
internally and as such usually (but not always) behave accordingly.
Factors were supposed to store categorical data more efficiently.
However, this advantage has largely disappeared.
Even though many users might want factors gone for good, this is not
feasible for several reasons:
Many functions expect factors as input or return them as output.
Until R 4.0.0, character strings in data sets were converted to factors by
default (stringsAsFactors = TRUE in data.frame() and read.table()).
Looking hard enough, there may actually be some positive aspects to
factors after all, like accounting for possible values that did not occur in
the data (e.g. when using table()).
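A minimal illustration of the table() advantage, using a made-up grades factor with an unobserved level "C":

```r
grades <- factor(c("A", "B", "A"), levels = c("A", "B", "C"))

table(grades)                # A: 2, B: 1, C: 0, the unobserved level is kept
table(as.character(grades))  # only A and B appear
```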



Matrices

Matrices - General remarks I

Matrices can be generated using the function matrix() (among others).
If no data is supplied, the matrix is filled with NAs.
It’s possible to specify the number of rows and columns.
If only one of them is specified, R attempts to infer the other one from
the data length.
If the data length does not match the dimensions, the data vector is
recycled accordingly.
If neither the number of rows nor the number of columns is specified, a
single-column matrix is returned.
By default, data is inserted into the matrix column by column.
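The cases listed above, as a short sketch (m0 to m3 are illustrative names):

```r
m0 <- matrix(nrow = 2, ncol = 2)            # no data: filled with NA
m1 <- matrix(1:6, ncol = 2)                 # nrow inferred from the data: 3 x 2
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
m3 <- matrix(1:3)                           # no dimensions given: one column
```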

Matrices - General remarks II

Matrices are ’only’ atomic vectors with a dim attribute of length two.
It’s possible to add this attribute to a vector to obtain a matrix:
vec <- 1:6
dim(vec) <- c(2, 3)
vec
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6

Matrices are always stored by column, even if the values are inserted
by row.
Since matrices are only vectors, they can also contain data of only a single
data type.
Individual rows and/or columns of a matrix can be named.

Matrix generation: Recycling I

Here, everything is fine (4 entries in 2 rows; R determines that 2
columns are required and inserts the values into a 2 × 2 matrix):
matrix(1:4, nrow = 2)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4

In this case, a warning would be desirable:
matrix(1:2, nrow = 2, ncol = 2)
## [,1] [,2]
## [1,] 1 1
## [2,] 2 2

After all, the data length clearly does not fit the specified dimensions.
But since the number of matrix entries is a multiple of the data length, no
warning is issued.

Matrix generation: Recycling II

At least, a warning is issued when the number of matrix entries is not
a multiple of the data length:
matrix(1:3, nrow = 2, ncol = 2)
## Warning in matrix(1:3, nrow = 2, ncol = 2): data length [3] is not a sub-multiple or multiple of the number of rows [2]
## [,1] [,2]
## [1,] 1 3
## [2,] 2 1

If one dimension is left unspecified, it is chosen so that all data values fit
into the matrix at least once, and a warning is issued:
matrix(1:5, nrow = 2)
## Warning in matrix(1:5, nrow = 2): data length [5] is not a sub-multiple or multiple of the number of rows [2]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 1

Matrix generation: Recycling III

However, this does not work, since no new object is being generated.
Instead, this is an attempt to modify an already existing object:
x <- 1:5
dim(x) <- c(2, 3)
## Error in dim(x) <- c(2, 3): dims [product 6] do not match the length of object [5]

If the data length is larger than the number of matrix entries, only the
first data values are used, without issuing a warning!
matrix(1:10, nrow = 2, ncol = 2)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4

Operations and functions on matrices I

If two matrices have equal dimensions (and compatible data types), the
same binary operators (+, -, *, /, & ...) as for vectors can be applied to them.
In this case, calculations are performed element-wise.
If these operations are performed on a matrix and a vector, the
vector is recycled to match the length of the matrix.
There are special operators for matrices like %*% (matrix
multiplication), %o% (outer product) and %x% (Kronecker product).
Furthermore, there are many additional functions for matrices, e.g.
diag() and crossprod().
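A compact sketch of these operators on a small example matrix; A and Id are illustrative names (Id is used instead of I to avoid masking the base function I()):

```r
A  <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)
Id <- diag(2)                 # 2 x 2 identity matrix

A * Id          # element-wise product: keeps only the diagonal of A
A %*% Id        # matrix product: equals A
1:2 %o% 1:3     # outer product: a 2 x 3 matrix
Id %x% A        # Kronecker product: a 4 x 4 matrix
crossprod(A)    # same as t(A) %*% A
A + c(10, 20)   # the vector is recycled down the columns
```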

Operations and functions on matrices II

There are multidimensional generalizations of the vector functions
length() and names():
For a matrix, length() returns the total number of entries. The numbers of
rows and columns can be retrieved with nrow() and ncol(),
respectively.
To obtain the names of the rows and columns of a matrix, the functions
rownames() and colnames() can be used.
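These functions applied to a small named matrix (the row and column names are illustrative); as.vector() also shows the column-wise storage order:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

length(m)      # 6: total number of entries
nrow(m)        # 2
ncol(m)        # 3
dim(m)         # 2 3
rownames(m)    # "r1" "r2"
colnames(m)    # "a" "b" "c"
as.vector(m)   # 1 2 3 4 5 6: the underlying vector in column order
m["r2", "b"]   # 4: subsetting by row and column names
```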

Operations and functions on matrices III

Operations along columns are much more efficient than operations along
rows (matrices are stored by column):
library(microbenchmark)
set.seed(1908)
rnd <- rnorm(1e3 * 1e3)
f1 <- function() matrix(rnd, nrow = 1e3)
f2 <- function() matrix(rnd, nrow = 1e3, byrow = TRUE)
# Example 1: Generating a matrix
microbenchmark(f1(), f2())

## Unit: milliseconds
## expr min lq mean median uq max neval
## f1() 1.613101 1.775902 2.765082 1.961400 2.342901 14.4933 100
## f2() 4.868101 5.195851 6.582729 5.477101 6.774001 16.3230 100

Operations and functions on matrices IV

# Example 2: Calculate column and row sums
mat <- matrix(rnd, nrow = 1e3)
f3 <- function() colSums(mat)
f4 <- function() rowSums(mat)
microbenchmark(f3(), f4())

## Unit: microseconds
## expr min lq mean median uq max neval
## f3() 851.001 902.501 986.250 946.3005 1012.901 1772.602 100
## f4() 1965.001 2019.501 2163.112 2091.8015 2227.451 2914.700 100

Subsetting matrices I

Generally, matrices are vectors and as such, they can be subsetted in
the same six ways.
Subsetting with a one-dimensional index returns the corresponding
element of the underlying vector:
mat <- matrix(1:6, nrow = 2)
mat[5]
## [1] 5
mat <- matrix(1:6, nrow = 2, byrow = TRUE)
mat[5]
## [1] 3

Subsetting matrices II

Matrices can also be subsetted with a two-dimensional index:
mat[1:2, 3]
## [1] 3 6
mat[1:2, -3]
## [,1] [,2]
## [1,] 1 2
## [2,] 4 5

This returns a vector if only a single row and/or column is subsetted.
In all other cases a matrix is returned.
To obtain a single-row or single-column matrix instead of a vector, the
drop argument must be set to FALSE:
mat[1:2, 3, drop = FALSE]
## [,1]
## [1,] 3
## [2,] 6

Subsetting matrices III

The six familiar subsetting methods can be applied to two-dimensional
subsetting as well; it’s also possible to subset each dimension in a
different way.
To subset all rows or columns, the corresponding index can be left
blank (cf. blank subsetting for vectors):
mat[, 1:2]
## [,1] [,2]
## [1,] 1 2
## [2,] 4 5
mat[, ]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6

Subsetting matrices IV

Furthermore, it’s possible to subset matrices with another matrix:
mat
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
index <- matrix(c(1, 1, 2, 3), ncol = 2)
index
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
mat[index]
## [1] 2 3

Subsetting matrices V

What is the explanation for this result?
The index matrix is read by rows: every row of the index matrix is
interpreted as an index pair (row, column).
The command above therefore leads to the same result as
c(mat[1, 2], mat[1, 3]).
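Such index matrices can be built with cbind(), and which(..., arr.ind = TRUE) returns one directly. A sketch on the same mat as above; idx and pos are illustrative names:

```r
mat <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows (1, 2, 3) and (4, 5, 6)

idx <- cbind(c(1, 2), c(3, 1))   # index pairs (1, 3) and (2, 1)
mat[idx]                         # 3 4

# which(..., arr.ind = TRUE) returns exactly such a (row, column) matrix
pos <- which(mat > 4, arr.ind = TRUE)
mat[pos]                         # 5 6
```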



Arrays

Arrays - The generalization of matrices


There is not much to say about arrays:
arrays are vector-based data structures of dimensionality n ∈ N.
There also are one-dimensional arrays, which (unlike plain vectors) carry a
dim attribute of length one.
A two-dimensional array is a matrix.
arrays can be generated using the function array(), where the desired
dimensions can be specified with the dim argument.
array(1:12, dim = c(2, 3, 2))
## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
class(array(1:12, dim = 12))
## [1] "array"
class(array(1:12, dim = c(6, 2)))
## [1] "matrix" "array"
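Subsetting generalizes directly to one index per dimension. A sketch on the array from above (a is an illustrative name):

```r
a <- array(1:12, dim = c(2, 3, 2))

dim(a)        # 2 3 2
a[2, 3, 1]    # 6: row 2, column 3 of the first layer
a[1, 1, 2]    # 7
a[, , 2]      # the second layer as a 2 x 3 matrix
a[1, 2, ]     # 3 9: one element from each layer
```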



Lists

Lists


Lists - what are those?

Internally, lists belong to the data types in R. Intuitively, however,
they rather resemble a data structure.
Lists are also vectors in R.
Unlike atomic vectors, lists can store multiple different data types at
once ('heterogeneous vector').
Furthermore, lists can contain lists themselves ('recursive data
structure').
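The claims above can be checked directly; `mixed` is a hypothetical example list:

```r
mixed <- list(1L, "a", TRUE, list(2, 3))
typeof(mixed)          # "list"
is.vector(mixed)       # TRUE: a list is a vector ...
is.atomic(mixed)       # FALSE: ... but not an atomic one
length(mixed)          # 4: the inner list counts as a single element
sapply(mixed, typeof)  # "integer" "character" "logical" "list"
```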

Lists Generation

Generating lists I

The list() command:
... behaves similarly to the c() function for atomic vectors.
... takes (named) entries:
list(a = "Hello", b = matrix(1:4, nrow = 2), c = list(1, 2))
## $a
## [1] "Hello"
##
## $b
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## $c
## $c[[1]]
## [1] 1
##
## $c[[2]]
## [1] 2


Generating lists II

The vector() command:
The vector() command can be used to generate a list of arbitrary length
whose entries are all NULL:
vector(mode = "list", length = 3)
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL

cf. vector() command for atomic vectors → it’s the same function.
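A common use of such a pre-allocated list is to fill it in a loop; a minimal sketch with illustrative names `n` and `res`:

```r
n <- 3
res <- vector(mode = "list", length = n)   # three NULL slots
for (i in seq_len(n)) {
  res[[i]] <- seq_len(i)                   # entries may differ in length
}
str(res)
```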

Lists Subsetting

Subsetting lists

There are three ways of subsetting lists:
With single square brackets '[',
With double square brackets '[[',
With the $ operator.
We will now examine the differences between these subsetting variants:
set.seed(5)
li <- list(a = matrix(rnorm(4), nrow = 2), b = matrix(rnorm(4), nrow = 2),
c = matrix(rnorm(4), nrow = 2))


Subsetting lists - The single square brackets I

We are already familiar with the square brackets from subsetting vectors.
They have the same functionality for lists:
All six subsetting methods for vectors work.
Multiple list elements can be subsetted at once.
The output is always a list.

# Positive integers
li[2]
## $b
## [,1] [,2]
## [1,] 1.711441 -0.4721664
## [2,] -0.602908 -0.6353713

# Logical values
li[c(FALSE, FALSE, TRUE)]
## $c
## [,1] [,2]
## [1,] -0.2857736 1.2276303
## [2,] 0.1381082 -0.8017795


Subsetting lists - The single square brackets II


# Negative integers
li[-(1:2)]
## $c
## [,1] [,2]
## [1,] -0.2857736 1.2276303
## [2,] 0.1381082 -0.8017795

# Using the names attribute
li[c("a", "c")]
## $a
## [,1] [,2]
## [1,] -0.8408555 -1.25549186
## [2,] 1.3843593 0.07014277
##
## $c
## [,1] [,2]
## [1,] -0.2857736 1.2276303
## [2,] 0.1381082 -0.8017795

# Subsetting with 0
li[0]
## named list()

# Nothing / Blank
li[]
## $a
## [,1] [,2]
## [1,] -0.8408555 -1.25549186
## [2,] 1.3843593 0.07014277
##
## $b
## [,1] [,2]
## [1,] 1.711441 -0.4721664
## [2,] -0.602908 -0.6353713
##
## $c
## [,1] [,2]
## [1,] -0.2857736 1.2276303
## [2,] 0.1381082 -0.8017795
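Since all vector subsetting methods work, the index can also be computed from the list itself. A sketch with a hypothetical list `scores`, using `lengths()` to drop empty entries:

```r
scores <- list(a = 1:3, b = c(-1, 5), c = numeric(0))
# A logical index computed from the list itself
keep <- lengths(scores) > 0
scores[keep]           # still a list, now containing only a and b
class(scores["b"])     # "list": even a single element is returned as a list
```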


Subsetting lists - The double square brackets I

The double square brackets are used to subset a single element of a
list.
It's not possible to subset multiple elements at once (a vector index is
instead interpreted as recursive indexing into nested lists).
The output is the list entry itself and not a list containing a single
element (as with single square brackets).
The brackets may contain a positive integer or a name. The other
subsetting methods for vectors don't work here:


Subsetting lists - The double square brackets II

li[[2]]
## [,1] [,2]
## [1,] 1.711441 -0.4721664
## [2,] -0.602908 -0.6353713
li[["b"]]
## [,1] [,2]
## [1,] 1.711441 -0.4721664
## [2,] -0.602908 -0.6353713

li[[-1]]
## Error in li[[-1]]: invalid negative subscript in get1index <real>
li[[0]]
## Error in li[[0]]: attempt to select less than one element in get1index <real>
li[[c(TRUE, FALSE, FALSE)]]
## Error in li[[c(TRUE, FALSE, FALSE)]]: recursive indexing failed at level 2
li[[]]
## NULL
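The error for the logical index hints at what `[[` does with a vector of length > 1: it indexes recursively into nested lists, one level per index entry. A sketch with a hypothetical nested list:

```r
nested <- list(a = list(b = 1:3, c = "x"))
nested[["a"]][["b"]]     # step by step
nested[[c("a", "b")]]    # the same element via recursive indexing
nested[[c(1, 2)]]        # first element, then its second element
## [1] "x"
```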


Subsetting lists - The $ operator

The $ operator provides almost the same functionality as the double
square brackets:
Only one element can be subsetted at a time.
The output is always the element itself and not a list of length 1.
However, there are some differences:
The $ operator only works with names and not with indices.
The $ operator does not evaluate variables: li$x looks for an element
literally named x, not for the name stored in a variable x.
'Partial matching' is allowed.

li$c

## [,1] [,2]
## [1,] -0.2857736 1.2276303
## [2,] 0.1381082 -0.8017795
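The difference regarding variables can be sketched as follows (`li_v` and `key` are hypothetical names):

```r
li_v <- list(a = 1, c = "see")
key <- "c"
li_v$key      # $ looks for an element literally called "key"
## NULL
li_v[[key]]   # [[ evaluates the variable key
## [1] "see"
```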


Partial matching I

Partial matching means that using the unique beginning of a name suffices
for subsetting:
li2 <- list(abc = 2, def = 7, deg = 10)

Partial matching works when using the $ operator:


li2$ab
## [1] 2

By default, it does not work with double square brackets:
li2[["ab"]]
## NULL
This can be changed with the exact argument, however:
li2[["ab", exact = FALSE]]
## [1] 2


Partial matching II

With single square brackets, partial matching can't be used at all:
li2["ab"]
## $<NA>
## NULL

Partial matching only works when the abbreviation is unique; "de" matches both def and deg:


li2$de
## NULL
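Since partial matching with $ can hide typos, R offers the option `warnPartialMatchDollar`. A sketch that turns it on temporarily and records the resulting warning (the `warned` flag is only for illustration):

```r
li2 <- list(abc = 2, def = 7, deg = 10)
old <- options(warnPartialMatchDollar = TRUE)   # warn whenever $ matches partially
warned <- FALSE
x <- withCallingHandlers(
  li2$ab,
  warning = function(w) {
    warned <<- TRUE                    # record that a warning was raised
    invokeRestart("muffleWarning")     # and suppress its printing
  }
)
options(old)                           # restore the previous setting
x
## [1] 2
```

The partial match still returns the element; the option only makes it visible.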

Lists Expanding and merging lists

Adding new elements I

All three subsetting variants can also be used to append new elements to
an existing list:
With the $ operator and the double square brackets, new elements are
appended by assigning them to unused names or indices:
li3 <- list(a = 5, b = 4)
li3$d <- "Hello"
li3[[5]] <- FALSE
str(li3)
## List of 5
## $ a: num 5
## $ b: num 4
## $ d: chr "Hello"
## $ : NULL
## $ : logi FALSE
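Besides assignment, base R's `append()` can insert list elements at a given position. A sketch with a hypothetical list `li_app`:

```r
li_app <- list(a = 5, b = 4)
# append() inserts new elements after a given position (default: at the end)
li_app <- append(li_app, list(z = 0), after = 1)
names(li_app)
## [1] "a" "z" "b"
# Assigning to position length + 1 appends without creating NULL gaps
li_app[[length(li_app) + 1]] <- "end"
length(li_app)
## [1] 4
```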


Adding new elements II

With single square brackets, the new elements should be supplied as a
list (each list entry becomes one new element):
li3[c("e", "f")] <- list(0, 1)
str(li3)
## List of 7
## $ a: num 5
## $ b: num 4
## $ d: chr "Hello"
## $ : NULL
## $ : logi FALSE
## $ e: num 0
## $ f: num 1


Deleting elements from a list

The three subsetting methods can also be used to delete list elements. In
order to do that, the respective list element must be set to NULL:
li3$f <- NULL
li3[[6]] <- NULL
li3[c("b", "d")] <- NULL
li3

## $a
## [1] 5
##
## [[2]]
## NULL
##
## [[3]]
## [1] FALSE
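Elements can also be dropped by subsetting instead of assigning NULL, e.g. to remove all NULL entries at once. A sketch with a hypothetical list `li_del` that mirrors the result above:

```r
li_del <- list(a = 5, NULL, FALSE)
# Flag NULL entries, then keep only the others (no NULL assignment needed)
is_null <- vapply(li_del, is.null, logical(1))
li_del <- li_del[!is_null]
length(li_del)
## [1] 2
```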


Particularities for appending and deleting I

There are some particularities to be aware of:
When assigning an element to an index with index > length(list) + 1,
list entries containing NULL are inserted in between:
li4 <- list(a = 1)
li4[[3]] <- 3
li4
## $a
## [1] 1
##
## [[2]]
## NULL
##
## [[3]]
## [1] 3


Particularities for appending and deleting II

Deleting a list element is possible by setting it to NULL:


li4$a <- NULL
li4
## [[1]]
## NULL
##
## [[2]]
## [1] 3

To set a list element to NULL without actually deleting it, assign
list(NULL) with single square brackets:
li4[2] <- list(NULL)
li4
## [[1]]
## NULL
##
## [[2]]
## NULL


Merging two lists I

How can two lists be merged?


li5 <- list(a = 1)
li6 <- list(a = "a")

The list() command does not return the desired result; it nests the two lists instead of merging them:


list(li5, li6)

## [[1]]
## [[1]]$a
## [1] 1
##
##
## [[2]]
## [[2]]$a
## [1] "a"


Merging two lists II

Solution: the c() command:


c(li5, li6)

## $a
## [1] 1
##
## $a
## [1] "a"

Side note: As we see here, names don’t have to be unique for lists either!
Subsetting with the non-unique name always returns the first element:
c(li5, li6)$a

## [1] 1
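To get every element carrying a duplicated name, compare against `names()` instead of using $ or [[. A sketch reusing li5 and li6 (the name `merged` is illustrative):

```r
li5 <- list(a = 1)
li6 <- list(a = "a")
merged <- c(li5, li6)
# Select every element named "a", not just the first one
merged[names(merged) == "a"]
# Positions still reach each element unambiguously
merged[[2]]
## [1] "a"
```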

Lists Shenanigans with lists

Lists are vectors! I

Since lists are vectors, we can do some 'interesting' things with them. For
example, we can set a dim attribute:
li7 <- list(a = matrix(rnorm(4), nrow = 2), b = matrix(rnorm(4), nrow = 2),
c = matrix(rnorm(4), nrow = 2), d = matrix(rnorm(4), nrow = 2))
dim(li7) <- c(2, 2)
li7

## [,1] [,2]
## [1,] numeric,4 numeric,4
## [2,] numeric,4 numeric,4

Now, we have a 2 × 2 matrix where each entry is another 2 × 2 matrix.
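Such a list-matrix is indexed with two indices, and [[ returns the stored object itself. A sketch with deterministic values instead of rnorm() (the list `li_m` is hypothetical); note that the elements are filled column-wise:

```r
li_m <- list(a = matrix(1:4, nrow = 2), b = matrix(5:8, nrow = 2),
             c = matrix(9:12, nrow = 2), d = matrix(13:16, nrow = 2))
dim(li_m) <- c(2, 2)
# [[ with two indices returns the stored element itself
li_m[[1, 2]]           # the matrix that was element c
li_m[[1, 2]][2, 2]     # entry (2, 2) of that inner matrix
## [1] 12
```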


Lists are vectors! II

This is also possible:


li8 <- list("Hello", 5, FALSE, 3+5i)
dim(li8) <- c(2, 2)
li8

## [,1] [,2]
## [1,] "Hello" FALSE
## [2,] 5 3+5i

Now we have a matrix that contains different data types. But that
shouldn't be possible! Why does this work?

The object is a matrix ...
class(li8)
## [1] "matrix" "array"
... the underlying data type, however, is a list:
typeof(li8)
## [1] "list"

Recall: lists are a data type in R and not a data structure. That's why
such matrices are possible.

Lists are vectors! III

Beware!
Just because we can play around like this with lists, it does not mean
that we should use such data structures!
Many functions in R are not designed to work with such complex data
structures.
Besides leading to incomprehensible code, the use of such data
structures can also result in errors that are very hard to find!


data.frames


data.frames - The data structure for datasets

data.frames allow storing datasets in R.
Internally, data.frames are just lists with elements of equal length:

class(iris)
## [1] "data.frame"
typeof(iris)
## [1] "list"

data.frames combine lists and matrices; they share the functionality of
both data structures:
data.frames have names (just like lists) as well as rownames and
colnames (just like matrices). names and colnames are identical.
data.frames have a length (length) as well as a number of rows
(nrow) and columns (ncol). length and ncol are identical.
data.frames can be subsetted one-dimensionally with [, [[ or $, or
two-dimensionally like matrices with [,].
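These correspondences can be checked on iris; a sketch (the names `ir` and `v1` to `v3` are illustrative):

```r
ir <- iris                         # built-in example dataset
identical(names(ir), colnames(ir))
## [1] TRUE
length(ir) == ncol(ir)
## [1] TRUE
nrow(ir)
## [1] 150
# The same column via list-style and matrix-style subsetting
v1 <- ir$Sepal.Length
v2 <- ir[[1]]
v3 <- ir[, "Sepal.Length"]         # one column: simplified to a vector
identical(v1, v2) && identical(v2, v3)
## [1] TRUE
# One index in single brackets behaves like a list: result stays a data.frame
class(ir["Sepal.Length"])
## [1] "data.frame"
```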

data.frames Merging two datasets

Merging two datasets I

Datasets can be merged/expanded using cbind() or rbind(). Note that:
When merging by column (cbind()), both datasets must have the same
number of rows. The rownames don't have to be identical, however.
When merging by row (rbind()), both datasets must have the same number
of columns and the same colnames (columns are matched by name, so their
order may differ).
rbind() and cbind() can also be used to append vectors or matrices
to a dataset.
At least one input to rbind() or cbind() must be a data.frame.
Otherwise, rbind() and cbind() return a matrix instead, with potentially
undesired type conversions!


Merging two datasets II

Examples:

data1 <- data.frame(x = 1:2, y = 3:4)
cbind(data1, z = 5:6)
## x y z
## 1 1 3 5
## 2 2 4 6
data2 <- data.frame(a = c("A", "B"),
b = c(TRUE, FALSE))
cbind(data1, data2)
## x y a b
## 1 1 3 A TRUE
## 2 2 4 B FALSE

rbind(data1, data2)
## Error in match.names(clabs, names(xi)): names do not match previous names
cbind(a = c("a", "b"), b = 1:2)
## a b
## [1,] "a" "1"
## [2,] "b" "2"
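Two further sketches with hypothetical datasets: rbind() matches columns by name, so their order may differ; and building a data.frame directly avoids the matrix conversion of cbind(). That column `a` stays character assumes R ≥ 4.0, where strings are no longer converted to factors by default:

```r
data1 <- data.frame(x = 1:2, y = 3:4)
# Hypothetical second dataset: same column names, different order
data4 <- data.frame(y = 7:8, x = 5:6)
both <- rbind(data1, data4)        # columns are matched by name
both
# Building a data.frame directly keeps each column's own type
data5 <- data.frame(a = c("a", "b"), b = 1:2)
sapply(data5, class)
```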

data.frames Shenanigans with data.frames

data.frames are lists and therefore vectors...

Every column of a data.frame is an element of the underlying list.


Every list element consists of a vector of equal length.
This vector can also be another list. And this list can then have entries
of different length:

data3 <- data.frame(x = 1:3, y = 2:4)


data3$z <- list(1, 1:2, 1:3)
data3

## x y z
## 1 1 2 1
## 2 2 3 1, 2
## 3 3 4 1, 2, 3

Beware! Just as when we were playing around with lists, caution is
advised here as well! Just because something technically works does not
mean you should use it!
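If such a list column is really needed, data.frame() can create it directly when the list is wrapped in I() (without it, data.frame() would treat the list entries as separate columns). A sketch with the hypothetical name `data6`:

```r
# I() protects the list, so data.frame() stores it as a single column
data6 <- data.frame(x = 1:3, z = I(list(1, 1:2, 1:3)))
typeof(data6$z)
## [1] "list"
lengths(data6$z)
```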

Further data structures


Further data structures in R I

Besides the data structures covered so far, there are further, less
frequently used ones. They will not be discussed in detail here.
Two examples:
The contingency table:
tab <- table(sample(10, 100, replace = TRUE))
tab
##
## 1 2 3 4 5 6 7 8 9 10
## 11 7 12 13 9 11 6 10 14 7
class(tab)
## [1] "table"
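table() also cross-tabulates two variables; the result is built on an integer array with dimnames, so it can be indexed like a matrix. A sketch with hypothetical data:

```r
sex    <- c("f", "m", "f", "f", "m")
smoker <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
tab2 <- table(sex, smoker)
tab2
dim(tab2)            # 2 x 2: one dimension per variable
tab2["f", "TRUE"]    # number of female smokers in the toy data
```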


Further data structures in R II

The distance matrix dist:


dis <- dist(1:10)
dis
## 1 2 3 4 5 6 7 8 9
## 2 1
## 3 2 1
## 4 3 2 1
## 5 4 3 2 1
## 6 5 4 3 2 1
## 7 6 5 4 3 2 1
## 8 7 6 5 4 3 2 1
## 9 8 7 6 5 4 3 2 1
## 10 9 8 7 6 5 4 3 2 1
class(dis)
## [1] "dist"
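A dist object only stores the lower triangle; as.matrix() expands it to a full symmetric matrix. A short sketch (the name `m` is illustrative):

```r
dis <- dist(1:10)
length(dis)          # only the lower triangle is stored
## [1] 45
m <- as.matrix(dis)  # expand to the full symmetric 10 x 10 matrix
dim(m)
m[3, 7]              # distance between observations 3 and 7
## [1] 4
```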

In general, you can build your own data structures, but chances are
they will be based on the data structures presented here.
