Beautiful Graphics in R
Alboukadel Kassambara
Guide to Create
Beautiful Graphics in R
sthda.com Edition 21
A. Kassambara
2015
Order a physical copy from Amazon at
https://goo.gl/Pz3Neg
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, without the prior
written permission of the Publisher. Requests to the Publisher for permission should
be addressed to STHDA (http://www.sthda.com).
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials.
Neither the Publisher nor the authors, contributors, or editors,
assume any liability for any injury and/or damage
to persons or property as a matter of products liability,
negligence or otherwise, or from any use or operation of any
methods, products, instructions, or ideas contained in the material herein.
Introduction to R (chapter 1)
Area plot (chapter 3)
Frequency polygon (chapter 7)
Dot plot for one variable (chapter 8)
Scatter plot (chapter 12)
quantile line from quantile regression
jitter to reduce overplotting
Continuous bivariate distribution (chapter 13)
Correlation Matrix Visualization (chapter 41)
ggcorrplot: new R package for visualizing a correlation matrix
Line plot with time series data updated
Graphical parameters:
Position adjustments (chapter 38)
Coordinate systems (chapter 39)
Text annotations: ggrepel R package (chapter 34)
survminer: new R package for plotting survival curves with number at risk table
(chapter 42)
2. Removed sections:
Line plot
Add arrow section removed
Legend
Section remove legend slashes (not required since ggplot2 v2)
Note that all the analyses in this book were performed using R (ver. 3.2.3) and
ggplot2 (ver. 2.1.0).
0.3 Acknowledgments
Thanks to Leland Wilkinson for the concept,
Thanks to Hadley Wickham for ggplot2 R package
1 Introduction to R
1.1 Install R and RStudio
1.2 Arithmetics with R
1.3 Data types in R
1.4 Getting help with functions in R
1.5 Installing and loading R packages
1.6 Importing your data into R
1.7 Demo data sets
1.8 Close your R/RStudio session
2 Introduction to ggplot2
2.1 What's ggplot2?
2.2 Type of graphs for data visualization
2.3 Install and load ggplot2 package
2.4 Data format and preparation
2.5 qplot() function: Draw quick plots
2.6 ggplot() function: Build plots piece by piece
2.7 Save ggplots
3 Area Plots
4 Density Plots
4.1 Basic density plots
4.2 Change colors by groups
5 Histogram Plots
5.1 Basic histogram plots
5.2 Change colors by groups
7 Frequency Polygon
9 ECDF Plots
10 QQ Plots
15 Box Plots
15.1 Basic box plots
15.2 Change colors by groups
15.3 Box plot with multiple groups
16 Violin plots
16.1 Basic violin plots
16.2 Add summary statistics
16.3 Change colors by groups
16.4 Violin plots with multiple groups
17 Dot Plots
17.1 Basic dot plots
17.2 Add summary statistics
17.3 Change colors by groups
17.4 Dot plot with multiple groups
18 Stripcharts
18.1 Basic stripcharts
18.2 Add summary statistics
18.3 Change point shapes by groups
18.4 Change colors by groups
18.5 Stripchart with multiple groups
26 Colors
26.1 Use a single color
26.2 Change colors by groups
26.3 Gradient or continuous colors
[Figure: density plot and histogram of weight by sex (F, M); ECDF plot of F(weight) against Weight; QQ plot of Miles/(US) gallon by cyl]
Part III provides quick-start guides for plotting two continuous or discrete
variables, including:
[Figure: scatter plot against Weight (lb/1000) with points colored by cyl; plot of waiting against eruptions]
Part IV (chapters 15 to 22) describes how to draw and customize box plots, violin
plots, dot plots, strip charts, line plots, bar plots and pie charts.
[Figure: box plot, strip chart, dot plot and violin plot of Length against Dose (mg), colored by dose]
0.5. HOW THIS BOOK IS ORGANIZED? 15
[Figure: plots of len against dose by supp (OJ, VC); pie chart by group (Child, Female, Male); survival curves with number at risk by time; correlation matrix]
0.6. BOOK WEBSITE 17
[Figure: plots of len against dose; line plot of psavert against date]
Each chapter is organized as an independent quick start guide. This means that
you don't need to read the chapters in sequence. I only recommend reading
chapter 1 first, as it gives a quick overview of R and the ggplot2 graphing system.
For each chapter, the key ggplot2 functions covered are generally mentioned at the
beginning. The data used are described, and many examples of R code and graphics
are provided.
Sometimes, different chapters use the same data. In this case, we decided to repeat
the data preparation description in the corresponding chapters. In other words, each
chapter is an independent module, so the reader can read
only the chapter of interest.
First paste the code into your R code editor or text editor
Then copy the code from your text/code editor to the R console
Part I
Chapter 1
Introduction to R
R is a free and powerful statistical software for analyzing and visualizing data. If
you want to learn the essentials of R programming easily, visit our series of tutorials
available on STHDA: http://www.sthda.com/english/wiki/r-basics-quick-and-easy.
In this chapter, we provide a very brief introduction to R, covering how to install
R/RStudio and how to import your data into R.
2. After installing R, also install the RStudio software available at:
http://www.rstudio.com/products/RStudio/.
RStudio screen:
1.2. ARITHMETICS WITH R 21
7 + 4 # => 11
7 - 4 # => 3
7 / 2 # => 3.5
7 * 2 # => 14
log2(4) # => 2
abs(-4) # => 4
sqrt(4) # => 2
Matrices: like an Excel sheet containing multiple rows and columns. A combination
of multiple vectors of the same type (numeric, character or logical).
Create and name a matrix: matrix(), cbind(), rbind(), rownames(x),
colnames(x)
Convert x to a matrix: x2 <- as.matrix(x)
Dimensions of a matrix: ncol(x), nrow(x), dim(x)
Get a subset of a matrix: my_data[row, col]
Calculations with numeric matrices: rowSums(x), colSums(x),
rowMeans(x), colMeans(x)
# Numeric vectors
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
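The matrix functions listed above can be sketched with these two vectors. This is only an illustration: the object name my_data and the row names are assumptions, not taken from the text.

```r
# Numeric vectors (same values as above)
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
# Combine the vectors by column; my_data is an illustrative name
my_data <- cbind(col1, col2)
# Row names chosen for illustration only
rownames(my_data) <- c("row1", "row2", "row3", "row4", "row5")
dim(my_data)        # number of rows and columns
my_data[2, "col1"]  # element in row 2, column col1
colMeans(my_data)   # mean of each column
rowSums(my_data)    # sum of each row
```

Because both vectors are numeric, the result is a numeric matrix, so the row and column summaries apply directly.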
1.3. DATA TYPES IN R 23
# Create a factor
friend_groups <- factor(c("grp1", "grp2", "grp1", "grp2"))
levels(friend_groups) # => "grp1", "grp2"
## grp1 grp2
## 28.0 25.5
Data frames: like a matrix but can have columns with different types
Create a data frame: data.frame()
Convert x to a data frame: x2 <- as.data.frame(x)
Subset a data frame: my_data[row, col]
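As a minimal sketch of the data frame operations listed above, the example below builds a small data frame with columns of different types. All names and values are invented for illustration.

```r
# Illustrative data: the names, ages and logical column are assumptions
my_data <- data.frame(
  name = c("Anna", "Ben", "Carl", "Dina"),
  age = c(27, 25, 29, 26),
  married = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
my_data[1:2, ]                        # rows 1 and 2, all columns
my_data[, c("name", "age")]           # columns selected by name
older <- my_data[my_data$age > 26, ]  # rows where age exceeds 26
older$name
```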
?mean
For example, in this book, you'll learn how to draw beautiful graphs using the ggplot2
R package.
There are thousands of other R packages available for download and installation from
CRAN, Bioconductor (biology-related R packages) and GitHub repositories.
2. How to install packages from GitHub? You should first install devtools if you
don't already have it installed on your computer.
For example, the following R code installs the latest version of the survminer R package
developed by A. Kassambara (https://github.com/kassambara/survminer).
install.packages("devtools")
devtools::install_github("kassambara/survminer")
3. After installation, you must first load the package to use its functions.
The function library() is used for this task.
library("ggplot2")
Use the first row as column names. Generally, columns represent variables
Use the first column as row names. Generally rows represent observations.
Each row/column name should be unique, so remove duplicated names.
Avoid names with blank spaces. Good column names: Long_jump or Long.jump.
Bad column name: Long jump.
Avoid names with special symbols: ?, $, *, +, #, (, ), -, /, }, {, |, >, < etc.
Only underscore can be used.
Avoid beginning variable names with a number. Use letter instead. Good column
names: sport_100m or x100m. Bad column name: 100m
R is case sensitive. This means that name is different from Name or NAME.
Avoid blank rows in your data
Delete any comments in your file
Replace missing values by NA (for not available)
If you have a column containing date, use the four digit format. Good format:
01/01/2016. Bad format: 01/01/16
We recommend saving your file in .txt (tab-delimited text file) or .csv (comma
separated value file) format.
You can read more about how to import data into R at this link:
http://www.sthda.com/english/wiki/importing-data-into-r
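Once a file follows the preparation rules above, reading it takes a single call. The sketch below is self-contained: it first writes a small illustrative .csv file to a temporary location (the file contents and column names are assumptions) and then reads it with the base R function read.csv().

```r
# Write a small illustrative file so the example runs on its own
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,long_jump,sport_100m",
             "A,7.58,11.04",
             "B,7.40,NA"), csv_file)
# Read a comma separated value file; the first row holds the column names
my_data <- read.csv(csv_file, header = TRUE, stringsAsFactors = FALSE)
str(my_data)
# For a tab-delimited .txt file, read.delim() is used in the same way
```

Missing values written as NA in the file are read as NA in the resulting data frame.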
# Loading
data(mtcars)
If you want to learn more about the mtcars data set, type this:
?mtcars
To select just certain columns from a data frame, you can either refer to the columns
by name or by their location (i.e., column 1, 2, 3, etc.).
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
## [29] 15.8 19.7 15.0 21.4
# Or use this
mtcars[, 'mpg']
Introduction to ggplot2
According to the ggplot2 concept, a plot can be divided into different fundamental parts:
Plot = Data + Aesthetics + Geometry.
Aesthetics: used to indicate the x and y variables. They can also be used to control the
color, the size or the shape of points, the height of bars, etc.
Geometry: corresponds to the type of graphics (histogram, box plot, line plot,
density plot, dot plot, etc.).
Two main functions for creating plots are available in the ggplot2 package:
qplot(): A quick plot function which is easy to use for simple plots.
ggplot(): A more flexible and robust function than qplot() for building a plot piece
by piece.
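The decomposition into data, aesthetics and geometry can be sketched with ggplot(). The choice of the mtcars data set and of the wt and mpg columns is only for illustration.

```r
library(ggplot2)
# Plot = data + aesthetics + geometry
p <- ggplot(data = mtcars,                     # data
            mapping = aes(x = wt, y = mpg)) +  # aesthetics
  geom_point()                                 # geometry
```

The plot is stored in the variable p and is drawn only when it is printed, for example with print(p).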
The output plot can be kept as a variable and then printed at any time using the
function print().
After creating plots, two other important functions are:
ggsave("plot.png", width = 5, height = 5): saves the last plot in the current
working directory.
This document describes how to create and customize different types of graphs using
ggplot2. Many examples of code and graphics are provided.
4. Continuous function
5. Error bar
6. Maps
7. Three variables
In the current document we'll provide the essential ggplot2 functions for drawing
each of these seven data formats.
2.3. INSTALL AND LOAD GGPLOT2 PACKAGE 31
# Installation
install.packages('ggplot2')
# Loading
library(ggplot2)
The data must be a data.frame that contains all the information needed to make a ggplot.
In the data, columns should be variables and rows should be observations.
## mpg cyl wt
## Mazda RX4 21.0 6 2.620
## Mazda RX4 Wag 21.0 6 2.875
## Datsun 710 22.8 4 2.320
## Hornet 4 Drive 21.4 6 3.215
## Hornet Sportabout 18.7 8 3.440
## Valiant 18.1 6 3.460
Other arguments such as main, xlab and ylab can be also used to add main title and
axis labels to the plot.
# Load data
data(mtcars)
# Basic scatter plot
qplot(x = mpg, y = wt, data = mtcars, geom = "point")
[Figure: scatter plots of wt against mpg]
The following R code will change the color and the shape of points by groups. The
column cyl will be used as grouping variable. In other words, the color and the shape
of points will be changed by the levels of cyl.
[Figure: scatter plots of wt against mpg, with point color and shape set by cyl]
Like color, the shape and the size of points can be controlled by a continuous or
discrete variable.
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58)))
head(wdata, 3)
## sex weight
## 1 F 53.79293
## 2 F 55.27743
## 3 F 56.08444
# Basic histogram
qplot(weight, data = wdata, geom = "histogram")
[Figure: plot of weight by sex (F, M); histogram of weight; density plot of Weight (kg)]
2.6. GGPLOT() FUNCTION: BUILD PLOTS PIECE BY PIECE 35
[Figure: scatter plots of mpg against wt]
The function aes_string() can be used as follows:
Note that, some plots visualize a transformation of the original data set. In this
case, an alternative way to build a layer is to use stat_*() functions.
In the following example, the function geom_density() does the same as the function
stat_density():
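A minimal sketch of this equivalence is given below, assuming the simulated wdata data set used elsewhere in this chapter. The object names p_geom and p_stat are illustrative.

```r
library(ggplot2)
# Simulated weights by sex (same construction as wdata in this chapter)
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))
a <- ggplot(wdata, aes(x = weight))
p_geom <- a + geom_density()  # layer specified through the geom
p_stat <- a + stat_density()  # layer specified through the stat
```

Both layers compute the same kernel density estimate; they can differ in how the result is drawn, since stat_density() uses its own default geom.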
[Figure: density curves of weight drawn with geom_density() and stat_density()]
For each plot type, we'll provide the geom_*() function and the corresponding
stat_*() function (if available).
[Figure: plot of mpg against wt]
In the R code above, the two layers, geom_point() and geom_line(), use the same
data and the same aesthetic mapping provided in the main function ggplot().
Note that it's also possible to use different data and mappings for different layers.
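Both situations can be sketched as follows. The subset heavy_cars and the styling of the second layer are assumptions made for illustration, not the book's own example.

```r
library(ggplot2)
# Both layers inherit the data and mapping given in ggplot()
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line()
# The second layer receives its own data (heavy_cars is an illustrative subset)
heavy_cars <- mtcars[mtcars$wt > 4, ]
p2 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_point(data = heavy_cars, color = "red", size = 3)
```

In p2 the second layer still inherits the x and y mapping from ggplot(), but it draws only the rows of heavy_cars.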
[Figure: plots of mpg and log2(mpg) against wt]
As mentioned above, the function aes_string() is used for aesthetic mappings from
string objects. An example is shown below:
Note that, aes_string() is particularly useful when writing functions that create
plots because you can use strings to define the aesthetic mappings, rather than
having to use substitute to generate a call to aes() (see the R function below).
return(p)
}
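In recent ggplot2 releases aes_string() is deprecated. The sketch below shows an alternative for the same use case, assuming ggplot2 3.0.0 or later, where column names given as strings are looked up with the .data pronoun inside aes(). The helper name ggpoints() and its arguments are hypothetical.

```r
library(ggplot2)
# ggpoints() is a hypothetical helper; column names are passed as strings
ggpoints <- function(data, xName, yName) {
  ggplot(data, aes(x = .data[[xName]], y = .data[[yName]])) +
    geom_point(color = "red")
}
p <- ggpoints(mtcars, xName = "wt", yName = "mpg")
```

The strings are resolved against the data when the plot is built, so the same function works for any pair of columns.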
[Figure: plot of mpg against wt]
png("myplot.png")
print(myplot)
dev.off()
It's also possible to make a ggplot and save it from the screen using the function
ggsave():
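A minimal sketch of saving with ggsave() follows. Writing a PDF to a temporary file is a choice made here so that the example runs anywhere; the plot itself is illustrative.

```r
library(ggplot2)
myplot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
# Save to a temporary PDF file; width and height are in inches by default
out_file <- tempfile(fileext = ".pdf")
ggsave(filename = out_file, plot = myplot, width = 5, height = 5)
file.exists(out_file)
```

ggsave() picks the output format from the file extension, so a name ending in .png would produce a PNG file instead.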
2.8. DATA FORMAT 41
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58)))
head(wdata, 4)
## sex weight
## 1 F 53.79293
## 2 F 55.27743
## 3 F 56.08444
## 4 F 52.65430
The following R code computes the mean value by sex, using dplyr package. First,
the data is grouped by sex and then summarised by computing the mean weight by
groups. The operator %>% is used to combine multiple operations:
library("dplyr")
mu <- wdata %>%
group_by(sex) %>%
summarise(grp.mean = mean(weight))
head(mu)
In the next sections, the data mu will be used to add mean lines to the plots.
Area Plots
An area plot is the continuous analog of a stacked bar chart (see Chapter 20).
Key arguments to customize the plot: alpha, color, fill, linetype, size
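# Initialize the plot (wdata is described in the data format section)
a <- ggplot(wdata, aes(x = weight))
# Basic plot
a + geom_area(stat = "bin")
# Change line and fill colors
a + geom_area(stat = "bin",
color= "black", fill="#00AFBB")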
[Figure: area plot of count against weight]
Note that, by default, the y axis corresponds to the count of weight values. If you want
to change the plot in order to have the density on the y axis, the R code would be as
follows.
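A sketch of an area plot with the density on the y axis is given below. It assumes ggplot2 3.3.0 or later, where after_stat(density) is the current spelling of the older ..density.. notation. The bins value is set explicitly only to avoid the default-bins message.

```r
library(ggplot2)
# Simulated weights by sex (same construction as wdata in this book)
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))
a <- ggplot(wdata, aes(x = weight))
# Map y to the density computed by the bin statistic
p <- a + geom_area(aes(y = after_stat(density)), stat = "bin", bins = 30)
```

With the density on the y axis, the area under the curve (density times bin width, summed over bins) equals 1.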
The following plots compare bar plots and area plots. The diamonds data set [in
ggplot2 package] is used.
# Initialize the plot: price, filled by cut
p <- ggplot(diamonds, aes(x = price, fill = cut))
# Bar plot
p + geom_bar(stat = "bin")
# Area plot
p + geom_area(stat = "bin")
[Figure: bar plot and area plot of count against price, filled by cut]
Chapter 4
Density Plots
Key arguments to customize the plot: alpha, color, fill, linetype, size
# Basic plot
a + geom_density()
[Figure: density plots of weight]
# Fill manually
a3 <- a + geom_density(aes(fill = sex), alpha = 0.4) + theme_minimal()
a3 + scale_fill_manual(values=c("#999999", "#E69F00"))
[Figure: density plots of weight, filled by sex (F, M)]
Histogram Plots
Key arguments to customize the plot: alpha, color, fill, linetype, size
# Basic plot
a + geom_histogram()
[Figure: histograms of count against weight]
Note that by default, stat_bin uses 30 bins, which might not be a good default. You
can change the number of bins (e.g., bins = 50) or the bin width (e.g., binwidth
= 0.5).
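The two ways of controlling the bins can be sketched as follows, assuming the simulated wdata data set used in this book. The object names are illustrative.

```r
library(ggplot2)
# Simulated weights by sex (same construction as wdata in this book)
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))
a <- ggplot(wdata, aes(x = weight))
p_bins  <- a + geom_histogram(bins = 50)       # fix the number of bins
p_width <- a + geom_histogram(binwidth = 0.5)  # fix the width of each bin
```

Only one of the two arguments is needed; when both are supplied, binwidth takes precedence over bins.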
By default, the y axis corresponds to the count of weight values. If you want to change
the plot in order to have the density on the y axis, the R code would be as follows.
a + geom_histogram(aes(y = ..density..))
5.2. CHANGE COLORS BY GROUPS 51
You can change the position adjustment used for overlapping bars on the layer.
Possible values for the argument position are "identity", "stack" and "dodge". The default
value is "stack".
[Figure: histograms of count against weight by sex (F, M), with different position adjustments]
As described in the density plot chapter (Chapter 4), line and fill colors can be changed
manually as follows:
[Figure: histograms of count against weight by sex (F, M), with manually changed colors]
# Color by groups
a + geom_histogram(aes(y=..density.., color = sex, fill = sex),
alpha=0.5, position="identity")+
geom_density(aes(color = sex), size = 1)
[Figure: histograms with density curves of weight, colored by sex (F, M)]
Chapter 7
Frequency Polygon
Frequency polygons are very close to histograms (Chapter 5). They can also be used to
visualize the distribution of a continuous variable. The difference between histograms
and frequency polygons is that:
# Basic plot
a + geom_freqpoly(bins = 30) +
theme_minimal()
[Figure: frequency polygons of count against weight, overall and by sex (F, M)]
If you want to change the plot in order to have the density on the y axis, the R code
would be as follows.
a + geom_freqpoly(aes(y = ..density..))
Chapter 8
In a dot plot, dots are stacked with each dot representing one observation. The
width of a dot corresponds to the bin width.
Key arguments to customize the plot: alpha, color, fill and dotsize
[Figure: dot plot of weight by sex (F, M)]
Chapter 9
ECDF Plots
ECDF (Empirical Cumulative Distribution Function) reports, for any given number, the
percentage of individuals that are at or below that threshold.
Key arguments to customize the plot: alpha, color, linetype and size
a + stat_ecdf(geom = "point")
a + stat_ecdf(geom = "step")
[Figure: ECDF plots of weight drawn with points and with steps]
For any value on the x axis, say weight = w, the height of the curve gives the proportion
of individuals whose weight is at most w.
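The reading of an ECDF can be checked numerically with the base R function ecdf(). The sketch assumes the simulated wdata data set used in this book, and the threshold w = 56.5 is an arbitrary illustrative value.

```r
library(ggplot2)
# Simulated weights by sex (same construction as wdata in this book)
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))
# ECDF drawn as a step function
p <- ggplot(wdata, aes(x = weight)) + stat_ecdf(geom = "step")
# ecdf() returns a function giving the proportion of values <= w
Fn <- ecdf(wdata$weight)
w <- 56.5  # illustrative threshold
Fn(w)
```

The value Fn(w) is the height of the plotted curve at x = w, and it equals mean(wdata$weight <= w).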
Chapter 10
QQ Plots
QQ-plots (or Quantile-Quantile plots) are used to check whether given data
follow a normal distribution. The function stat_qq() or qplot() can be used to create
qq-plots.
Key arguments to customize the plot: alpha, color, shape and size
data(mtcars)
# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars[, c("mpg", "cyl")])
## mpg cyl
## Mazda RX4 21.0 6
## Mazda RX4 Wag 21.0 6
## Datsun 710 22.8 4
## Hornet 4 Drive 21.4 6
## Hornet Sportabout 18.7 8
## Valiant 18.1 6
Create qq plots
# Initialize the plot: sample quantiles of mpg
p <- ggplot(mtcars, aes(sample = mpg))
# Basic plot
p + stat_qq()
[Figure: QQ plots of sample against theoretical quantiles, overall and by cyl]
Read more on ggplot2 colors here: Chapter 26
Chapter 11
The function geom_bar() can be used to visualize one discrete variable. In this case,
the count of each level is plotted.
Key arguments to customize the plot: alpha, color, fill, linetype and size
We'll use the mpg data set [in ggplot2 package]. The R code is as follows:
data(mpg)
ggplot(mpg, aes(fl)) +
geom_bar(fill = "steelblue")+ theme_minimal()
[Figure: bar plot of count against fl]
Part III
Chapter 12
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars[, c("wt", "mpg", "cyl")], 3)
## wt mpg cyl
## Mazda RX4 2.620 21.0 6
## Mazda RX4 Wag 2.875 21.0 6
## Datsun 710 2.320 22.8 4
12.3. BASIC SCATTER PLOTS 63
[Figure: plots against wt, including panels b + geom_jitter() and b + geom_text() with car names as labels]
Key arguments to customize the plot: alpha, color, fill, shape and size
The color, the size and the shape of points can be changed using the function
geom_point() as follows:
64 CHAPTER 12. SCATTER PLOTS: CONTINUOUS X AND Y
Note that, the size of the points can be also controlled by the values of a continuous
variable.
[Figure: scatter plots of mpg against wt, including one with point size controlled by qsec]
Read more on point shapes: Chapter 27
It's possible to use the function geom_text() for adding point labels:
# Initialize the plot: mpg against wt
b <- ggplot(mtcars, aes(x = wt, y = mpg))
b + geom_point() +
geom_text(label=rownames(mtcars), nudge_x = 0.5)
[Figure: scatter plots of mpg against wt by cyl]
12.4. SCATTER PLOTS WITH MULTIPLE GROUPS 67
[Figure: scatter plots of mpg against wt with multiple groups defined by cyl]
Read more on ggplot2 colors here : Chapter 26
Key arguments to customize the plot: alpha, color, fill, shape, linetype
and size
12.5. ADD REGRESSION LINE OR SMOOTHED CONDITIONAL MEAN 69
method: smoothing method to be used. Possible values are "lm", "glm", "gam",
"loess" and "rlm".
method = "loess": This is the default value for a small number of observations.
It computes a smooth local regression. You can read more about
loess using the R code ?loess.
method = "lm": It fits a linear model. Note that it's also possible to
indicate the formula as formula = y ~ poly(x, 3) to specify a degree 3
polynomial.
se: logical value. If TRUE, the confidence interval is displayed around the smooth.
fullrange: logical value. If TRUE, the fit spans the full range of the plot.
level: level of confidence interval to use. The default value is 0.95.
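The arguments above can be sketched with geom_smooth() on the mtcars data. The object names are illustrative, and the formula is passed explicitly so that no default has to be guessed.

```r
library(ggplot2)
b <- ggplot(mtcars, aes(x = wt, y = mpg))
# Linear regression line with a 95% confidence band
p_lm <- b + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95)
# Degree 3 polynomial fit, without the confidence band
p_poly <- b + geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE)
# Smooth local regression
p_loess <- b + geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
```

The line drawn by p_lm is the one obtained from lm(mpg ~ wt, data = mtcars), evaluated on a grid of wt values.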
[Figure: scatter plots of mpg against wt with regression lines and smoothed conditional means]
Change point color and shapes by groups:
[Figure: scatter plots of mpg against wt with point color and shape changed by cyl]
12.6. ADD QUANTILE LINES FROM A QUANTILE REGRESSION 71
Key arguments to customize the plot: alpha, color, linetype and size
[Figure: plot of hwy against cty with quantile lines]
geom_rug(sides ="bl")
sides: a string that controls which sides of the plot the rugs appear on. Allowed
values are strings containing any of "trbl", for top, right, bottom, and left.
[Figure: scatter plots of mpg against wt (overall and by cyl) and of waiting against eruptions, with marginal rugs]
12.8. JITTER POINTS TO REDUCE OVERPLOTTING 73
Key arguments to customize the plot: alpha, color, fill, shape and size
[Figure: scatter plots of hwy against displ, without and with jitter]
To adjust the extent of jittering, the function position_jitter() is used with the arguments
width and height:
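A minimal sketch of this adjustment is shown below, using the mpg data set [in ggplot2 package]. The widths of 0.5 are illustrative, and the seed argument (assumed available, ggplot2 3.0.0 or later) only makes the random jitter reproducible.

```r
library(ggplot2)
# Jitter each point by at most 0.5 units horizontally and vertically
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_jitter(position = position_jitter(width = 0.5, height = 0.5,
                                         seed = 123))
```

Each plotted point stays within 0.5 units of its original position on both axes, which separates overlapping observations without moving them far.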
Key arguments to customize the plot: label, alpha, angle, color, family,
fontface, hjust, lineheight, size, and vjust.
The argument label is used to specify a vector of labels for point annotations.
b + geom_text(aes(label = rownames(mtcars)),
size = 3)
[Figure: plot of mpg against wt with car names as text labels]
Chapter 13
data(diamonds)
head(diamonds[, c("carat", "price")])
76 CHAPTER 13. CONTINUOUS BIVARIATE DISTRIBUTION
Key arguments to customize the plot: xmax, xmin, ymax, ymin, alpha,
color, fill, linetype and size.
# Initialize the plot: price against carat
c <- ggplot(diamonds, aes(x = carat, y = price))
# Default plot
c + geom_bin2d()
13.4. ADD HEXAGON BINING 77
[Figure: heatmaps of 2d bin counts, price against carat]
Alternative functions:
c + stat_bin_2d()
c + stat_summary_2d(aes(z = depth))
to install it:
install.packages("hexbin")
Key arguments to customize the plot: alpha, color, fill and size.
require(hexbin)
# Default plot
c + geom_hex()
[Figure: hexagon binning of price against carat]
Alternative functions:
c + stat_bin_hex()
c + stat_summary_hex(aes(z = depth))
13.5. SCATTER PLOTS WITH 2D DENSITY ESTIMATION 79
Key arguments to customize the plot: alpha, color, linetype and size.
The faithful data set is used in this section. We start by creating a scatter plot
(sp) as follows:
data("faithful")
# Scatter plot
sp <- ggplot(faithful, aes(x = eruptions, y = waiting))
# Default plot
sp + geom_density_2d(color = "#E7B800")
# Add points
sp + geom_point(color = "#00AFBB") +
geom_density_2d(color = "#E7B800")
[Figure: 2d density estimation of waiting against eruptions, as contour lines and with a level scale]
Alternative function:
sp + stat_density_2d()
Key arguments to customize the plot: alpha, color, linetype, size and fill
(for geom_area only).
13.6. CONTINUOUS FUNCTION 81
data(economics)
# head(economics)
# Initialize the plot: unemploy against date
d <- ggplot(economics, aes(x = date, y = unemploy))
# Area plot
d + geom_area(fill = "#00AFBB", color = "white")
[Figure: plots of unemploy against date]
Chapter 14
Key arguments to customize the plot: alpha, color, fill, shape and size.
The diamonds data set [in ggplot2] will be used to plot the discrete variable color
(for diamond colors) by the discrete variable cut (for diamond cut types). The plot is
created using the function geom_jitter().
data("diamonds")
ggplot(diamonds, aes(cut, color)) +
geom_jitter(aes(color = cut), size = 0.5)
[Figure: jittered points of diamond color against cut, colored by cut]
14.1. DATA FORMAT 85
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)
Chapter 15
Box Plots
Key arguments to customize the plot: alpha, color, linetype, shape, size
and fill.
outlier.colour, outlier.shape, outlier.size: The color, the shape and the size
for outlying points
notch: logical value. If TRUE, makes a notched box plot. The notch displays
a confidence interval around the median, which is normally based on the median
+/- 1.58*IQR/sqrt(n). Notches are used to compare groups; if the notches of
two boxes do not overlap, this is strong evidence that the medians differ.
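The notch formula can be worked through by hand for one group and compared with a notched box plot. This is a sketch assuming the ToothGrowth data set with dose converted to a factor; the outlier settings are arbitrary illustrative values.

```r
library(ggplot2)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
# Notched box plot with customized outlying points
p <- e + geom_boxplot(notch = TRUE, outlier.colour = "red",
                      outlier.shape = 8, outlier.size = 2)
# Notch limits for the dose 0.5 group, computed by hand
len_05 <- ToothGrowth$len[ToothGrowth$dose == "0.5"]
half_width <- 1.58 * IQR(len_05) / sqrt(length(len_05))
notch_limits <- median(len_05) + c(-1, 1) * half_width
notch_limits
```

The two values in notch_limits bracket the median of the group; the notch becomes narrower as the number of observations n grows.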
[Figure: box plots of len by dose]
changing the order of items: for example from c("0.5", "1", "2") to c("2", "0.5",
"1")
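# Initialize the plot: len against dose (ToothGrowth data)
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
e + geom_boxplot() +
scale_x_discrete(limits=c("2", "0.5", "1"))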
[Figure: box plots of len against dose with selected and reordered items]
e + stat_boxplot(coef = 1.5)
[Figure: box plots of len against dose, colored by dose and grouped by supp (OJ, VC)]
Chapter 16
Violin plots
Violin plots are similar to box plots (Chapter 15), except that they also show the
kernel probability density of the data at different values. Typically, violin plots will
include a marker for the median of the data and a box indicating the interquartile
range, as in standard box plots.
The function geom_violin() is used to produce a violin plot.
Key arguments to customize the plot: alpha, color, linetype, size and fill.
[Figure: violin plots of len by dose]
Note that by default trim = TRUE. In this case, the tails of the violins are trimmed.
If FALSE, the tails are not trimmed.
To change the order of items (or to select some of the items), the function
scale_x_discrete() can be used as described in Chapter 15.
[Figure: violin plots of len]
The function mean_sdl is used for adding mean and standard deviation. It computes
the mean plus or minus a constant times the standard deviation. In the R code
above, the constant is specified using the argument mult (mult = 1). By default
mult = 2. The mean +/- SD can be added as a crossbar or a pointrange.
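The mean plus or minus a multiple of the standard deviation can be sketched without mean_sdl(), which relies on the Hmisc package. The helper mean_sd() below is a hypothetical stand-in written for this example; it returns the y, ymin and ymax values that stat_summary() expects from fun.data.

```r
library(ggplot2)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# mean_sd() is a hypothetical stand-in for mean_sdl(): mean +/- mult * SD
mean_sd <- function(x, mult = 1) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}
# Violin plot with mean +/- 1 SD drawn as a pointrange
p <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_violin(trim = FALSE) +
  stat_summary(fun.data = mean_sd, fun.args = list(mult = 1),
               geom = "pointrange", color = "red")
```

Replacing geom = "pointrange" with geom = "crossbar" draws the same summary as a crossbar.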
[Figure: violin plots of len against dose, colored by dose]
As described in the box plot chapter (Chapter 15), it's possible to manually change violin
plot outline/fill colors, as follows:
[Figure: violin plots of len against dose with manually changed colors by dose]
Read more on ggplot2 colors here: Chapter 26
[Figure: violin plots of len against dose, grouped by supp (OJ, VC)]
Chapter 17
Dot Plots
Key arguments to customize the plot: alpha, color, dotsize and fill.
# Basic plot
e + geom_dotplot(binaxis = "y", stackdir = "center")
[Figure: dot plots of len against dose]
To change the order of items (or to select some of the items), the function
scale_x_discrete() can be used as described in Chapter 15
[Figure: dot plots of len against dose]
Combine with box plot and dot plot:
[Figure: dot plots of len against dose, combined with other plot types and colored by dose]
Read more on ggplot2 colors here: Chapter 26
# Change the position : interval between dot plot of the same group
e + geom_dotplot(aes(fill = supp), binaxis='y', stackdir='center',
position=position_dodge(0.8))
[Figure: two dodged dot plots of len by dose; legend: supp (OJ, VC)]
# Change colors
e + geom_dotplot(aes(fill = supp), binaxis='y', stackdir='center',
position=position_dodge(0.8)) +
scale_fill_manual(values=c("#999999", "#E69F00"))
[Figure: dot plots of len by dose with manually set fill colors; legend: supp (OJ, VC)]
Chapter 18
Stripcharts
Stripcharts are also known as one-dimensional scatter plots. These plots are a suitable
alternative to box plots when sample sizes are small.
The function geom_jitter() is used.
Key arguments to customize the plot: alpha, color, shape, size and fill.
# Basic plot
e + geom_jitter()
[Figure: stripcharts of len by dose]
18.3. CHANGE POINT SHAPES BY GROUPS 105
[Figure: stripcharts of len by dose; later panels have a dose legend]
Read more on point shapes : Chapter 27
[Figure: two stripcharts of len by dose; legend: dose]
Change manually point colors using the functions :
[Figure: two stripcharts of len by dose with manually set point colors; legend: dose]
Read more on ggplot2 colors here: Chapter 26
# Change the position : interval between dot plot of the same group
e + geom_jitter(aes(color = supp, shape = supp),
position=position_dodge(0.2))
[Figure: two stripcharts of len by dose; legend: supp (OJ, VC)]
Change point plot colors and add box plots :
# Change colors
e + geom_jitter(aes(color = supp, shape = supp),
position=position_jitter(0.2))+
scale_color_manual(values=c("#999999", "#E69F00"))
[Figure: stripcharts of len by dose with manually set colors; legend: supp (OJ, VC)]
Chapter 19
Line Plots
Key arguments to customize the plot: alpha, color, linetype and size.
19.2. BASIC LINE PLOTS 111
head(df)
## dose len
## 1 D0.5 4.2
## 2 D1 10.0
## 3 D2 29.5
head(df2)
# Use geom_step()
p + geom_step() + geom_point()
[Figure: line plots of len by dose (D0.5, D1, D2), including multi-group versions with legend supp (OJ, VC)]
head(economics)
Plots :
[Figure: two line plots of pop versus date from the economics data: the full series and a subset starting around 2006]
Change line size :
[Figure: line plot of pop versus date; legend: unemploy/pop]
Plot multiple time series data:
# Solution 1
ggplot(economics, aes(x=date)) +
geom_line(aes(y = psavert), color = "darkred") +
geom_line(aes(y = uempmed), color="steelblue", linetype="twodash") +
theme_minimal()
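A second way to plot several series is to first stack them into a long data frame with one row per (date, variable) pair, then map the variable to color. The sketch below is an assumption about how such a "Solution 2" could look; it uses base R for the reshaping, and the names df_long and p_long are illustrative.

```r
library(ggplot2)

# Stack the two series into long format: date, variable, value
df_long <- rbind(
  data.frame(date = economics$date, variable = "psavert",
             value = economics$psavert),
  data.frame(date = economics$date, variable = "uempmed",
             value = economics$uempmed)
)

# One geom_line() call; color and line type follow the variable
p_long <- ggplot(df_long, aes(x = date, y = value)) +
  geom_line(aes(color = variable, linetype = variable)) +
  scale_color_manual(values = c("darkred", "steelblue")) +
  theme_minimal()
p_long
```

The long format gives a legend for free and scales to more than two series without adding one geom_line() call per variable.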
[Figure: psavert and uempmed plotted against date: one version with y axis psavert, one version with y axis value and legend variable (psavert, uempmed)]
# Area plot
ggplot(economics, aes(x=date)) +
geom_area(aes(y=psavert), fill = "#999999",
color = "#999999", alpha=0.5) +
geom_area(aes(y=uempmed), fill = "#E69F00",
color = "#E69F00", alpha=0.5) +
theme_minimal()
[Figure: overlaid area plots of psavert and uempmed versus date; y axis: psavert]
Chapter 20
Bar Plots
Key arguments to customize the plot: alpha, color, fill, linetype and size.
# head(df)
# head(df2)
20.2. BASIC BAR PLOTS 119
# Change fill color and add labels at the top (vjust = -0.3)
f + geom_bar(stat = "identity", fill = "steelblue")+
geom_text(aes(label = len), vjust = -0.3, size = 3.5)+
theme_minimal()
[Figure: three bar plots of len by dose (D0.5, D1, D2), two of them with value labels 4.2, 10 and 29.5]
It's possible to change the width of bars using the argument width (e.g.: width =
0.5).
To change the order of items (or to select some of the items), the function
scale_x_discrete() can be used as described in Chapter 15.
[Figure: bar plots of len by dose colored by dose; legend: dose (D0.5, D1, D2)]
20.4. BAR PLOT WITH MULTIPLE GROUPS 121
# Use position=position_dodge()
g + geom_bar(stat="identity", position=position_dodge())
[Figure: stacked and dodged bar plots of len by dose; legend: supp (OJ, VC)]
Add labels to a dodged bar plot :
[Figure: dodged bar plot of len by dose with value labels; legend: supp (OJ, VC)]
Add labels to a stacked bar plot: 3 steps are required
1. Sort the data by dose and supp : the package plyr is used
2. Calculate the cumulative sum of the variable len for each dose
3. Create the plot
require(plyr)
# Sort by dose and supp
df_sorted <- arrange(df2, dose, supp)
head(df_sorted)
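Steps 1 and 2 can be sketched without plyr as follows. This is a base R alternative, not the original code: the data frame df2 is an assumption reconstructed from the values printed on the bar plots, and the column name label_ypos is illustrative.

```r
# Assumed data: mean tooth length by supplement (supp) and dose
df2 <- data.frame(
  supp = rep(c("VC", "OJ"), each = 3),
  dose = rep(c("D0.5", "D1", "D2"), 2),
  len  = c(6.8, 15, 33, 4.2, 10, 29.5)
)

# Step 1: sort by dose, then by supp
df_sorted <- df2[order(df2$dose, df2$supp), ]

# Step 2: cumulative sum of len within each dose
df_sorted$label_ypos <- ave(df_sorted$len, df_sorted$dose, FUN = cumsum)
df_sorted
```

Whether these cumulative positions line up with the stacked segments depends on the stacking order used by the installed ggplot2 version, so the sort order may need to be reversed before plotting.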
[Figure: stacked bar plot of len by dose with value labels at the top of each segment; legend: supp (OJ, VC)]
If you want to place the labels at the middle of bars, you have to modify the cumulative
sum as follows:
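As an alternative to adjusting the cumulative sum by hand, recent ggplot2 versions (2.2.0 and later) can center the labels with position_stack(vjust = 0.5). The sketch below uses this swapped-in technique rather than the cumulative-sum approach; df2 is an assumed data frame reconstructed from the values shown on the plots.

```r
library(ggplot2)

# Assumed data: mean tooth length by supplement (supp) and dose
df2 <- data.frame(
  supp = rep(c("VC", "OJ"), each = 3),
  dose = rep(c("D0.5", "D1", "D2"), 2),
  len  = c(6.8, 15, 33, 4.2, 10, 29.5)
)

# Labels are stacked like the bars, then moved to the middle of each segment
p_mid <- ggplot(df2, aes(x = dose, y = len, fill = supp)) +
  geom_col() +
  geom_text(aes(label = len, group = supp),
            position = position_stack(vjust = 0.5),
            color = "white", size = 3.5) +
  theme_minimal()
p_mid
```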
[Figure: stacked bar plot of len by dose with value labels in the middle of each segment; legend: supp (OJ, VC)]
Chapter 21
Visualizing Error
21.3. PLOT TYPES 125
library("dplyr")
df2 <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len),
len = mean(len)
)
head(df2)
Key arguments to customize the plot: alpha, color, fill, linetype and size.
We'll use the data set named df2, which holds the mean and the SD of tooth length
(len) by groups (dose).
# Default plot
f + geom_crossbar()
# color by groups
f + geom_crossbar(aes(color = dose))
[Figure: cross bars of len by dose, default and colored by dose (legend: dose)]
Cross bar with multiple groups: we start by creating a data set named df3 which
holds the mean and the SD of tooth length (len) by 2 groups (supp and dose).
library("dplyr")
df3 <- df %>%
group_by(supp, dose) %>%
summarise(
sd = sd(len),
len = mean(len)
)
head(df3)
The data set df3 is used to create cross bars with multiple groups. To this end,
the variable len is plotted by dose and the color is changed by the levels of the factor
supp.
[Figure: two cross bar plots of len by dose; legend: supp (OJ, VC)]
Key arguments to customize the plot: alpha, color, linetype, size and
width.
We'll use the data set named df2, which holds the mean and the SD of tooth length
(len) by groups (dose).
We start by creating a plot, named f, that we'll finish next by adding a layer.
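A minimal sketch of this setup, assuming df2 has the columns dose, len (mean) and sd. Here the summary is rebuilt in base R so that the example is self-contained; the aesthetics ymin and ymax are set to the mean minus and plus one SD.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Mean and SD of len by dose (same content as the dplyr summary)
df2 <- data.frame(
  dose = levels(tg$dose),
  len  = as.numeric(tapply(tg$len, tg$dose, mean)),
  sd   = as.numeric(tapply(tg$len, tg$dose, sd))
)

# Base plot: error bar limits are mean - SD and mean + SD
f <- ggplot(df2, aes(x = dose, y = len, ymin = len - sd, ymax = len + sd))

# Finish the plot by adding an error bar layer
p_err <- f + geom_errorbar(width = 0.2)
p_err_col <- f + geom_errorbar(aes(color = dose), width = 0.2) +
  geom_point(aes(color = dose))
p_err_col
```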
[Figure: error bar plots of len by dose, including versions colored by dose (legend: dose)]
In the R code above, the argument width specifies the width of error bars.
[Figure: two error bar plots of len by dose; legend: supp (OJ, VC)]
Key arguments to customize the plot: alpha, color, linetype, size and
height.
We'll use the data set named df2, which holds the mean and the SD of tooth length
(len) by groups (dose).
We start by creating a plot, named f, that we'll finish next by adding a layer.
The arguments xmin and xmax are used for horizontal error bars.
[Figure: horizontal error bars; x axis: len, y axis: dose; second panel has a dose legend]
Key arguments to customize the plot: alpha, color, linetype, size, shape
and fill (for geom_pointrange()).
# Point range
f + geom_pointrange()
21.8. COMBINE DOT PLOT AND ERROR BARS 133
[Figure: two plots of len by dose with mean and error ranges]
Key arguments to customize the plot: alpha, color, fill, linetype and
size.
To combine geom_dotplot() and error bars, we'll use the ToothGrowth data
set. You don't need to compute the mean and SD. This can be done automatically by
using the function stat_summary() in combination with the argument fun.data
= mean_sdl.
We start by creating a dot plot, named g, that we'll finish in the next section by
adding error bar layers.
# Use geom_crossbar()
g + stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
geom = "crossbar", width = 0.5)
# Use geom_errorbar()
g + stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
geom = "errorbar", color = "red", width = 0.2)
# Use geom_pointrange()
g + stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
geom = "pointrange", color = "red")
[Figure: three dot plots of len with crossbar, error bar and pointrange summaries]
Chapter 22
Pie Charts
The function coord_polar() is used to produce a pie chart, which is just a stacked
bar chart in polar coordinates.
df <- data.frame(
group = c("Male", "Female", "Child"),
value = c(25, 25, 50))
head(df)
## group value
## 1 Male 25
## 2 Female 25
## 3 Child 50
# default plot
p <- ggplot(df, aes(x="", y = value, fill=group)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start=0)
p
[Figure: pie charts of value by group (Child, Female, Male), including a version with the "Blues" palette and percentage labels]
In the R code above, we used the function scale_fill_brewer() to change fill colors.
You can read more about colors in Chapter 26
Part V
Graphical Parameters
Chapter 23
Graphical Primitives
This section describes how to add graphical elements (polygon, path, ribbon,
segment and rectangle) to a plot.
Key arguments to customize the plot: alpha, color, fill (for ribbon only),
linetype and size
require(maps)
france = map_data('world', region = 'France')
ggplot(france, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = 'white', colour = 'black')
[Figure: polygon map of France; x axis: long, y axis: lat]
2. Use the economics data [in ggplot2] to produce a path, a ribbon and rectangles.
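A sketch of these three primitives with the economics data. The base plot name h, the ribbon half-width of 900 and the date range of the rectangle are arbitrary choices made for illustration, not values taken from the original text.

```r
library(ggplot2)

# Base plot: number of unemployed over time
h <- ggplot(economics, aes(x = date, y = unemploy))

# Path: connects observations in the order they appear in the data
p_path <- h + geom_path()

# Ribbon: a band of +/- 900 around the series, with the path on top
p_ribbon <- h +
  geom_ribbon(aes(ymin = unemploy - 900, ymax = unemploy + 900),
              fill = "steelblue") +
  geom_path()

# Rectangle: highlight a date range over the full height of the panel
period <- data.frame(xmin = as.Date("1980-01-01"),
                     xmax = as.Date("1985-01-01"),
                     ymin = -Inf, ymax = Inf)
p_rect <- h +
  geom_rect(data = period,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            fill = "steelblue", alpha = 0.4, inherit.aes = FALSE) +
  geom_path()
p_rect
```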
[Figure: path, ribbon and rectangle plots of unemploy from the economics data]
3. Add line segments and curves between points (x1, y1) and (x2, y2):
# Add segment
i + geom_segment(aes(x = 2, y = 15, xend = 3, yend = 15))
# Add arrow
require(grid)
i + geom_segment(aes(x = 5, y = 30, xend = 3.5, yend = 25),
arrow = arrow(length = unit(0.5, "cm")))
# Add curves
i + geom_curve(aes(x = 2, y = 15, xend = 3, yend = 15))
[Figure: three scatter plots of mpg versus wt with an added segment, arrow and curve]
Chapter 24
The function below can be used for changing titles and labels:
The function labs() can be also used to change the legend title.
24.1. CHANGE THE MAIN TITLE AND AXIS LABELS 143
# Default plot
print(p)
[Figure: box plots of len by dose: the default plot, and a version with title "Plot of length by dose", x label "Dose (mg)" and y label "Teeth length"; legend: dose]
Note that you can use \n to split a long title into multiple lines.
Plot titles can also be changed using the functions ggtitle(), xlab() and ylab(), as follows.
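The two approaches can be sketched as follows, assuming p is a box plot of len by dose from the ToothGrowth data, filled by dose. The title and axis label strings are the ones visible in the figures; the object names are illustrative.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
p <- ggplot(tg, aes(x = dose, y = len)) + geom_boxplot(aes(fill = dose))

# Approach 1: labs() sets the title and both axis labels at once
p1 <- p + labs(title = "Plot of length \n by dose",
               x = "Dose (mg)", y = "Teeth length")

# Approach 2: one function per element
p2 <- p + ggtitle("Plot of length \n by dose") +
  xlab("Dose (mg)") + ylab("Teeth length")

# labs() can also rename the legend of the fill aesthetic
p3 <- p + labs(fill = "Dose (mg)")
p1
```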
[Figure: box plots of len by dose with title "Plot of length by dose", x label "Dose (mg)" and y label "Teeth length"; legend: dose]
Chapter 25
Legend Position and Appearance
# Remove legends
p + theme(legend.position = "none")
[Figure: three box plots of len by dose with the legend at the top, removed, and inside the plot area]
As shown above, the argument legend.position can be also a numeric vector c(x,y),
where x and y are the coordinates of the legend box. Their values should be between
0 and 1. c(0,0) corresponds to the "bottom left" and c(1,1) corresponds to the "top
right" position.
[Figure: box plots of len by dose with customized legends, including a legend titled "Dose" with labels A, B and C]
The color and the shape of the points are determined by the factor variables cyl and
gear, respectively. The size of the points is controlled by the variable qsec.
The function guide_legend() is used to change the order of guides.
[Figure: scatter plots of wt versus mpg with legends for gear and qsec shown in different orders]
Note that, in the case of continuous color, the function guide_colourbar() should
be used to change the order of color guide.
Removing a particular legend can also be done when using the scale_xx functions.
In this case, the argument guide is used as follows.
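A sketch of both routes, assuming the scatter plot described above (color by cyl, shape by gear, size by qsec). Current ggplot2 versions expect guide = "none"; older code often used guide = FALSE, which now triggers a deprecation warning.

```r
library(ggplot2)

mt <- mtcars
mt$cyl  <- as.factor(mt$cyl)
mt$gear <- as.factor(mt$gear)

p <- ggplot(mt, aes(x = mpg, y = wt, color = cyl, shape = gear, size = qsec)) +
  geom_point()

# Remove only the shape legend through the scale function
p_no_shape <- p + scale_shape(guide = "none")

# Equivalent idea with guides(): remove only the size legend
p_no_size <- p + guides(size = "none")
p_no_shape
```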
Chapter 26
Colors
This chapter describes how to change the color of a graph. A color can be specified
either by name (e.g.: "red") or by hexadecimal code (e.g.: "#FF1234").
You will learn how to :
ToothGrowth and mtcars data sets are used in the examples below.
# Box plot
bp <- ggplot(ToothGrowth, aes(x=dose, y=len))
# Scatter plot
sp <- ggplot(mtcars, aes(x=wt, y=mpg))
26.1. USE A SINGLE COLOR 151
# box plot
bp + geom_boxplot(fill = 'steelblue', color = "red")
# scatter plot
sp + geom_point(color = 'darkblue')
[Figure: box plot of len by dose and scatter plot of mpg versus wt, each drawn in a single color]
# Box plot
bp <- bp + geom_boxplot(aes(fill = dose))
bp
# Scatter plot
sp <- sp + geom_point(aes(color = cyl))
sp
[Figure: box plot of len by dose (legend: dose) and scatter plot of mpg versus wt (legend: cyl), colored by group]
The lightness (l) and the chroma (c, intensity of color) of the default (hue) colors
can be modified using the functions scale_fill_hue() and scale_color_hue() as follows.
# Box plot
bp + scale_fill_hue(l=40, c=35)
# Scatter plot
sp + scale_color_hue(l=40, c=35)
[Figure: box plot of len by dose and scatter plot of mpg versus wt with modified hue lightness and chroma; legends: dose, cyl]
Note that, the default values for l and c are : l = 65, c = 100.
# Box plot
bp + scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Scatter plot
sp + scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
[Figure: box plot of len by dose and scatter plot of mpg versus wt with manually set colors; legends: dose, cyl]
# Box plot
bp + scale_fill_brewer(palette="Dark2")
# Scatter plot
sp + scale_color_brewer(palette="Dark2")
[Figure: box plot of len by dose and scatter plot of mpg versus wt with the "Dark2" brewer palette; legends: dose, cyl]
The available color palettes in the RColorBrewer package are :
# Install
install.packages("wesanderson")
# Load
library(wesanderson)
# Note: recent wesanderson releases name this palette "GrandBudapest1";
# older releases used "GrandBudapest"
# Box plot
bp+scale_fill_manual(values=wes_palette(n=3, name="GrandBudapest1"))
# Scatter plot
sp+scale_color_manual(values=wes_palette(n=3, name="GrandBudapest1"))
[Figure: box plot of len by dose and scatter plot of mpg versus wt with a Wes Anderson palette; legends: dose, cyl]
# Box plot
bp + scale_fill_grey(start=0.8, end=0.2) + theme_minimal()
# Scatter plot
sp + scale_color_grey(start=0.8, end=0.2) + theme_minimal()
[Figure: box plot of len by dose and scatter plot of mpg versus wt in gray scale (legends: dose, cyl), followed by plots with a continuous mpg legend]
# Scatter plot
# Color points by the mpg variable
sp3 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(aes(color = mpg))
sp3
[Figure: two scatter plots of mpg versus wt colored by mpg with continuous color gradients]
Chapter 27
The different points shapes commonly used in R are illustrated in the figure below :
Create a scatter plot and change point shapes, colors and size:
[Figure: three scatter plots of mpg versus wt with different point shapes, colors and sizes; legend: cyl]
Note that, the argument fill can be used only for the point shapes 21 to 25.
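A short sketch of these settings with the mtcars data. The specific shape numbers, colors and sizes are illustrative choices; shape 23 is one of the fillable shapes (21 to 25), so both fill and color apply to it.

```r
library(ggplot2)

sp <- ggplot(mtcars, aes(x = wt, y = mpg))

# Fixed shape, color and size for all points
p_fixed <- sp + geom_point(shape = 18, color = "steelblue", size = 4)

# Fillable shape: fill sets the interior, color sets the outline
p_filled <- sp + geom_point(shape = 23, fill = "blue",
                            color = "darkred", size = 3)

# Shape and color mapped to a factor, with manually chosen shapes
mt <- mtcars
mt$cyl <- as.factor(mt$cyl)
p_groups <- ggplot(mt, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = cyl, color = cyl), size = 3) +
  scale_shape_manual(values = c(3, 16, 17))
p_groups
```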
[Figure: two scatter plots of mpg versus wt with shapes by cyl; legend (cyl: 4, 6, 8) placed at the top]
Chapter 28
Line types
The different line types available in R software are : blank, solid, dashed,
dotted, dotdash, longdash, twodash.
[Figure: the available line types: blank, dashed, dotdash, dotted, longdash, solid, twodash]
## time bill
## 1 breakfeast 10
## 2 Lunch 30
## 3 Dinner 15
[Figure: line plots of bill by time; the second plot is grouped by sex (Female, Male)]
The functions below can be used to change the appearance of line types manually:
[Figure: line plot of bill by time with manually set line types; legend: sex (Female, Male)]
Chapter 29
2. With clipping the data: (removes unseen data points): Observations not in
this range will be dropped completely and not passed to any other layers.
3. Expand the plot limits with data: This function is a thin wrapper around
geom_blank() that makes it easy to add data to a plot.
data(cars)
p <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
# Default plot
print(p)
[Figure: two scatter plots of dist versus speed (cars data): default limits and restricted limits]
Note that, the function expand_limits() can be also used to quickly set the
intercept of x and y axes at (0,0).
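The different ways of setting limits can be sketched as follows. The coord_cartesian() variant is an assumption about the "without clipping" case, which is not shown in the text; the limit values are illustrative.

```r
library(ggplot2)

p <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()

# Zoom without clipping: all data are kept, only the view changes
p_zoom <- p + coord_cartesian(xlim = c(5, 20), ylim = c(0, 50))

# With clipping: observations outside the range are dropped (with a warning)
p_clip <- p + xlim(5, 20) + ylim(0, 50)

# Expand the limits so that the axes include the origin (0, 0)
p_origin <- p + expand_limits(x = 0, y = 0)

# Expand the limits to a wider range on both axes
p_wide <- p + expand_limits(x = c(0, 30), y = c(0, 150))
p_origin
```

The distinction between zooming and clipping matters as soon as a statistic (for example a smoothing line) is added: with clipping, the statistic is computed on the remaining points only.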
data(cars)
p <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
p + scale_x_continuous(trans = 'log2'), p + scale_y_continuous(trans = 'log2'): another allowed value for the argument trans is 'log10'
# Reverse coordinates
p + scale_y_reverse()
[Figure: scatter plots of dist versus speed with transformed axes, including a log2 scale and a reversed y axis]
4. Format axis tick mark labels: The scales package is required to access
break formatting functions.
require(scales)
# Percent
p + scale_y_continuous(labels = percent)
# Dollar
p + scale_y_continuous(labels = dollar)
# Scientific
p + scale_y_continuous(labels = scientific)
168 CHAPTER 30. AXIS TRANSFORMATIONS: LOG AND SQRT
[Figure: scatter plots of dist versus speed with y axis tick labels formatted as percent, dollar and scientific]
Note that, these tick marks make sense only for base 10.
# Required package
require(MASS) # to access Animals data sets
require(scales) # to access break formatting functions
data(Animals) # Load data
[Figure: two log-scaled plots of brain (Animals data) with tick labels 10^0 to 10^3]
# All sides
p2+annotation_logticks(sides="trbl")
Allowed values for the argument sides are the combination of t(top), r(right),
b(bottom), l(left).
Chapter 31
Date Axes
The functions scale_x_date() and scale_y_date() are used to format date axes.
Create some time series data sets:
set.seed(1234)
df <- data.frame(
date = seq(Sys.Date(), len=100, by="1 day")[sample(100, 50)],
price = runif(50)
)
df <- df[order(df$date), ]
head(df)
## date price
## 7 2016-04-20 0.49396092
## 24 2016-04-23 0.78312110
## 1 2016-05-01 0.07377988
## 23 2016-05-02 0.01462726
## 19 2016-05-05 0.05164662
## 25 2016-05-06 0.08996133
31.1. FORMAT AXIS TICK MARK LABELS: DAYS, WEEKS, MONTHS 171
# Default plot
p <- ggplot(data=df, aes(x = date, y=price)) + geom_line()
p
# Format : Week
p + scale_x_date(labels = date_format("%W"))
# Months only
p + scale_x_date(breaks = date_breaks("months"),
labels = date_format("%b")) +
theme(axis.text.x = element_text(angle=45))
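Since ggplot2 2.0.0, the same kind of formatting can also be written with the date_labels and date_breaks arguments of scale_x_date(), without calling the scales helpers directly. The sketch below uses a fixed start date instead of Sys.Date() so that it is reproducible; the data frame is otherwise an assumption modeled on the one above.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(
  date  = seq(as.Date("2016-04-20"), by = "1 day", length.out = 100),
  price = runif(100)
)
p <- ggplot(df, aes(x = date, y = price)) + geom_line()

# Format: abbreviated month and day
p_day <- p + scale_x_date(date_labels = "%b %d")

# Format: week number, one tick every two weeks
p_week <- p + scale_x_date(date_breaks = "2 weeks", date_labels = "%W")

# Months only, one tick per month, rotated labels
p_month <- p + scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  theme(axis.text.x = element_text(angle = 45))
p_month
```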
[Figure: line plots of price versus date with the date axis labels formatted as days, weeks and months]
[Figure: two line plots of psavert versus date from the economics data: the full series and a subset starting around 2002]
Chapter 32
Axis Ticks: Customize Tick Marks and Labels
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) + geom_boxplot()
# print(p)
3. Change the style and the orientation angle of axis tick labels
174 CHAPTER 32. AXIS TICKS : CUSTOMIZE TICK MARKS AND LABELS
[Figure: box plots of len by dose with customized axis tick labels, including rotated labels]
x or y axis can be discrete or continuous. In each of these two cases, the functions to
be used for setting axis ticks are different.
Discrete axes
scale_x_discrete(name, breaks, labels, limits): for X axis
scale_y_discrete(name, breaks, labels, limits): for y axis
Continuous axes
scale_x_continuous(name, breaks, labels, limits, trans): for X axis
scale_y_continuous(name, breaks, labels, limits, trans): for y axis
axis titles
axis limits (data range to display)
choose where tick marks appear
manually label tick marks
Note that, in the examples below, we'll use only the functions scale_x_discrete()
and xlim() to customize x axis tick marks. The same kind of examples can be
applied to a discrete y axis using the functions scale_y_discrete() and ylim().
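A sketch of the typical discrete-axis customizations, assuming p is the ToothGrowth box plot defined at the start of this chapter. The axis title and the tick labels are illustrative.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
p <- ggplot(tg, aes(x = dose, y = len)) + geom_boxplot()

# Change the axis title and the order of items
p_order <- p + scale_x_discrete(name = "Dose (mg)",
                                limits = c("2", "1", "0.5"))

# Change tick mark labels
p_labels <- p + scale_x_discrete(breaks = c("0.5", "1", "2"),
                                 labels = c("Dose 0.5", "Dose 1", "Dose 2"))

# Choose which items to display (dose 1 is dropped, with a warning)
p_subset <- p + scale_x_discrete(limits = c("0.5", "2"))

# Shortcut for the same selection
p_subset2 <- p + xlim("0.5", "2")
p_order
```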
# Default plot
print(p)
# or use this:
# p + xlim("0.5", "2") # same as above
[Figure: box plots of len with customized discrete x axes, followed by scatter plots of dist versus speed (cars data) with customized continuous axes, including axis titles "Speed of cars" and "Stopping distance" and percent-formatted tick labels]
Possible values for labels are comma, percent, dollar and scientific. For more examples,
read the documentation of the function trans_new() in scales package
Chapter 33
Themes and Background Colors
This chapter describes how to change the look of a plot theme (background color,
panel background color and grid lines). You'll also learn how to use ggplot2 base
themes to create your own theme.
ToothGrowth data will be used :
data("ToothGrowth")
# Convert the column dose from numeric to factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
33.1. CHANGE PLOT THEMES: QUICK FUNCTIONS 179
theme_light(): light gray lines and axis (more attention towards the data)
p + theme_gray(base_size = 14)
p + theme_bw()
p + theme_linedraw()
p + theme_light()
[Figure: four box plots of len drawn with theme_gray, theme_bw, theme_linedraw and theme_light]
theme_classic(): theme with axis lines and no grid lines (standard plot)
theme_void(): Empty theme, useful for plots with non-standard coordinates or for
drawings
p + theme_minimal()
p + theme_classic()
p + theme_void()
p + theme_dark()
180 CHAPTER 33. THEMES AND BACKGROUND COLORS
[Figure: four box plots of len drawn with theme_minimal, theme_classic, theme_void and theme_dark]
base_size : base font size (to change the size of all plot text elements)
The size of all the plot text elements can be easily changed at once :
# Example 1
p + theme_gray(base_size = 10)
# Example 2
p + theme_gray(base_size = 20)
[Figure: two box plots of len by dose with base font sizes 10 and 20]
Note that, the function theme_set() can be used to change the theme for the entire
session.
33.2. CUSTOMIZE PLOT BACKGROUND 181
theme_set(theme_gray(base_size = 20))
Line elements: axis lines, minor and major grid lines, plot panel border, axis
ticks background color, etc.
Text elements: plot title, axis titles, legend title and text, axis tick mark
labels, etc.
Note that each of the theme elements can be removed using the function element_blank().
The appearance of grid lines can be changed using the function element_line(colour, size, linetype).
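A sketch of these element functions applied to the panel background and grid lines, assuming p is a box plot of the ToothGrowth data. The colors and line types are illustrative; the size argument is left out because recent ggplot2 versions use linewidth for line elements instead.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
p <- ggplot(tg, aes(x = dose, y = len)) + geom_boxplot()

# Change the panel background and the appearance of grid lines
p_custom <- p + theme(
  panel.background = element_rect(fill = "lightblue", colour = "lightblue"),
  panel.grid.major = element_line(colour = "white", linetype = "solid"),
  panel.grid.minor = element_line(colour = "white", linetype = "dashed")
)

# Remove grid lines and the panel border
p_clean <- p + theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.border     = element_blank()
)
p_custom
```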
[Figure: box plots of len by dose with customized panel background and grid lines]
install.packages("ggthemes") # Install
library(ggthemes) # Load
The ggthemes package provides many custom themes and scales. Two of the themes are
shown below:
# theme_stata
p + theme_stata() + scale_fill_stata()
[Figure: box plots of len by dose drawn with ggthemes themes; legend: dose (0.5, 1, 2)]
theme_set(theme_gray(base_size = 20))
theme_gray
Note that, the function rel() modifies the size relative to the base size.
Chapter 34
Text Annotations
This article describes how to add a text annotation to a plot generated using ggplot2
package.
The functions below can be used :
It's also possible to use the R package ggrepel, which is an extension that provides
geoms for ggplot2 to repel overlapping text labels away from each other.
We'll start by describing how to use ggplot2 official functions for adding text annotations.
In the last sections, examples using the ggrepel extension are provided.
set.seed(1234)
df <- mtcars[sample(1:nrow(mtcars), 10), ]
df$cyl <- as.factor(df$cyl)
[Figure: scatter plots of mpg versus wt with car names as text labels (legend: cyl 4, 6, 8) and with the annotation "Scatter plot"]
sp + geom_label()
34.2. ANNOTATION_CUSTOM : ADD A STATIC TEXT ANNOTATION 187
nudge_x and nudge_y: let you offset labels from their corresponding points.
The function position_nudge() can be also used.
hjust and vjust can now be character vectors (ggplot2 v >= 2.0.0): "left",
"center", "right", "bottom", "middle", "top". New options include "inward"
and "outward" which align text towards and away from the center of the plot
respectively.
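These arguments can be sketched as follows, assuming the 10-row subset of mtcars created at the start of this chapter. The nudge distance of 1 (in mpg units) is an arbitrary choice.

```r
library(ggplot2)

set.seed(1234)
df <- mtcars[sample(1:nrow(mtcars), 10), ]
df$cyl <- as.factor(df$cyl)

sp <- ggplot(df, aes(x = wt, y = mpg, label = rownames(df))) + geom_point()

# Offset the labels vertically from their points
p_nudge <- sp + geom_text(nudge_y = 1)

# Same idea written with position_nudge()
p_nudge2 <- sp + geom_text(position = position_nudge(y = 1))

# Align labels towards the center of the plot so they stay inside the panel
p_inward <- sp + geom_text(hjust = "inward", vjust = "inward")
p_inward
```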
[Figure: scatter plot of mpg versus wt faceted by cyl (4, 6, 8), each panel annotated with "Scatter plot"]
Case of facet (Chapter 37): the annotation is at the same place (in each facet) even
if the axis scales vary.
install.packages("ggrepel")
geom_label_repel()
geom_text_repel()
We start by creating a simple scatter plot using a subset of the mtcars data set
containing 15 rows.
[Figure: scatter plot of mpg versus wt with overlapping car name labels]
# Use ggrepel::geom_text_repel
require("ggrepel")
set.seed(42)
p + geom_text_repel(aes(label = rownames(df)),
size = 3.5)
[Figure: scatter plots of mpg versus wt with repelled car name labels; legend: factor(cyl) (4, 6, 8)]
Chapter 35
Add Straight Lines to a Plot
[Figure: scatter plots of mpg versus wt with added straight lines, including the line y = -5X + 37]
Chapter 36
set.seed(1234)
# Basic histogram
hp <- qplot(x=rnorm(200), geom="histogram")
hp
# Horizontal histogram
hp + coord_flip()
# Y axis reversed
hp + scale_y_reverse()
[Figure: histogram of rnorm(200): basic, horizontal (flipped coordinates) and with reversed y axis]
Chapter 37
Facets divide a plot into subplots based on the values of one or more categorical
variables. There are two main functions for faceting :
facet_grid()
facet_wrap()
[Figure: box plot of len by dose; legend: dose]
The following functions can be used for facets:
1. Facet with one discrete variable: Split by the levels of the group supp
[Figure: box plots of len by dose split by supp (OJ, VC), as rows and as columns; legend: dose]
2. Facet with two discrete variables: Split by the levels of the groups dose
and supp
[Figure: box plots of len by dose faceted by dose and supp in both orientations; legend: dose]
Note that, you can use the argument margins to add additional facets which contain
all the data for each of the possible values of the faceting variables
3. Facet scales
By default, all the panels have the same scales (scales = "fixed"). They can be made
independent by setting scales to "free", "free_x" or "free_y".
4. Facet labels: The argument labeller can be used to control the labels of the
panels.
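The four points above can be sketched in one place. This example assumes bp is a box plot of len by dose from the ToothGrowth data, filled by dose; the plots are collected in a list named facets only for convenience.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
bp <- ggplot(tg, aes(x = dose, y = len, group = dose)) +
  geom_boxplot(aes(fill = dose))

facets <- list(
  rows     = bp + facet_grid(supp ~ .),                     # 1. one variable, as rows
  cols     = bp + facet_grid(. ~ supp),                     # 1. one variable, as columns
  two_vars = bp + facet_grid(dose ~ supp),                  # 2. two variables
  margins  = bp + facet_grid(dose ~ supp, margins = TRUE),  # extra "(all)" panels
  free     = bp + facet_grid(dose ~ supp, scales = "free"), # 3. independent scales
  labelled = bp + facet_grid(dose ~ supp,
                             labeller = label_both),        # 4. "dose: 1" style labels
  wrapped  = bp + facet_wrap(~ dose, ncol = 2)              # wrap panels in 2 columns
)
facets$labelled
```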
[Figure: box plots of len by dose in a facet grid with panel labels "supp: OJ", "supp: VC" and "dose: 0.5", "dose: 1", "dose: 2"; legend: dose]
The appearance of facet labels can be modified as follows:
# Change facet text font. Possible values for the font style:
#'plain', 'italic', 'bold', 'bold.italic'.
p + facet_grid(dose ~ supp)+
theme(strip.text.x = element_text(size=12, color="red",
face="bold.italic"),
strip.text.y = element_text(size=12, color="red",
face="bold.italic"))
[Figure: two facet grids of len by dose (rows: dose, columns: supp) with customized facet label text; legend: dose]
bp + facet_wrap(~ dose)
[Figure: box plots of len by dose wrapped by dose, shown with two different panel layouts; legend: dose]
Chapter 38
Position Adjustments
[Figure: bar plots of count by fl filled by drv (4, f, r) under different position adjustments, and a scatter plot of hwy versus cty]
Note that each of these position adjustments can also be applied using a function with
manual arguments (signatures as in ggplot2 2.2.0 and later; width and height apply only where listed):
position_dodge(width)
position_fill(vjust, reverse)
position_stack(vjust, reverse)
position_jitter(width, height)
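A sketch of these adjustments with the mpg data set from ggplot2. The base plot name s and the numeric widths are illustrative choices.

```r
library(ggplot2)

# Bar plot of fuel type (fl), filled by drive train (drv)
s <- ggplot(mpg, aes(x = fl, fill = drv))

p_dodge <- s + geom_bar(position = "dodge")  # bars side by side
p_fill  <- s + geom_bar(position = "fill")   # stacked, normalized to 1
p_stack <- s + geom_bar(position = "stack")  # stacked counts (default)

# Function form with a manual width
p_dodge_w <- s + geom_bar(position = position_dodge(width = 0.5))

# Jitter: add random noise to reduce overplotting in a scatter plot
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.5, height = 0.5))
p_dodge
```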
[Figure: bar plot of count by fl filled by drv (4, f, r)]
Chapter 39
Coordinate Systems
[Figure: bar plots of count by fl drawn in different coordinate systems, including flipped and polar versions]
Part VI
Extensions to ggplot2
Chapter 40
Arrange Multiple Graphs on the Same Page
To arrange multiple ggplot2 graphs on the same page, the standard R functions - par()
and layout() - cannot be used.
This chapter will show you, step by step, how to put several ggplots on a single page.
Key functions:
1. Installation:
install.packages("gridExtra")
install.packages("cowplot")
2. Loading:
library("gridExtra")
library("cowplot")
40.2 Data
ToothGrowth and economics data sets are used :
# Load ToothGrowth
# Convert the variable dose from a numeric to a factor variable
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Box plot and dot plot using the ToothGrowth data set
Line plot using the economics data set
We'll use custom colors to manually change line and fill colors (functions:
scale_color_manual() and scale_fill_manual(); see Chapter 26)
206 CHAPTER 40. ARRANGE MULTIPLE GRAPHS ON THE SAME PAGE
# A set of 3 colors
my3cols <- c("#E7B800", "#2E9FDF", "#FC4E07")
require(cowplot)
[Figure: box plot and dot plot of len by dose (legend: dose) and line plot of psavert]
Note that the default design of cowplot has a white background with no grid at all.
It looks similar to ggplot2's theme_classic(), but there are some subtle differences
with respect to font sizes. In many cases, this is the cleanest and most elegant way
to display the data.
If you want to add gridlines or to use the default ggplot2 theme, follow the R code
below:
# Add gridlines
bxp + background_grid(major = "xy", minor = "none")
# Use theme_gray()
bxp + theme_gray()
[Figure: box plots of len by dose with added gridlines and with theme_gray; legend: dose]
In this section we'll see how to combine the different plots created in the previous
section.
Key functions:
In the R code below, the function plot_grid() is used to combine a box plot (bxp),
a dot plot (dp) and a line plot (lp) on a grid of 2 columns and 2 rows:
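A sketch of this call. The definitions of bxp, dp and lp below are assumptions that follow the descriptions given earlier in this chapter (ToothGrowth box plot and dot plot, economics line plot, custom colors from my3cols); the line color is an arbitrary choice.

```r
library(ggplot2)
library(cowplot)

my3cols <- c("#E7B800", "#2E9FDF", "#FC4E07")
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Box plot and dot plot of len by dose, line plot of psavert over time
bxp <- ggplot(tg, aes(x = dose, y = len, color = dose)) +
  geom_boxplot() + scale_color_manual(values = my3cols)
dp <- ggplot(tg, aes(x = dose, y = len, color = dose, fill = dose)) +
  geom_dotplot(binaxis = "y", stackdir = "center") +
  scale_color_manual(values = my3cols) +
  scale_fill_manual(values = my3cols)
lp <- ggplot(economics, aes(x = date, y = psavert)) +
  geom_line(color = "steelblue")

# Combine the three plots on a 2 x 2 grid, labelled A, B and C
plot2by2 <- plot_grid(bxp, dp, lp, labels = c("A", "B", "C"),
                      ncol = 2, nrow = 2)
plot2by2
```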
[Figure: grid of three plots labelled A (box plot of len by dose), B (dot plot of len by dose) and C (line plot of psavert)]
draw_plot_label(): Adds a plot label to the upper left corner of a graph. It can
handle vectors of labels with associated coordinates.
Note that, by default, coordinates run from 0 to 1, and the point (0, 0) is in the
lower left corner of the canvas (see the figure below).
ggdraw() +
draw_plot(bxp, x = 0, y = .5, width = .5, height = .5) +
draw_plot(dp, x = .5, y = .5, width = .5, height = .5) +
draw_plot(lp, x = 0, y = 0, width = 1, height = 0.5) +
draw_plot_label(label = c("A", "B", "C"),
x = c(0, 0.5, 0), y = c(1, 1, 0.5), size = 15)
[Figure: plots A (box plot) and B (dot plot) in the top row, and plot C (line plot of psavert) spanning the bottom row]
The cowplot default theme works nicely in conjunction with the function save_plot(). The
output PDFs are nicely formatted and scaled.
40.4. GRIDEXTRA PACKAGE 211
For example, if we want to save the box plot generated in the previous section, we
might use this code:
For example, if we want to save a 2-by-2 figure, we might use this code:
save_plot("plot2by2.pdf", plot2by2,
ncol = 2, # we're saving a grid plot of 2 columns
nrow = 2, # and 2 rows
# each individual subplot should have an aspect ratio of 1.3
base_aspect_ratio = 1.3
)
the box plot, the dot plot and the line plot created in the previous sections
a bar plot created using the diamonds data sets as follow
require(gridExtra)
grid.arrange(bxp, dp, lp, brp, ncol = 2, nrow =2)
[Figure: box plot and dot plot of len by dose, line plot of psavert versus date, and bar plot of count by clarity filled by cut (diamonds data), arranged on one page with grid.arrange(); a second arrangement shows plots of len by dose next to the bar plot]
214 CHAPTER 40. ARRANGE MULTIPLE GRAPHS ON THE SAME PAGE
In the R code below, layout_matrix is a 2x2 matrix (2 columns and 2 rows). The
first row is all 1s: that's where the first plot lives, spanning the two columns. The
second row contains plots 2 and 3, each occupying one column.
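A sketch of such a call, assuming the bar plot goes in the first row and the box plot and dot plot in the second (the three plots are rebuilt here as stand-ins, with assumed styling):

```r
library(ggplot2)
library(gridExtra)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Stand-ins for the plots used in this chapter
brp <- ggplot(diamonds, aes(x = clarity)) + geom_bar(aes(fill = cut))
bxp <- ggplot(tg, aes(x = dose, y = len, color = dose)) + geom_boxplot()
dp  <- ggplot(tg, aes(x = dose, y = len, color = dose)) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               fill = "white", binwidth = 1)

# Row 1: plot 1 in both cells; row 2: plots 2 and 3 side by side
lay <- rbind(c(1, 1),
             c(2, 3))
g <- grid.arrange(brp, bxp, dp, layout_matrix = lay)
```

Each number in the matrix is the position of a plot in the argument list, so repeating a number makes that plot span several cells.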
[Figure: bar plot of count by clarity (filled by cut) spanning the top row; box plot and dot plot of len by dose in the bottom row]
To save the legend of a ggplot, the helper function below can be used:
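The helper itself is not shown here. One possible sketch is given below; get_legend() is a hypothetical name for this helper (it is not a ggplot2 function), and the lookup by "guide-box" relies on how ggplot2 names the legend in the plot's gtable:

```r
library(ggplot2)

# Extract the legend grob from a ggplot object.
get_legend <- function(p) {
  gt <- ggplotGrob(p)
  # The legend cell is named "guide-box" (or "guide-box-<side>"
  # in recent ggplot2 versions, where unused sides hold empty grobs)
  idx <- grep("guide-box", gt$layout$name)
  legends <- Filter(function(g) !inherits(g, "zeroGrob"), gt$grobs[idx])
  if (length(legends) == 0) stop("the plot has no legend")
  legends[[1]]
}

# Usage on a stand-in box plot (styling assumed)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
bxp <- ggplot(tg, aes(x = dose, y = len, color = dose)) + geom_boxplot()
legend <- get_legend(bxp)
```

The returned object is a grob, so it can be passed to grid.arrange() like any plot.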
We'll arrange the box plot (bxp) and the dot plot (dp) created in the previous sections.
# 3. Remove the legend from the box plot and the dot plot
bxp2 <- bxp + theme(legend.position="none")
dp2 <- dp + theme(legend.position="none")
[Figure: box plot and dot plot of len by dose side by side, sharing a single dose legend (0.5, 1, 2)]
In the R code above, the argument widths is a vector containing 3 values specifying
the width of the box plot (bxp2), the dot plot (dp2) and the legend, respectively.
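The call being described is not shown here. A sketch consistent with the description, in which the helper name get_legend() and the three width values are assumptions:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: first non-empty legend grob of a plot
get_legend <- function(p) {
  gt <- ggplotGrob(p)
  idx <- grep("guide-box", gt$layout$name)
  Filter(function(g) !inherits(g, "zeroGrob"), gt$grobs[idx])[[1]]
}

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
bxp <- ggplot(tg, aes(dose, len, color = dose)) + geom_boxplot()
dp  <- ggplot(tg, aes(dose, len, color = dose)) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               fill = "white", binwidth = 1)

legend <- get_legend(bxp)
bxp2 <- bxp + theme(legend.position = "none")
dp2  <- dp + theme(legend.position = "none")

# Relative widths of bxp2, dp2 and the legend (illustrative values)
g <- grid.arrange(bxp2, dp2, legend, ncol = 3,
                  widths = c(2.3, 2.3, 0.8))
```

The widths are relative, so only their ratios matter: here the legend column gets roughly a third of the width of each plot.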
It's also possible to use the argument layout_matrix to customize the legend position.
We start by creating a dot plot with a top legend position. Next, we save the legend
and remove it from the dot plot:
The first row (height = 2.5) is where the first plot (bxp2) and the second plot
(dp2) live
The second row (height = 0.2) is where the legend lives, spanning 2 columns
Bottom-center legend:
[Figure: box plot and dot plot of len by dose side by side, with the dose legend (0.5, 1, 2) centered below them]
Top-center legend:
The legend (plot 1) lives in the first row (height = 0.2) spanning two columns
bxp2 (plot 2) and dp2 (plot 3) live in the second row (height = 2.5)
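The two layouts described above can be sketched as follows. The heights come from the text; the helper name get_legend(), the column widths and the plot styling are assumptions:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: first non-empty legend grob of a plot
get_legend <- function(p) {
  gt <- ggplotGrob(p)
  idx <- grep("guide-box", gt$layout$name)
  Filter(function(g) !inherits(g, "zeroGrob"), gt$grobs[idx])[[1]]
}

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
bxp2 <- ggplot(tg, aes(dose, len, color = dose)) + geom_boxplot() +
  theme(legend.position = "none")

# Dot plot with a top legend; save the legend, then remove it
dp <- ggplot(tg, aes(dose, len, color = dose)) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               fill = "white", binwidth = 1) +
  theme(legend.position = "top")
legend <- get_legend(dp)
dp2 <- dp + theme(legend.position = "none")

# Bottom-center legend: plots in row 1, legend spanning row 2
bottom_layout <- arrangeGrob(bxp2, dp2, legend,
  layout_matrix = rbind(c(1, 2), c(3, 3)),
  widths = c(2.7, 2.7), heights = c(2.5, 0.2))

# Top-center legend: legend spanning row 1, plots in row 2
top_layout <- arrangeGrob(legend, bxp2, dp2,
  layout_matrix = rbind(c(1, 1), c(2, 3)),
  widths = c(2.7, 2.7), heights = c(0.2, 2.5))

grid::grid.newpage()
grid::grid.draw(bottom_layout)
```

Drawing top_layout instead gives the second variant; only the order of the grobs, the layout matrix and the heights change.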
[Figure: box plot and dot plot of len by dose side by side, with the dose legend (0.5, 1, 2) centered above them]
set.seed(1234)
x <- c(rnorm(350, mean = -1), rnorm(350, mean = 1.5),
rnorm(350, mean = 4))
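The output that follows has three columns (x, y, group), but only x is defined above. A sketch of how the rest of the data frame could be built; the y means and the name df2 are assumptions, chosen only to give three shifted groups of 350 points:

```r
set.seed(1234)
x <- c(rnorm(350, mean = -1), rnorm(350, mean = 1.5),
       rnorm(350, mean = 4))
# Assumed means for y; the original definition is not shown
y <- c(rnorm(350, mean = -0.5), rnorm(350, mean = 1.7),
       rnorm(350, mean = 2.5))
# One group label per block of 350 observations
group <- as.factor(rep(c(1, 2, 3), each = 350))
df2 <- data.frame(x, y, group)
head(df2)
```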
## x y group
## 1 -2.20706575 -0.715413865 1
## 2 -0.72257076 -0.009793331 1
## 3 0.08444118 -0.440606576 1
## 4 -3.34569770 -0.441687947 1
## 5 -0.57087531 1.338363107 1
## 6 -0.49394411 -0.112349101 1
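The four objects passed to grid.arrange() are not defined in the text as it stands. One way to build them is sketched below; the object names match the call, while the data frame name df2, the y means and the styling are assumptions:

```r
library(ggplot2)

set.seed(1234)
df2 <- data.frame(
  x = c(rnorm(350, -1), rnorm(350, 1.5), rnorm(350, 4)),
  y = c(rnorm(350, -0.5), rnorm(350, 1.7), rnorm(350, 2.5)),  # assumed means
  group = factor(rep(1:3, each = 350))
)

# Scatter plot of y against x, colored by group
scatterPlot <- ggplot(df2, aes(x, y, color = group)) +
  geom_point() +
  theme(legend.position = "none")

# Marginal density of x, to go above the scatter plot
xdensity <- ggplot(df2, aes(x, fill = group)) +
  geom_density(alpha = 0.5) +
  theme(legend.position = "none")

# Marginal density of y, flipped to go on the right
ydensity <- ggplot(df2, aes(y, fill = group)) +
  geom_density(alpha = 0.5) +
  coord_flip() +
  theme(legend.position = "none")

# Empty plot that fills the unused top-right cell
blankPlot <- ggplot() + geom_blank() + theme_void()
```

The legends are hidden here so that the three panels line up; the figure in the book keeps the group legend on the scatter plot.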
require("gridExtra")
grid.arrange(xdensity, blankPlot, scatterPlot, ydensity,
ncol=2, nrow=2, widths=c(4, 1.4), heights=c(1.4, 4))
[Figure: scatter plot of y against x, colored by group, with the density of x above it and the density of y to its right]
require(grid)
# Move to a new page
grid.newpage()
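The rest of this code is not shown here. A sketch of how a grid layout can place the three plots, with the scatter plot spanning the first row; the helper name define_region(), the data and the plot styling are assumptions:

```r
library(ggplot2)
library(grid)

set.seed(1234)
df2 <- data.frame(
  x = c(rnorm(350, -1), rnorm(350, 1.5), rnorm(350, 4)),
  y = c(rnorm(350, -0.5), rnorm(350, 1.7), rnorm(350, 2.5)),  # assumed means
  group = factor(rep(1:3, each = 350))
)
scatterPlot <- ggplot(df2, aes(x, y, color = group)) + geom_point()
xdensity <- ggplot(df2, aes(x, fill = group)) + geom_density(alpha = 0.5)
ydensity <- ggplot(df2, aes(y, fill = group)) + geom_density(alpha = 0.5) +
  coord_flip()

# Move to a new page and create a 2 x 2 layout
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 2)))

# Hypothetical helper: the viewport for a given region of the layout
define_region <- function(row, col) {
  viewport(layout.pos.row = row, layout.pos.col = col)
}

# Print each plot into its region
print(scatterPlot, vp = define_region(1, 1:2))  # row 1, both columns
print(xdensity, vp = define_region(2, 1))       # row 2, column 1
print(ydensity, vp = define_region(2, 2))       # row 2, column 2
```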
[Figure: the scatter plot (y against x, colored by group) and the two density plots arranged on one page with grid viewports]
# Install
install.packages("ggExtra")
library("ggExtra")
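Once ggExtra is loaded, marginal plots can be added with its ggMarginal() function. A minimal sketch, assuming a scatter plot like the one used in this section (the data and styling below are stand-ins):

```r
library(ggplot2)
library(ggExtra)

set.seed(1234)
df2 <- data.frame(
  x = c(rnorm(350, -1), rnorm(350, 1.5), rnorm(350, 4)),
  y = c(rnorm(350, -0.5), rnorm(350, 1.7), rnorm(350, 2.5)),  # assumed means
  group = factor(rep(1:3, each = 350))
)
p <- ggplot(df2, aes(x, y, color = group)) + geom_point()

# Marginal density curves on both axes
m_density <- ggMarginal(p, type = "density")
# Marginal histograms instead
m_hist <- ggMarginal(p, type = "histogram")

print(m_density)
```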
[Figure: two scatter plots of y against x, colored by group, with marginal plots added by ggExtra]
2. Add, for example, box plots of the variables x and y inside the scatter plot
using the function annotation_custom()
As the inset box plots overlap with some points, a transparent background is used
for the box plots.
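A sketch of this step is given below. The data, the object names and the inset positions (a band of 1.5 units on either side of the panel edge) are assumptions:

```r
library(ggplot2)

set.seed(1234)
df2 <- data.frame(
  x = c(rnorm(350, -1), rnorm(350, 1.5), rnorm(350, 4)),
  y = c(rnorm(350, -0.5), rnorm(350, 1.7), rnorm(350, 2.5)),  # assumed means
  group = factor(rep(1:3, each = 350))
)
p <- ggplot(df2, aes(x, y, color = group)) + geom_point()

# Theme without axes or legend and with transparent backgrounds
transparent_theme <- theme_void() +
  theme(legend.position = "none",
        panel.background = element_rect(fill = "transparent", colour = NA),
        plot.background = element_rect(fill = "transparent", colour = NA))

# Horizontal box plots of x by group (for the bottom of the panel)
xbp <- ggplot(df2, aes(group, x, fill = group)) +
  geom_boxplot(alpha = 0.5) + coord_flip() + transparent_theme
# Vertical box plots of y by group (for the left of the panel)
ybp <- ggplot(df2, aes(group, y, fill = group)) +
  geom_boxplot(alpha = 0.5) + transparent_theme

xr <- range(df2$x)
yr <- range(df2$y)

# Insert both box plots as grobs at data coordinates
inset <- p +
  annotation_custom(ggplotGrob(xbp), xmin = xr[1], xmax = xr[2],
                    ymin = yr[1] - 1.5, ymax = yr[1] + 1.5) +
  annotation_custom(ggplotGrob(ybp), xmin = xr[1] - 1.5, xmax = xr[1] + 1.5,
                    ymin = yr[1], ymax = yr[2])
inset
```

annotation_custom() takes a grob, which is why each box plot is first converted with ggplotGrob().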
[Figure: scatter plots of y against x, colored by group, with inset box plots]
tableGrob() [in the package gridExtra]: for adding a data table to a graphic device
splitTextGrob() [in the package RGraphics]: for adding text to a graph
library(RGraphics)
library(gridExtra)
# Table
p1 <- tableGrob(head(ToothGrowth, 3))
# Text
text <- paste0("ToothGrowth data describes the effect ",
"of Vitamin C on tooth growth in Guinea pigs.")
p2 <- splitTextGrob(text)
# Box plot
p3 <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
geom_boxplot(aes(color = dose)) +
scale_color_manual(values = my3cols)
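The step that puts the table, the text and the plot on one page is not shown here. A self-contained sketch using grid.arrange(); note that textGrob() from the grid package is swapped in for splitTextGrob(), with a manual line break, so that the example does not depend on RGraphics, and the heights are illustrative:

```r
library(ggplot2)
library(gridExtra)
library(grid)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Table
p1 <- tableGrob(head(tg, 3))
# Text (textGrob stands in for RGraphics::splitTextGrob here)
p2 <- textGrob(paste0("ToothGrowth data describes the effect\n",
                      "of Vitamin C on tooth growth in Guinea pigs."))
# Box plot with default colors
p3 <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot(aes(color = dose))

# Stack the three elements in one column with relative heights
g <- grid.arrange(p1, p2, p3, ncol = 1, heights = c(0.3, 0.1, 0.6))
```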
[Figure: box plot of len by dose, colored by dose]
Chapter 41
The R packages GGally and ggcorrplot are two extensions to ggplot2 for displaying
a correlation matrix.
Compared to GGally, the ggcorrplot package provides more options for visualizing
a correlation matrix. For example, it can reorder the correlation matrix and
display the significance level on the correlogram. It also includes a
function for computing a matrix of correlation p-values.
41.1 GGally
Compute and visualize a correlation matrix:
# Correlation matrix
library("GGally")
mydata <- mtcars[, c(1,3,4,5,6,7)]
ggcorr(mydata, palette = "RdBu", label = TRUE)
228 CHAPTER 41. CORRELATION MATRIX VISUALIZATION
[Figure: ggcorr() correlation matrix of mpg, disp, hp, drat, wt and qsec, with labelled coefficients and a binned rho color scale from -1 to 1]
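GGally can also draw a scatter plot matrix of the same variables. The call is not shown here; a likely candidate is ggpairs(), sketched below with default settings (an assumption):

```r
library(GGally)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]

# Scatter plot matrix: scatter plots in the lower panels,
# densities on the diagonal, correlation coefficients in the upper panels
pm <- ggpairs(mydata)
pm
```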
[Figure: scatter plot matrix of mpg, disp, hp, drat, wt and qsec. The upper panels show the pairwise correlations:]

        disp     hp      drat    wt      qsec
mpg    -0.848  -0.776   0.681  -0.868   0.419
disp            0.791  -0.71    0.888  -0.434
hp                     -0.449   0.659  -0.708
drat                           -0.712   0.0912
wt                                     -0.175
41.2 ggcorrplot
# Install from CRAN
install.packages("ggcorrplot")
# Or install the latest version from GitHub
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggcorrplot")
# Load
library("ggcorrplot")
# Compute a correlation matrix
mydata <- mtcars[, c(1,3,4,5,6,7)]
corr <- round(cor(mydata), 1)
head(corr[, 1:6], 3)
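With the correlation matrix computed, the correlogram is drawn with ggcorrplot(). The sketch below shows a default call and one common variation; which options the figures in this section used is an assumption:

```r
library(ggcorrplot)

mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
corr <- round(cor(mydata), 1)

# Default correlogram: full matrix drawn as square tiles
p_default <- ggcorrplot(corr)

# Reorder by hierarchical clustering, keep the lower triangle
# and print the coefficients on the tiles
p_lower <- ggcorrplot(corr, hc.order = TRUE, type = "lower", lab = TRUE)
p_lower
```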
[Figure: two correlograms of mpg, disp, hp, drat, wt and qsec on a Corr color scale from -1 to 1, shown with two different variable orderings]
# Compute a matrix of correlation p-values
p.mat <- cor_pmat(mydata)
# Argument p.mat
# Cross out the non-significant coefficients
ggcorrplot(corr, hc.order = TRUE,
           type = "lower", p.mat = p.mat)
[Figure: two lower-triangle correlograms of mpg, qsec, wt, disp, hp and drat on a Corr color scale from -1 to 1; the first labels each tile with its correlation coefficient]
Chapter 42
Here, we present the survminer R package, which we developed for drawing survival
curves using the ggplot2 system.
install.packages("survival")
42.2. DRAWING SURVIVAL CURVES WITH SURVMINER 233
library("survival")
fit <- survfit(Surv(time, status) ~ sex, data = lung)
# Install from CRAN
install.packages("survminer")
# Or install the latest version from GitHub
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/survminer")
# Load
library("survminer")
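With the fitted object and survminer loaded, the curves are drawn with ggsurvplot(). The calls are not shown here; the sketch below gives a basic plot and a variant with a p-value and a risk table, where the choice of options and colors is an assumption:

```r
library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Basic survival curves, one per stratum
p_basic <- ggsurvplot(fit, data = lung)

# Add the log-rank p-value, a number-at-risk table and custom colors
p_full <- ggsurvplot(fit, data = lung,
                     pval = TRUE,
                     risk.table = TRUE,
                     palette = c("blue", "red"))
p_full
```

Passing data explicitly avoids relying on ggsurvplot() finding the data set from the fitted object.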
234 CHAPTER 42. PLOTTING SURVIVAL CURVES
[Figure: survival curves for the two strata; x axis: Time (0 to 1000), y axis: Survival probability (0 to 1)]
[Figure: survival curves for the two strata with the p-value (p = 0.0013) printed on the plot; x axis: Time, y axis: Survival probability. A risk table is shown below the curves:]

Number at risk by time
Strata     0   250   500   750  1000
sex=1    138    62    20     7     2
sex=2     90    53    21     3     0
Note that allowed values for the argument palette include "hue" for the default
hue color scale; "grey" for grey color palettes; brewer palettes, e.g. "RdBu", "Blues",
"Dark2", ...; or a custom color palette, e.g. c("blue", "red").
43.2. CHEAT SHEETS 237
43.3 References
This book was written in R Markdown inside RStudio. knitr and pandoc converted
the raw R Markdown to PDF. This version of the book was built with R version 3.2.3
(2015-12-10, "Wooden Christmas-Tree", svn rev 69752, platform
x86_64-apple-darwin13.4.0), ggplot2 (ver. 2.1.0) and dplyr (ver. 0.4.3).
Hadley Wickham. Elegant graphics for data analysis. Springer 2009. http://ggplot2.org/book/
Hadley Wickham. ggplot2 official documentation. http://docs.ggplot2.org/current/
Winston Chang. R graphics cookbook. O'Reilly 2012. http://www.cookbook-r.com/
Alboukadel Kassambara. Data analysis and visualization. http://www.sthda.com/english/wiki/ggplot2-introduction