Hadley Wickham
ggplot2
Elegant Graphics for Data Analysis
Springer
To my parents, Alison & Brian Wickham.
Without them and their unconditional love
and support, none of this would’ve been
possible.
Preface
Welcome to the second edition of “ggplot2: elegant graphics for data analysis”.
I’m so excited to have an updated book that shows off all the latest and
greatest ggplot2 features, as well as the great things that have been happening
in R and in the ggplot2 community over the last five years. The ggplot2 community
is vibrant: the ggplot2 mailing list has over 7,000 members and there is a very
active Stack Overflow community, with nearly 10,000 questions tagged with
ggplot2. While most of my development effort is no longer going into ggplot2
(more on that below), there’s never been a better time to learn it and use it.
I am tremendously grateful for the success of ggplot2. It’s one of the most
commonly downloaded R packages (over a million downloads in the last year!)
and has influenced the design of graphics packages for other languages. Personally,
ggplot2 has brought me many exciting opportunities to travel the
world and meet interesting people. I love hearing how people are using R and
ggplot2 to understand the data that they love.
A big thanks for this edition goes to Carson Sievert, who helped me mod-
ernise the code, including converting the sources to Rmarkdown. He also
updated many of the examples and helped me proofread the book.
Major changes
I’ve spent a lot of effort ensuring that this edition is a true upgrade over the
first edition. As well as updating the code everywhere to make sure it’s fully
compatible with the latest version of ggplot2, I have:
• Shown much more code in the book, so it’s easier to use as a reference.
Overall the book has a more “knitr”-ish sensibility: there are fewer floating
figures and tables, and more inline code. This makes the layout a little less
pretty but keeps related items closer together. You can find the complete
source online at https://github.com/hadley/ggplot2-book.
The future
ggplot2 is now stable, and is unlikely to change much in the future. There will
be bug fixes and there may be new geoms, but there will be no large changes to
how ggplot2 works. The next iteration of ggplot2 is ggvis. ggvis is significantly
more ambitious because it aims to provide a grammar of interactive graphics.
ggvis is still young, and lacks many of the features of ggplot2 (most notably
it currently lacks facetting and has no way to make static graphics), but over
the coming years the goal is for ggvis to be better than ggplot2.
The syntax of ggvis is a little different to ggplot2. You won’t be able
to trivially convert your ggplot2 plots to ggvis, but we think the cost is
worth it: the new syntax is considerably more consistent, and will be easier
for newcomers to learn. If you’ve mastered ggplot2, you’ll find your skills
transfer very well to ggvis and after struggling with the syntax for a while, it
will start to feel quite natural. The important skills you learn when mastering
ggplot2 are not the programmatic details of describing a plot in code, but
the much harder challenge of thinking about how to turn data into effective
visualisations.
Acknowledgements
Many people have contributed to this book with high-level structural in-
sights, spelling and grammar corrections and bug reports. I’d particularly
like to thank Alexander Forrence, Devin Pastoor, David Robinson, and
Guangchuang Yu, for their detailed technical review of the book.
Many others have contributed over the (now quite long!) lifetime of
ggplot2. I would like to thank: Leland Wilkinson, for discussions and comments
that cemented my understanding of the grammar; Gabor Grothendieck, for
early helpful comments; Heike Hofmann and Di Cook, for being great advi-
sors and supporting the development of ggplot2 during my PhD; Charlotte
Wickham; the students of stat480 and stat503 at ISU, for trying it out when
it was very young; Debby Swayne, for masses of helpful feedback and advice;
Bob Muenchen, Reinhold Kliegl, Philipp Pagel, Richard Stahlhut, Baptiste
Auguie, Jean-Olivier Irisson, Thierry Onkelinx and the many others who have
read draft versions of the book and given me feedback; and last, but not least,
the members of R-help and the ggplot2 mailing list, for providing the many
interesting and challenging graphics problems that have helped motivate this
book.
Hadley Wickham
September 2015
Contents
1 Introduction
1.1 Welcome to ggplot2
1.2 What is the grammar of graphics?
1.3 How does ggplot2 fit in with other R graphics?
1.4 About this book
1.5 Installation
1.6 Other resources
1.7 Colophon
References
3 Toolbox
3.1 Introduction
3.2 Basic plot types
3.2.1 Exercises
3.3 Labels
3.4 Annotations
3.5 Collective geoms
3.5.1 Multiple groups, one aesthetic
3.5.2 Different groups on different layers
3.5.3 Overriding the default grouping
3.5.4 Matching aesthetics to graphic objects
3.5.5 Exercises
3.6 Surface plots
3.7 Drawing maps
3.7.1 Vector boundaries
3.7.2 Point metadata
3.7.3 Raster images
3.7.4 Area metadata
3.8 Revealing uncertainty
3.9 Weighted data
3.10 Diamonds data
3.11 Displaying distributions
3.11.1 Exercises
3.12 Dealing with overplotting
3.13 Statistical summaries
3.14 Add-on packages
References
7 Positioning
7.1 Introduction
7.2 Facetting
7.2.1 Facet wrap
7.2.2 Facet grid
8 Themes
8.1 Introduction
8.2 Complete themes
8.2.1 Exercises
8.3 Modifying theme components
8.4 Theme elements
8.4.1 Plot elements
8.4.2 Axis elements
8.4.3 Legend elements
8.4.4 Panel elements
8.4.5 Facetting elements
8.4.6 Exercises
8.5 Saving your output
References
Index
base R, if you design a new graphic, it’s composed of raw plot elements like
points and lines, and it’s hard to design new components that combine with
existing plots. In ggplot2, the expressions used to create a new graphic are
composed of higher-level elements like representations of the raw data and
statistical transformations, and can easily be combined with new datasets
and other plots.
This book provides a hands-on introduction to ggplot2 with lots of ex-
ample code and graphics. It also explains the grammar on which ggplot2 is
based. Like other formal systems, ggplot2 is useful even when you don’t un-
derstand the underlying model. However, the more you learn about it, the
more effectively you’ll be able to use ggplot2. This book assumes some basic
familiarity with R, to the level described in the first chapter of Dalgaard’s
Introductory Statistics with R.
This book will introduce you to ggplot2 as a novice, unfamiliar with the
grammar; teach you the basics so that you can re-create plots you are already
familiar with; show you how to use the grammar to create new types of graph-
ics; and eventually turn you into an expert who can build new components
to extend the grammar.
Wilkinson (2005) created the grammar of graphics to describe the deep fea-
tures that underlie all statistical graphics. The grammar of graphics is an
answer to a question: what is a statistical graphic? The layered grammar of
graphics (Wickham 2009) builds on Wilkinson’s grammar, focussing on the
primacy of layers and adapting it for embedding within R. In brief, the gram-
mar tells us that a statistical graphic is a mapping from data to aesthetic
attributes (colour, shape, size) of geometric objects (points, lines, bars). The
plot may also contain statistical transformations of the data and is drawn on
a specific coordinate system. Facetting can be used to generate the same plot
for different subsets of the dataset. It is the combination of these independent
components that makes up a graphic.
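The components named in the paragraph above can be seen side by side in a single plot specification. The following is only an illustrative sketch, not code from the book: it assumes ggplot2 is installed, uses the mpg dataset bundled with it, and the particular choice of variables, smoother and facetting variable is arbitrary.

```r
library(ggplot2)

# Data (mpg) and aesthetic mappings (displ -> x position, hwy -> y position).
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Geometric object: points, with an extra mapping from class to colour.
  geom_point(aes(colour = class)) +
  # Statistical transformation: a fitted straight line per panel.
  geom_smooth(method = "lm", se = FALSE) +
  # Coordinate system: Cartesian (the default, written out for clarity).
  coord_cartesian() +
  # Facetting: repeat the same plot for each subset defined by drv.
  facet_wrap(~drv)

print(p)
```

Each `+` adds one independent component, so any single line can be dropped or swapped without rewriting the rest.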
As the book progresses, the formal grammar will be explained in increasing
detail. The first description of the components follows below. It introduces
some of the terminology that will be used throughout the book and outlines
the basic responsibilities of each component. Don’t worry if it doesn’t all make
sense right away: you will have many more opportunities to learn about the
pieces and how they fit together.
All plots are composed of:
• Data that you want to visualise and a set of aesthetic mappings describ-
ing how variables in the data are mapped to aesthetic attributes that you
can perceive.
• It doesn’t suggest what graphics you should use to answer the questions
you are interested in. While this book endeavours to promote a sensible
process for producing plots of data, the focus of the book is on how to
produce the plots you want, not knowing what plots to produce. For more
advice on this topic, you may want to consult Robbins (2004), Cleveland
(1993), Chambers et al. (1983), and J. W. Tukey (1977).
• It does not describe interactivity: the grammar of graphics describes only
static graphics and there is essentially no benefit to displaying them on
a computer screen as opposed to a piece of paper. ggplot2 can only cre-
ate static graphics, so for dynamic and interactive graphics you will have
to look elsewhere (perhaps at ggvis, described below). Cook and Swayne
(2007) provide an excellent introduction to the interactive graphics
package GGobi. GGobi can be connected to R with the rggobi package
(Wickham et al. 2008).
The first chapter, Chapter 2, describes how to quickly get started using
ggplot2 to make useful graphics. This chapter introduces several important
ggplot2 concepts: geoms, aesthetic mappings and facetting. Chapter 3 dives
into more details, giving you a toolbox designed to solve a wide range of
problems.
Chapter 4 describes the layered grammar of graphics which underlies
ggplot2. The theory is illustrated in Chapter 5 which demonstrates how to add
additional layers to your plot, exercising full control over the geoms and stats
used within them.
Understanding how scales work is crucial for fine-tuning the perceptual
properties of your plot. Customising scales gives fine control over the exact
appearance of the plot and helps to support the story that you are telling.
Chapter 6 will show you what scales are available, how to adjust their pa-
rameters, and how to control the appearance of axes and legends.
Coordinate systems and facetting control the position of elements of the
plot. These are described in Chapter 7. Facetting is a very powerful graphical
tool as it allows you to rapidly compare different subsets of your data. Dif-
ferent coordinate systems are less commonly needed, but are very important
for certain types of data.
To polish your plots for publication, you will need to learn about the tools
described in Chapter 8. There you will learn about how to control the theming
system of ggplot2 and how to save plots to disk.
The book concludes with four chapters that show how to use ggplot2
as part of a data analysis pipeline. ggplot2 works best when your data is
tidy, so Chapter 9 discusses what that means and how to make your messy
data tidy. Chapter 10 teaches you how to use the dplyr package to perform
the most common data manipulation operations. Chapter 11 shows how to
integrate visualisation and modelling in two useful ways. Duplicated code is
a big inhibitor of flexibility and reduces your ability to respond to changes in
1.5 Installation
To use ggplot2, you must first install it. Make sure you have a recent version
of R (at least version 3.2.0) from http://r-project.org and then run the
following code to download and install ggplot2:
install.packages("ggplot2")
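Before or after installing, it can be useful to confirm that the prerequisites described above are met. The sketch below is an assumption about how one might check this, using only base R functions; the variable names `r_ok` and `have_ggplot2` are made up for illustration.

```r
# Check that the running R is at least the minimum version stated in the text.
r_ok <- getRversion() >= "3.2.0"

# requireNamespace() returns FALSE instead of raising an error
# when the package is not installed.
have_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (!r_ok) {
  message("R is older than 3.2.0; consider upgrading before installing ggplot2.")
}

if (have_ggplot2) {
  message("ggplot2 version: ", as.character(packageVersion("ggplot2")))
} else {
  message("ggplot2 is not installed yet.")
}
```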
This book teaches you the elements of ggplot2’s grammar and how they fit
together, but it does not document every function in complete detail. You will
need additional documentation as your use of ggplot2 becomes more complex
and varied.
The best resource for specific details of ggplot2 functions and their argu-
ments will always be the built-in documentation. This is accessible online,
http://docs.ggplot2.org/, and from within R using the usual help syntax.
The advantage of the online documentation is that you can see all the exam-
ple plots and navigate between topics more easily.
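As a brief illustration of the help syntax mentioned above, the following sketch looks up one help page and lists the available geoms. It assumes ggplot2 is installed; `geom_point` is just an example topic, and the variable names are illustrative.

```r
library(ggplot2)

# Equivalent to typing ?geom_point at the console.
geom_point_help <- help("geom_point", package = "ggplot2")

# Only open the help page when running interactively.
if (interactive()) {
  print(geom_point_help)
}

# List the exported geoms so you know which help topics exist.
geom_names <- grep("^geom_", ls("package:ggplot2"), value = TRUE)
head(geom_names)
```

From an interactive session, `example(geom_point)` runs the examples from that help page, which is a quick way to see the plots without visiting the website.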
If you use ggplot2 regularly, it’s a good idea to sign up for the ggplot2
mailing list, http://groups.google.com/group/ggplot2. The list has relatively
low traffic and is very friendly to new users. Another useful resource is Stack
Overflow, http://stackoverflow.com. There is an active ggplot2 community on
Stack Overflow, and many common questions have already been asked and
answered. In either place, you’re much more likely to get help if you create a
minimal reproducible example. The reprex (https://github.com/jennybc/reprex)
package by Jenny Bryan provides a convenient way to do this, and also
includes advice on creating a good example. The more information you provide,
the easier it is for the community to help you.
The number of functions in ggplot2 can be overwhelming, but RStudio
provides some great cheatsheets to jog your memory at
http://www.rstudio.com/resources/cheatsheets/.
Finally, the complete source code for the book is available online at https:
//github.com/hadley/ggplot2-book. This contains the complete text for the
book, as well as all the code and data needed to recreate all the plots.
1.7 Colophon
References
Chambers, John, William Cleveland, Beat Kleiner, and Paul Tukey. 1983.
Graphical Methods for Data Analysis. Wadsworth.
Cleveland, William. 1993. Visualizing Data. Hobart Press.
Cook, Dianne, and Deborah F. Swayne. 2007. Interactive and Dynamic
Graphics for Data Analysis: With Examples Using R and GGobi. Springer.
Lemon, Jim, Ben Bolker, Sander Oom, Eduardo Klein, Barry Rowling-
son, Hadley Wickham, Anupam Tyagi, et al. 2008. Plotrix: Various Plotting
Functions.
Meyer, David, Achim Zeileis, and Kurt Hornik. 2006. “The Strucplot
Framework: Visualizing Multi-Way Contingency Tables with Vcd.” Journal
of Statistical Software 17 (3): 1–48. http://www.jstatsoft.org/v17/i03/.
2.1 Introduction
The goal of this chapter is to teach you how to produce useful graphics with
ggplot2 as quickly as possible. You’ll learn the basics of ggplot() along with
some useful “recipes” to make the most important plots. ggplot() allows you
to make complex plots with just a few lines of code because it’s based on a
rich underlying theory, the grammar of graphics. Here we’ll skip the theory
and focus on the practice, and in later chapters you’ll learn how to use the
full expressive power of the grammar.
In this chapter you’ll learn:
• About the mpg dataset included with ggplot2, Section 2.2.
• The three key components of every plot: data, aesthetics and geoms, Sec-
tion 2.3.
• How to add additional variables to a plot with aesthetics, Section 2.4.
• How to display additional categorical variables in a plot using small mul-
tiples created by facetting, Section 2.5.
• A variety of different geoms that you can use to create different types of
plots, Section 2.6.
• How to modify the axes, Section 2.7.
• Things you can do with a plot object other than display it, like save it to
disk, Section 2.8.
• qplot(), a handy shortcut for when you just want to quickly bang out a
simple plot without thinking about the grammar at all, Section 2.9.
In this chapter, we’ll mostly use one data set that’s bundled with ggplot2:
mpg. It includes information about the fuel economy of popular car models
library(ggplot2)
mpg
#> Source: local data frame [234 x 11]
#>
#> manufacturer model displ year cyl trans drv cty hwy
#> (chr) (chr) (dbl) (int) (int) (chr) (chr) (int) (int)
#> 1 audi a4 1.8 1999 4 auto(l5) f 18 29
#> 2 audi a4 1.8 1999 4 manual(m5) f 21 29
#> 3 audi a4 2.0 2008 4 manual(m6) f 20 31
#> 4 audi a4 2.0 2008 4 auto(av) f 21 30
#> 5 audi a4 2.8 1999 6 auto(l5) f 16 26
#> 6 audi a4 2.8 1999 6 manual(m5) f 18 26
#> .. ... ... ... ... ... ... ... ... ...
#> Variables not shown: fl (chr), class (chr)
This dataset suggests many interesting questions. How are engine size
and fuel economy related? Do certain manufacturers care more about fuel
economy than others? Has fuel economy improved in the last ten years? We
will try to answer some of these questions, and in the process learn how to
create some basic plots with ggplot2.
2.2.1 Exercises
1. List five functions that you could use to get more information about the
mpg dataset.
2. How can you find out what other datasets are included with ggplot2?
3. Apart from the US, most countries use fuel consumption (fuel consumed
over fixed distance) rather than fuel economy (distance travelled with fixed
amount of fuel). How could you convert cty and hwy into the European
standard of l/100km?
4. Which manufacturer has the most models in this dataset? Which model
has the most variations? Does your answer change if you remove the re-
dundant specification of drive train (e.g. “pathfinder 4wd”, “a4 quattro”)
from the model name?
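For the unit conversion raised in exercise 3, the arithmetic can be sketched as below. This is one possible approach rather than the book's solution. It assumes that cty and hwy are measured in miles per US gallon; the function name `mpg_to_l_per_100km` is hypothetical.

```r
# Convert fuel economy (miles per US gallon) to fuel consumption (l/100km).
# Assumption: US gallons, so 1 gallon = 3.785411784 litres; 1 mile = 1.609344 km.
mpg_to_l_per_100km <- function(miles_per_gallon) {
  litres_per_gallon <- 3.785411784
  km_per_mile <- 1.609344
  # Litres needed to travel 100 km: higher mpg means lower consumption.
  100 * litres_per_gallon / (miles_per_gallon * km_per_mile)
}

# The first row of mpg has cty = 18 and hwy = 29.
mpg_to_l_per_100km(c(18, 29))  # roughly 13.1 and 8.1
```

Because the relationship is a reciprocal, the conversion reduces to dividing a constant (about 235.2) by the miles-per-gallon figure.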
[Figure: scatterplot of the mpg data, with displ on the x-axis and hwy on the y-axis.]
1. Data: mpg.
2. Aesthetic mapping: engine size mapped to x position, fuel economy to y
position.
3. Layer: points.
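The three components listed above could be written as the following call. The original code is not shown in this excerpt, so treat this as a reconstruction under the assumption that engine size is the displ column and fuel economy is the hwy column of mpg.

```r
library(ggplot2)

# Data and aesthetic mappings go in ggplot(); the point layer is added with +.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

print(p)
```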
Pay attention to the structure of this function call: data and aesthetic
mappings are supplied in ggplot(), then layers are added on with +. This is