Animated Gapminder Code(2)

Uploaded by

kokicharity1
Copyright
© © All Rights Reserved

STEP-BY-STEP TUTORIAL

How to Create Animated Plots in R


Animating Time Series Data Using the gganimate R Package

A picture is worth a thousand words, and so are the insights provided by graphs and plots. Data visualization is an important part of any data science project because it enables effective data storytelling. If even static plots can convey important information and provide immense value, imagine what an animated plot can do to highlight particular aspects of the data.

Hans Rosling’s animated plots of the Gapminder data (he was a co-founder of the Gapminder Foundation), shown at his TED talks, captivated us all by bringing data to life.

In this article, you will learn how to create a stunning animated plot in R for a time series dataset using the ggplot2 and gganimate R packages.

Watch the accompanying YouTube video: https://youtu.be/z9J78rxhcrQ

1. The Animated Plot that we are Building Today


Today, we’re going to build an animated scatter plot of the Gapminder dataset. In particular, you will see that the plot is faceted (separated into distinct sub-plots) by continent instead of showing all continents in the same plot (which can be quite messy).

The animated plot will be built using the ggplot2 and gganimate R packages.

Shown below is the animated plot that we are building today (Source: gganimate).

Animated scatter plot of the Gapminder dataset created using ggplot2 and gganimate.

2. Coding Environment
Now, fire up your IDE of choice, whether it be RStudio, Kaggle Notebooks or a plain old R terminal. This is the coding environment where you will type the code shown in the rest of this tutorial.

My personal favorite for coding in R is the RStudio IDE, which is free and open source.

3. Installing Prerequisite R Packages

In this tutorial, we’re using four R packages: gapminder, ggplot2, gganimate and gifski.

To install these R packages, type the following into an R terminal (either a standalone R terminal or the R console within RStudio) or into a code cell of a Kaggle Notebook:
install.packages(c('gapminder','ggplot2','gganimate','gifski'))

Let’s now take a look at why we’re using the above R packages.

• gapminder contains an excerpt of the Gapminder time series dataset that we are using in this tutorial.

• ggplot2 allows us to create awesome data visualizations, namely the scatter plot.

• gganimate allows us to add animation to the plots.

• gifski allows us to render the animation as a GIF file (GIF is a popular image format for animated images).

4. Exploring the Gapminder dataset

Prior to our data visualization, let’s have a look at the Gapminder dataset.

Here, we will start by loading the gapminder package and returning the contents of the gapminder variable.
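As a minimal sketch of this step (assuming the gapminder package has already been installed as described in Section 3), loading the package and printing the dataset looks like this:

```r
# Load the package that ships the Gapminder excerpt
library(gapminder)

# Typing the object name prints the tibble
# (the first rows, together with the column names and types)
gapminder
```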

Screenshot of the gapminder dataset.


Here, we can see that the data is a tibble (tidyverse’s
implementation of a data frame) consisting of 1,704 rows and
6 columns.

These 6 columns consist of:

• country: Names of the countries

• continent: Names of the continents

• year: Year of the data entry

• lifeExp: Life expectancy for the given year

• pop: Population count for the given year

• gdpPercap: Per capita GDP for the given year
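If you would like to check these columns yourself, a few base R calls are enough. This is only a quick exploratory sketch; it is not part of the plotting code.

```r
library(gapminder)

dim(gapminder)               # number of rows and columns
names(gapminder)             # the six column names listed above
levels(gapminder$continent)  # the continents used later for faceting
range(gapminder$year)        # first and last year covered by the data
```

The continent column has five levels, which is why the faceted plot in the following sections contains five sub-plots.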

5. Creating the Static Scatter Plot

In this section, we will create a static version of the scatter plot that can be used as the baseline for comparison with the animated version.

5.1. Code

The code for creating the scatter plot is shown below:
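The following is a minimal sketch that matches the line-by-line explanation in Section 5.2. The title and axis labels passed to labs() are assumptions chosen for illustration, and the two library() calls at the top are not counted in the line numbering used in Section 5.2 (Line 1 is the ggplot() call).

```r
# Load the data and plotting packages (not counted in the line numbering)
library(gapminder)
library(ggplot2)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = 'Gapminder', x = 'GDP per capita', y = 'Life expectancy')  # label text is assumed
```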

A screenshot of how I’m implementing the code in RStudio:

Screenshot of implementing the code in the RStudio IDE.

5.2. Line-by-Line Explanation

• Line 1: The ggplot() function creates the plot using the ggplot2 R package. Its first input argument is the input data, which is stored in the gapminder variable. The aes() function defines the aesthetic mapping of the input variables: gdpPercap is displayed on the X axis and lifeExp on the Y axis. The size of each data point depends on the pop variable (the larger the pop value, the larger the data point). Finally, the color of each data point (set via the colour parameter) is determined by the country to which it belongs.

• Line 2: geom_point() draws the data points (i.e. the circles that we see on the plot). The alpha parameter of 0.7 makes each data point translucent (the lower the value, the more translucent the points become). As implied, show.legend=FALSE hides the legend.

• Line 3: The scale_colour_manual() function applies the color scheme stored in the country_colors variable (provided by the gapminder package), which is used for coloring the data points according to their countries.

• Line 4: The scale_size() function sets the size range of the data points (recall that on Line 1 we defined size=pop in the aes() function) to run from 2 to 12 (i.e. 2 for the smallest data points and 12 for the largest).

• Line 5: The scale_x_log10() function applies a log10 transformation to the X axis.

• Line 6: The facet_wrap() function splits the plot into multiple sub-plots (a process known as faceting) by using the continent variable.

• Line 7: The labs() function defines the plot title, X axis title and Y axis title.

5.3. Saving the Plot to File

From the above screenshot, we can see that the plot is shown in the Plots panel (the lower right panel in RStudio's default layout) but is not saved to a file. To save the plot to a file, we will use the ggsave() function as follows:
ggsave('plot_gdpPercap_lifeExp_static.png', width=8, height=8)
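By default, ggsave() saves the most recently displayed plot, and width and height are interpreted in inches. As a hedged sketch, the plot to save and the output resolution can also be specified explicitly; the object name p_static, the simplified plot and the file name below are placeholders chosen for illustration.

```r
library(gapminder)
library(ggplot2)

# A simplified stand-in for the static plot from Section 5.1
p_static <- ggplot(gapminder, aes(gdpPercap, lifeExp)) +
  geom_point(alpha = 0.7) +
  scale_x_log10()

# Name the plot explicitly instead of relying on the last displayed plot
ggsave('plot_static_example.png', plot = p_static,
       width = 8, height = 8, units = 'in', dpi = 300)
```

Passing the plot explicitly is useful in scripts, where it may not be obvious which plot was displayed last.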

This produces the resulting plot:

6. Creating the Animated Scatter Plot


Here comes the fun part: let’s now create the animated scatter plot using the gganimate R package.

6.1. Code
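The following is a minimal sketch that matches the line-by-line explanation in Section 6.2. The label text in labs() and the output file name are assumptions chosen for illustration, and the library() calls are not counted in the line numbering used in Section 6.2 (Line 1 is the line that starts with p1).

```r
# Load the data, plotting and animation packages (not counted in the line numbering)
library(gapminder)
library(ggplot2)
library(gganimate)

p1 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  # Animating the plot
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'Life expectancy') +
  transition_time(year) +
  ease_aes('linear')

animate(p1)
anim_save('plot_gdpPercap_lifeExp_animated.gif')  # file name is assumed
```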

The above code generates the following animated plot:

6.2. Line-by-Line Explanation

• Lines 1–6: The explanation is the same as for the static plot, so please refer to Section 5.2.

• Line 7: A comment hinting that the lines that follow pertain to the animation component of the plot.

• Line 8: The labs() function defines the plot title, X axis title and Y axis title. {frame_time} dynamically displays the changing year as the data points move across the plot.

• Line 9: The transition_time() function takes the year variable as input, which makes the animated plot transition frame by frame as a function of the year.

• Line 10: The ease_aes() function takes 'linear' as its input argument, which makes the transition between frames progress in a linear fashion.

• As we can see, Lines 1–10 are assigned to the p1 variable.

• Line 12: The animate() function takes the plot defined in the p1 variable as its input argument and renders the animation.

• Line 13: The anim_save() function saves the rendered animated plot to a GIF file.

7. Customizing the Animated Plot

So how do you customize this animated plot?

In this section, we will explore which parameters you can adjust to further customize your animated plot.

7.1. Changing the input variables

Instead of making a scatter plot of gdpPercap and lifeExp (on the X and Y axes), we can also consider other columns in the gapminder dataset.

Let’s say that we would like to use these 2 variables: pop and lifeExp. We can define this within the ggplot() function.
ggplot(gapminder, aes(pop, lifeExp, size = pop, colour = country))

Notice that in the above code we’re using pop and lifeExp as
the first and second input arguments (as compared to
using gdpPercap and lifeExp in the first plot).
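Put together, only the aes() mapping and the X axis label need to change. The sketch below assumes the remaining layers are kept as in Section 6.1; the object name p2 and the label text are chosen for illustration.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

p2 <- ggplot(gapminder, aes(pop, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +   # population also spans several orders of magnitude
  facet_wrap(~continent) +
  labs(title = 'Year: {frame_time}', x = 'Population', y = 'Life expectancy') +
  transition_time(year) +
  ease_aes('linear')
```

The plot can then be rendered with animate(p2) and saved with anim_save(), as before.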

The resulting plot is as follows:


7.2. Changing the sub-plot layout

By default, we can see that the animated plot is arranged in a layout of 2 rows × 3 columns.

7.2.1. Horizontal layout

What if we want to change the layout to, say, 1 row × 5 columns? How can we do that?

Go to the line of code in Section 6.1 containing facet_wrap(~continent) (Hint: Line 6) and add ncol=5 as an additional input argument so that it becomes facet_wrap(~continent, ncol=5).

Notice that the layout is updated as desired to 1 row and 5 columns. But a new problem emerges: each sub-plot looks a bit too narrow.

A solution to this is to adjust the figure’s width and height. This can be done by adding 2 additional input arguments to the anim_save() function as follows:
anim_save('plot_pop_lifeExp_wide.gif', width=1600, height=400)
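One caveat: according to the gganimate documentation, extra arguments given to anim_save() are forwarded to animate() only when an unrendered plot object is passed as the animation, so the size may be ignored if the animation has already been rendered. A more explicit sketch is to set the size (in pixels) in animate() and then save the result; the object names p_wide and anim_wide are chosen for illustration.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

# Same plot as before, with the five continents placed on a single row
p_wide <- ggplot(gapminder, aes(pop, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent, ncol = 5) +
  labs(title = 'Year: {frame_time}', x = 'Population', y = 'Life expectancy') +
  transition_time(year) +
  ease_aes('linear')

# Render at 1600 x 400 pixels, then write the rendered animation to disk
anim_wide <- animate(p_wide, width = 1600, height = 400)
anim_save('plot_pop_lifeExp_wide.gif', animation = anim_wide)
```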

This should now give the following plot:

7.2.2. Vertical layout

In a similar fashion, let’s now create a new plot with a vertical layout.

We can do this by simply changing the value assigned to the ncol input argument of the facet_wrap() function.

Earlier, the horizontal layout had:

facet_wrap(~continent, ncol=5).

Now, for the vertical layout we have:

facet_wrap(~continent, ncol=1).
7.3. Adjusting the font size

You may notice that the font sizes of the X/Y axis titles and tick labels are small, and you may want to adjust them. Let me show you how.

Here’s example code for adjusting the font properties mentioned above.
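The sketch below shows one way the theme() call could look. Only the plot.title settings (size 20, bold, black) come from the description in this section; every other size, face and margin value is an assumption chosen for illustration, as is the object name p_big.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

p_big <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  theme(
    plot.title   = element_text(size = 20, face = 'bold', colour = 'black'),
    axis.title.x = element_text(size = 16, face = 'bold', colour = 'black'),  # assumed values
    axis.title.y = element_text(size = 16, face = 'bold', colour = 'black'),  # assumed values
    axis.text.x  = element_text(size = 12, colour = 'black'),                 # assumed values
    axis.text.y  = element_text(size = 12, colour = 'black'),                 # assumed values
    strip.text.x = element_text(size = 14, face = 'bold', colour = 'black'),  # assumed values
    plot.margin  = margin(0.5, 0.5, 0.5, 0.5, 'cm')                           # assumed values
  ) +
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'Life expectancy') +
  transition_time(year) +
  ease_aes('linear')
```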

You will notice that we have added Lines 7–13, which make use of the theme() function to adjust the font sizes, faces and colors.

Particularly, plot.title allows us to adjust the font for the plot’s title (which in our case is the Year label shown at the top left-hand side of the plot). Here, we used a font size of 20, a font face of bold and a color of black. Similar adjustments also apply to the X and Y axis titles (axis.title.x and axis.title.y, respectively), the X and Y tick labels (axis.text.x and axis.text.y, respectively) and the facet sub-plot labels (strip.text.x), while plot.margin adjusts the white space around the plot image.

The updated plot is shown in the top image; for comparison, the original plot that we made earlier (using the default font size) is shown in the bottom image. The bigger font size does indeed improve readability.

Animated plot using bigger font sizes.

Animated plot using the default font size.

Conclusion

In this tutorial, you have successfully created an animated scatter plot for a time series dataset (Gapminder) and learned how to make adjustments to the plot.

What next? You can also experiment with making animated plots for other time series data, such as cryptocurrency prices or air quality indices. Let me know in the comments which datasets you are creating animated plots for.

Aside from animated plots for time series data, you can also experiment with the gganimate R package to add animation to other data visualizations, such as box plots.

Subscribe to my Mailing List for my best updates (and occasionally freebies) in Data Science!

About Me: Chanin Nantasenamat

I work full-time as an Associate Professor of Bioinformatics and Head of Data Mining and Biomedical Informatics at a research university in Thailand. In my after-work hours, I’m a YouTuber (AKA the Data Professor) making online videos about data science. In all tutorial videos that I make, I also share Jupyter notebooks on GitHub (Data Professor GitHub page).
