Efficient R programming
Colin Gillespie and Robin Lovelace
2016-06-03
Contents
Preface
1 Introduction
1.1 Who this book is for
1.2 What is efficiency?
1.3 Why efficiency?
1.4 What is efficient R programming?
1.5 Touch typing
1.6 Benchmarking
1.7 Profiling
2 Efficient set-up
2.1 Top 5 tips for an efficient R set-up
2.2 Operating system
2.3 R version
2.4 R startup
2.5 RStudio
2.6 BLAS and alternative R interpreters
3 Efficient programming
3.1 General advice
3.2 Communicating with the user
3.3 Factors
3.4 S3 objects
3.5 Caching variables
3.6 The byte compiler
4 Efficient workflow
4.7 Publication
6 Efficient visualisation
7 Efficient performance
7.4 Rcpp
This is the online home of the O’Reilly book: Efficient R programming. Pull requests and general comments
are welcome.
To build the book:
devtools::install_github("csgillespie/efficientR")
Package Dependencies
Name                  Title
assertive.reflection  Assertions for Checking the State of R
benchmarkme           Crowd Sourced System Benchmarks
bookdown              Authoring Books with R Markdown
cranlogs              Download Logs from the 'RStudio' 'CRAN' Mirror
data.table            Extension of Data.frame
devtools              Tools to Make Developing R Packages Easier
DiagrammeR            Create Graph Diagrams and Flowcharts Using R
dplyr                 A Grammar of Data Manipulation
drat                  Drat R Archive Template
efficient             Becoming an Efficient R Programmer
formatR               Format R Code Automatically
fortunes              R Fortunes
geosphere             Spherical Trigonometry
ggplot2               An Implementation of the Grammar of Graphics
ggplot2movies         Movies Data
knitr                 A General-Purpose Package for Dynamic Report Generation in R
lubridate             Make Dealing with Dates a Little Easier
microbenchmark        Accurate Timing Functions
profvis               Interactive Visualizations for Profiling R Code
pryr                  Tools for Computing on the Language
readr                 Read Tabular Data
tidyr                 Easily Tidy Data with `spread()` and `gather()` Functions
Preface
Efficient R Programming is about increasing the amount of work you can do with R in a given amount of
time. It’s about both computational and programmer efficiency. There are many excellent R resources about
topic areas such as visualisation (e.g. Chang 2012), data science (e.g. Grolemund and Wickham 2016) and
package development (e.g. Wickham 2015). There are even more resources on how to use R in particular
domains, including Bayesian Statistics, Machine Learning and Geographic Information Systems. However,
there are very few unified resources on how to simply make R work effectively. Hints, tips and decades of
community knowledge on the subject are scattered across hundreds of internet pages, email threads and
discussion forums, making it challenging for R users to understand how to write efficient code.
In our teaching we have found that this issue applies to beginners and experienced users alike. Whether it’s
a question of understanding how to use R’s vector objects to avoid for loops, knowing how to set-up your
.Rprofile and .Renviron files or the ability to harness R’s excellent C++ interface to do the ‘heavy lifting’,
the concept of efficiency is key. The book aims to distill tips, warnings and ‘tricks of the trade’ down into a
single, cohesive whole that will provide a useful resource to R programmers of all stripes for years to come.
The content of the book reflects the questions that our students, from a range of disciplines, skill levels and
industries, have asked over the years to make their R work faster. How can I set up my system optimally for R
programming work? How can one apply general principles from Computer Science (such as do not repeat
yourself, DRY) to the specifics of an R script? How can R code be incorporated into an efficient workflow,
including project inception, collaboration and write-up? And how can one learn quickly how to use new
packages and functions?
The book answers each of these questions, and more, in 10 self-contained chapters. Each chapter starts simple
and gets progressively more advanced, so there is something for everyone in each. While the more advanced
topics such as parallel programming and C++ may not be immediately relevant to R beginners, the book
helps to navigate R’s famously steep learning curve with a commitment to starting slow and building on
strong foundations. Thus even experienced R users are likely to find previously hidden gems of advice in the
early parts of the chapters. “Why did no one tell me that before?” is a common exclamation we have heard
while teaching this material.
Efficient programming should not be seen as an optional extra and the importance of efficiency grows with
the size of projects and datasets. In fact, this book was devised while we were teaching a course on ‘R for
Big Data’: it quickly became apparent that if you want to work with large datasets, your code must work
efficiently. Even if you work with small datasets, efficient code that is both fast to write and run is a vital
component of successful R projects. We found that the concept of efficient programming is important to
all branches of the R community. Whether you are a sporadic user of R (e.g. for its unbeatable range of
statistical packages), looking to develop a package, or working on a large collaborative project in which
efficiency is mission-critical, code efficiency will have a major impact on your productivity.
Ultimately efficiency is about getting more output for less work input. To take the analogy of a car, would
you rather drive 1000 km on a single tank (or a single charge of your batteries) or refuel a heavy, clunky and
ugly car every 50 km? In the same way, efficient R code is better than inefficient R code in almost every way:
it is easier to read, write, run, share and maintain. This book cannot provide all the answers about how to
produce such code but it certainly can provide ideas, example code and tips to make a start in the right
direction of travel.
Chapter 1
Introduction
• For programmers with little R knowledge this book will help you navigate the quirks of R to
make it work efficiently: it is easy to write slow R code if you treat it as if it were another language.
• For R users who have little experience of programming this book will show you many concepts
and ‘tricks of the trade’, some of which are borrowed from Computer Science, that will make your work
more time effective.
• For R beginners, you should probably read this book in parallel with other R resources such as the
numerous vignettes, tutorials and online articles that the R community has produced. At a bare
minimum you should have R installed on your computer (see section 2.3 for information on how best to
install R on new computers).
$$\eta = \frac{W}{Q}$$
In the context of computer programming, efficiency can be defined narrowly or broadly. The narrow sense,
algorithmic efficiency, refers to the way a particular task is undertaken. This concept dates back to the very
origins of computing, as illustrated by the following quote by Lovelace (1842) in her notes on the work of
Charles Babbage, one of the pioneers of early computing:
In almost every computation a great variety of arrangements for the succession of the processes is
possible, and various considerations must influence the selections amongst them for the purposes
of a calculating engine. One essential object is to choose that arrangement which shall tend to
reduce to a minimum the time necessary for completing the calculation.
The issue of having a ‘great variety’ of ways to solve a problem has not gone away with the invention of
advanced computer languages: R is notorious for allowing users to solve problems in many ways, and this
notoriety has only grown with the proliferation of community contributed packages. In this book we want to
focus on the best way of solving problems, from an efficiency perspective.
The second, broader definition of efficient computing is productivity. This is the amount of useful work a
person (not a computer) can do per unit time. It may be possible to rewrite your codebase in C to make it
100 times faster. But if this takes 100 human hours it may not be worth it. Computers can chug away day
and night. People cannot. Human productivity is the subject of Chapter 4.
By the end of this book you should know how to write R code that is efficient from both algorithmic and
productivity perspectives. Efficient code is also concise, elegant and easy to maintain, vital when working on
large projects.
Computers are always getting more powerful. Does this not reduce the need for efficient computing? The
answer is simple: in an age of Big Data and stagnating computer clockspeeds (see Chapter 8), computational
bottlenecks are more likely than ever before to hamper your work. An efficient programmer can “solve more
complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research”
(Visser et al. 2015).
A concrete example illustrates the importance of efficiency in mission critical situations. Robin was working
on a tight contract for the UK’s Department for Transport, to build the Propensity to Cycle Tool, an online
application which had to be ready for national deployment in less than 4 months. To help his workflow he
developed a function, line2route() in the stplanr package, to batch process calls to the cyclestreets.net API. But
after a few thousand routes the code slowed to a standstill. Yet hundreds of thousands were needed. This
endangered the contract. After eliminating internet connection issues, it was found that the slowdown was
due to a bug in line2route(): it suffered from the ‘vector growing problem’, discussed in Section 3.1.1.
The solution was simple. A single commit made line2route() more than ten times faster and substantially
shorter. This potentially saved the project from failure. The moral of this story is that efficient programming
is not merely a desirable skill: it can be essential.
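To make the ‘vector growing problem’ concrete, the sketch below contrasts three ways of building the same vector. It is an illustration only and not the code from line2route(); the function names grow(), preallocate() and vectorised() are hypothetical.

```r
# Growing a vector inside a loop forces R to copy it on every iteration.
grow <- function(n) {
  x <- NULL
  for (i in seq_len(n)) {
    x <- c(x, i^2)  # allocates a new, longer vector each time
  }
  x
}

# Pre-allocating the result means each element is written in place.
preallocate <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    x[i] <- i^2
  }
  x
}

# A vectorised call avoids the explicit loop altogether.
vectorised <- function(n) seq_len(n)^2

n <- 1e4
res_grow <- grow(n)
res_pre <- preallocate(n)
res_vec <- vectorised(n)
```

All three functions return the same values; they differ only in how much copying R has to do, which becomes noticeable as n grows.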
Efficient R programming is the implementation of efficient programming practices in R. All languages are
different, so efficient R code does not look like efficient code in another language. Many packages have been
optimised for performance so, for some operations, achieving maximum computational efficiency may simply
be a case of selecting the appropriate package and using it correctly. There are many ways to get the same
result in R, and some are very slow. Therefore not writing slow code should be prioritized over writing fast
code.
Returning to the analogy of the two cars sketched in the preface, efficient R programming for some use cases
can simply mean trading in your heavy and gas guzzling hummer for a normal hatchback. The search for
optimal performance often has diminishing returns so it is important to find bottlenecks in your code to
prioritise work for maximum increases in computational efficiency.
Figure 1.1: The starting position for touch typing, with the fingers over the ‘home keys’. Source: Wikipedia
under the Creative Commons license.
1.6 Benchmarking
Benchmarking is the process of testing the performance of specific operations repeatedly. Changing the code
from one benchmark to the next and recording the results allows you to experiment to
see which bits of code are fastest. Benchmarking is important in the efficient programmer’s toolkit: you may
think that your code is faster than mine but benchmarking allows you to prove it.
* `system.time()`
* `microbenchmark` and `rbenchmark`
The microbenchmark package runs a test many times (by default 100), enabling the user to detect
microsecond differences in code performance.
library("microbenchmark")
df = data.frame(v = 1:4, name = c(letters[1:4]))
microbenchmark(
df[3, 2],
df$name[3],
df[3, 'v']
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> df[3, 2] 22.8 24.2 27.6 24.8 25.5 201.3 100
#> df$name[3] 16.2 17.6 19.9 18.9 19.7 62.4 100
#> df[3, "v"] 14.8 15.9 17.0 16.5 17.0 30.0 100
The results show that seemingly arbitrary changes to how R code is written can affect the efficiency of
computation. Without benchmarking, these differences would be very hard to detect.
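For coarser timings, base R's system.time() (listed above) needs no extra packages. The following is a minimal sketch that times a loop against its vectorised equivalent; the workload and the helper loop_sum() are made up for illustration, and the timings will differ from machine to machine.

```r
# Hypothetical workload: the sum of square roots, computed in two ways.
x <- as.numeric(1:1e5)

loop_sum <- function(v) {
  total <- 0
  for (el in v) total <- total + sqrt(el)
  total
}

# system.time() evaluates the expression once and returns a 'proc_time' object
t_loop <- system.time(s1 <- loop_sum(x))
t_vec <- system.time(s2 <- sum(sqrt(x)))

# The "elapsed" entry is the wall-clock time in seconds
t_loop[["elapsed"]]
t_vec[["elapsed"]]
```

Because system.time() runs the code only once, it suits operations that take a noticeable fraction of a second or more; for very fast operations, repeated timings such as those from microbenchmark are more informative.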
1.7 Profiling
Benchmarking generally tests execution time of one function against another. Profiling, on the other hand, is
about testing large chunks of code.
It is difficult to over-emphasise the importance of profiling for efficient R programming. Without a profile of
what took longest, you will have only a vague idea of why your code is taking so long to run. The example
below (which generates Figure 1.3, an image of ice-sheet retreat from 1985 to 2015) shows how profiling can
be used to identify bottlenecks in your R scripts:
library("profvis")
profvis(expr = {
Figure 1.2: Profiling results of loading and plotting NASA data on icesheet retreat.
Figure 1.3: Visualisation of North Pole icesheet decline, generated using the code profiled using the profvis
package.
Chapter 2
Efficient set-up
An efficient computer set-up is analogous to a well-tuned vehicle: its components work in harmony, it is
well-serviced, and it is fast. This chapter describes the software decisions that will enable a productive
workflow. Starting with the basics and moving to progressively more advanced topics, we explore how the
operating system, R version, startup files and IDE can make your R work faster (though an IDE could be seen
as a basic need for efficient programming). Ensuring correct configuration of these elements will have knock-on
benefits in many aspects of your R workflow. That’s why we cover them at this early stage (hardware, the
other fundamental consideration, is covered in the next chapter). By the end of this chapter you should
understand how to set-up your computer and R installation (skip to section 2.3 if R is not already installed
on your computer) for optimal computational and programmer efficiency. It covers the following topics:
• R and the operating systems: system monitoring on Linux, Mac and Windows
• R version: how to keep your base R installation and packages up-to-date
• R start-up: how and why to adjust your .Rprofile and .Renviron files
• RStudio: an integrated development environment (IDE) to boost your programming productivity
• BLAS and alternative R interpreters: looks at ways to make R faster
For lazy readers, and to provide a taster of what’s to come, we begin with our ‘top 5’ tips for an efficient R
set-up. It is important to understand that efficient programming is not simply the result of following a recipe
of tips: understanding is vital for knowing when to use a memorised solution to a problem and when to go
back to first principles. Thinking about and understanding R in depth, e.g. by reading this chapter carefully,
will make efficiency second nature in your R workflow.
way on each of these platforms. This is partly facilitated by CRAN tests which ensure that R packages work
on all OSs mentioned above. There are some operating system-specific quirks that may influence the choice
of OS and how it is set-up for R programming in the long-term. Basic system information can be queried
from within R using Sys.info(), as illustrated below for a selection of its output:
Sys.info()
## sysname
## "Linux"
## release
## "4.2.0-35-generic"
## machine
## "x86_64"
## user
## "robin"
Translated into English, this means that R is running on a 64 bit (x86_64) Linux distribution (kernel version
4.2.0-35-generic) and that the current user is robin. Four other pieces of information (not shown) are
also produced by the command, the meaning of which is well documented in ?Sys.info.
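The selection shown above can be reproduced by subsetting the named character vector that Sys.info() returns. A small sketch follows; the pointer-size check is an extra illustration, not part of the original listing, and reports whether the running R build itself is 64 bit.

```r
# Sys.info() returns a named character vector; pick the fields of interest
info <- Sys.info()
info[c("sysname", "release", "machine", "user")]

# Pointers are 8 bytes wide in a 64 bit build of R
is_64bit <- .Machine$sizeof.pointer == 8
is_64bit
```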
Pro tip. The assertive.reflection package can be used to report additional information about
your computer’s operating system and R set-up with functions for asserting operating system and
other system characteristics. The assert_* functions work by testing the truth of the statement
and erroring if the statement is untrue. On a Linux system assert_is_linux() will run silently,
whereas assert_is_solaris() will cause an error. The package can also test for the IDE you are using
(e.g. assert_is_rstudio()), the capabilities of R (assert_r_has_libcurl_capability() etc.),
and what OS tools are available (e.g. assert_r_can_compile_code()). These functions can be
useful for running code that is designed only to run on one type of set-up.
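If installing an extra package is not an option, similar checks can be approximated in base R. The sketch below is not the assertive.reflection API: require_os() is a hypothetical helper, and testing the RSTUDIO environment variable is a commonly used convention that is assumed here rather than guaranteed.

```r
# Name of the operating system, e.g. "Linux", "Darwin" or "Windows"
os <- Sys.info()[["sysname"]]

# Assumption: RStudio sets the RSTUDIO environment variable in its sessions
in_rstudio <- nzchar(Sys.getenv("RSTUDIO"))

# Was this build of R compiled with libcurl support?
has_libcurl <- capabilities("libcurl")

# Hypothetical helper: stop with an informative error on the wrong platform
require_os <- function(expected) {
  actual <- Sys.info()[["sysname"]]
  if (!identical(actual, expected)) {
    stop("This code expects ", expected, " but is running on ", actual)
  }
  invisible(TRUE)
}

require_os(os)  # runs silently because it names the current platform
```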
Minor differences aside,1 R’s computational efficiency is broadly the same across different operating systems.
This is important as it means the techniques will, in general, work equally well on different OSs. Beyond the
32 vs 64 bit issue (covered in the next chapter) and process forking (covered in Chapter 6) the main issue
for many will be user friendliness and compatibility with other programs used alongside R for work. Changing
operating system can be a time consuming process so our advice is usually to stick to whatever OS you are
most comfortable with.
Some packages (e.g. those that must be compiled and that depend on external libraries) are best installed at
the operating system level (i.e. not using install.packages) on Linux systems. On Debian-based operating
systems such as Ubuntu, these are named with the prefix r-cran- (see Section 2.4).
Regardless of your operating system, it is good practice to track how system resources (primarily CPU
and RAM use) respond when running time-consuming or RAM-intensive tasks. If you only process small
datasets, system monitoring may not be necessary but when handling datasets at the limits of your computer’s
resources, it can be a useful tool for identifying bottlenecks, such as when you are running low on RAM.
1 Benchmarking conducted for a presentation “R on Different Platforms” at useR 2006 found that R was marginally faster
on Windows than Linux set-ups. Similar results were reported in an academic paper, with R completing statistical analyses
faster on Linux than on Mac OS (Sekhon 2006). In 2015 Revolution R supported these results with slightly faster run times for
certain benchmarks on Ubuntu than Mac systems. The data from the benchmarkme package also suggests that running code
under the Linux OS is faster.
Alongside R profiling functions such as profvis (see Section XXX), system monitoring can help identify
performance bottlenecks and opportunities for making tasks run faster.
A common use case for system monitoring of R processes is to identify how much RAM is being used and
whether more is needed (covered in Chapter 3). System monitors also report the percentage of CPU resource
allocated over time. On modern multi-threaded CPUs, many tasks will use only a fraction of the available
CPU resource because R is by default a single-threaded program (see Chapter 6 on parallel programming).
Monitoring CPU load in this context can be useful for identifying whether R is running in parallel (see Figure
2.1).
Figure 2.1: Output from a system monitor (gnome-system-monitor running on Ubuntu) showing the
resources consumed by running the code presented in the second of the Exercises at the end of this section.
The first increases RAM use, the second is single-threaded and the third is multi-threaded.
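Some of the same information is available from inside R, which can complement an external system monitor. The sketch below uses only base R and the parallel package that ships with R; the object x is an arbitrary example.

```r
# Memory taken up by a single object
x <- numeric(1e6)                      # one million doubles
size_bytes <- as.numeric(object.size(x))
format(object.size(x), units = "Mb")   # human-readable size

# gc() triggers garbage collection and reports R's overall memory use
mem <- gc()
mem

# Number of CPU cores detected (may be NA on some platforms)
n_cores <- parallel::detectCores()
n_cores
```

These figures describe only the current R process, so a system monitor is still needed to see how R competes with other programs for RAM and CPU.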
System monitoring is a complex topic that spills over into system administration and server management.
Fortunately there are many tools designed to ease monitoring on all major operating systems.
• On Linux, the shell command top displays key resource use figures for most distributions. htop and
Gnome’s System Monitor (gnome-system-monitor, see Figure 2.1) are more refined alternatives
which use command-line and graphical user interfaces respectively. A number of options such as nethogs
monitor internet usage.
• On Windows the Task Manager provides key information on RAM and CPU use by process. This
can be started in modern Windows versions by typing Ctrl-Alt-Del or by clicking the task bar and
‘Start Task Manager’.
• On Mac the Activity Monitor provides similar functionality. This can be initiated from the Utilities
folder in Launchpad.
Exercises
3. What do you notice regarding CPU usage, RAM and system time, during and after each of the three
operations?
4. Bonus question: how would the results change depending on operating system?
2.3 R version
It is important to be aware that R is an evolving software project, whose behaviour changes over time. This
applies to an even greater extent to packages, which occasionally change substantially from one release to
the next. For most use cases we recommend always using the most up-to-date version of R and packages,
so you have the latest code. In some circumstances (e.g. on a production server) you may alternatively want
to use specific versions which have been tested, to ensure stability. Keeping packages up-to-date is desirable
because new code tends to be more efficient, intuitive, robust and feature rich. This section explains how.
Previous R versions can be installed from CRAN’s archive of previous R releases. The binary versions
for all OSs can be found at cran.r-project.org/bin/. To download binary versions for Ubuntu ‘Wily’, for
example, see cran.r-project.org/bin/linux/ubuntu/wily/. To ‘pin’ specific versions of R packages you can
use the packrat package. For more on pinning R versions and R packages see articles on RStudio’s website
Using-Different-Versions-of-R and rstudio.github.io/packrat/.
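When code has only been tested against particular versions, a script can check what it is running on before doing any work. A minimal sketch using base R follows; the value of tested_version is a hypothetical example.

```r
# Version of R that is currently running
R.version.string
running <- getRversion()

# Hypothetical minimum version against which the script was tested
tested_version <- "3.2.0"
is_older <- running < tested_version
if (is_older) {
  warning("This script was tested on R >= ", tested_version,
          " but is running on R ", as.character(running))
}

# Version of an installed package (stats ships with R)
stats_version <- utils::packageVersion("stats")
stats_version
```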
2.3.1 Installing R
apt-add-repository https://cran.rstudio.com/bin/linux/ubuntu
In the above code cran.rstudio.com is the ‘mirror’ from which r-base and other r- packages can be
installed using the apt system. The following two commands, for example, would install the base R package
(a ‘barebones’ install) and the package rcurl, which has an external dependency:
2.3.2 Updating R
R is a mature and stable language so well-written code in base R should work on most versions. However, it
is important to keep your R version relatively up-to-date, because:
• Bug fixes are introduced in each version, making errors less likely;
2 See jason-french.com/blog/2013/03/11/installing-r-in-linux/ for more information on installing R on a variety of Linux
distributions.
• Performance enhancements are made from one version to the next, meaning your code may run faster
in later versions;
• Many R packages only work on recent versions of R.
Release notes with details on each of these issues are hosted at cran.r-project.org/src/base/NEWS. R release
versions have 3 components corresponding to major.minor.patch changes. Generally 2 or 3 patches are
released before the next minor increment - each ‘patch’ is released roughly every 3 months. R 3.2, for example,
has consisted of 3 versions: 3.2.0, 3.2.1 and 3.2.2.
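The major.minor.patch structure can be inspected from within R. The sketch below splits the version string 3.2.2 mentioned above into its components and shows why versions should be compared as version objects rather than as plain text.

```r
# Parse a version string into a version object
v <- package_version("3.2.2")

# Extract the three components as integers
parts <- unlist(unclass(v))
names(parts) <- c("major", "minor", "patch")
parts

# Version objects compare component by component
newer_than_321 <- v > "3.2.1"

# As plain strings "3.10.0" sorts before "3.9.0"; as versions it is later
string_compare <- "3.10.0" > "3.9.0"
version_compare <- package_version("3.10.0") > package_version("3.9.0")
```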
• On Ubuntu-based systems, new versions of R should be automatically detected through the software
management system, and can be installed with apt-get upgrade.
• On Mac, the latest version should be installed by the user from the .pkg files mentioned above.
For information about changes to expect in the next version, you can subscribe to R’s NEWS RSS feed:
developer.r-project.org/blosxom.cgi/R-devel/NEWS/index.rss. It’s a good way of keeping up-to-date.
Large projects may need several packages to be installed. In this case, the required packages can be installed
at once. Using the example of packages for handling spatial data, this can be done quickly and concisely with
the following code:
In the above code all the required packages are installed with two not three lines, reducing typing. Note that
we can now re-use the pkgs object to load them all:
In the above code library(pkgs[i]) is executed for every package stored in the text string vector. We use
library here instead of require because the former produces an error if the package is not available.
Loading all packages at the beginning of a script is good practice as it ensures all dependencies have been
installed before time is spent executing code. Storing package names in a character vector object such as
pkgs is also useful because it allows us to refer back to them again and again.
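The pattern described above can be sketched as follows. To keep the example runnable without downloading anything, the package names are placeholders for packages that ship with R and the installation step is commented out; substitute the packages your project needs.

```r
# Placeholder package names; replace with the packages your project uses
pkgs <- c("stats", "utils", "tools")

# install.packages(pkgs)  # uncomment to install them all from CRAN in one call

# Load each package; library() stops with an error if one is missing
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```

The character.only = TRUE argument tells library() to treat pkg as a variable holding the package name rather than as the name itself.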
Some packages have external dependencies (i.e. they call libraries outside R). On Unix-like systems, these
are best installed onto the operating system, bypassing install.packages. This will ensure the necessary
dependencies are installed and setup correctly alongside the R package. On Debian-based distributions such
as Ubuntu, for example, packages with names starting with r-cran- can be searched for and installed as follows
(see cran.r-project.org/bin/linux/ubuntu/ for a list of these):
On Windows the installr package helps manage and update R packages with system-level dependencies. For
example the Rtools package for compiling C/C++ code on Windows can be installed with the following
command:
installr::install.rtools()
An efficient R set-up will contain up-to-date packages. This can be done for all packages with update.packages().
The default for this function is for the ask argument to be set to TRUE, giving control over what is downloaded
onto your system. This is generally desirable as updating dozens of large packages can consume a large
proportion of available system resources.
To update packages automatically, you can add the line update.packages(ask = FALSE) to your
.Rprofile startup file (see the next section for more on .Rprofile). Thanks to Richard Cotton
for this tip.
An even more interactive method for updating packages in R is provided by RStudio via Tools > Check for
Package Updates. Many such time saving tricks are enabled by RStudio, as described in a subsequent section.
Next (after the exercises) we take a look at how to configure R using start-up files.
Exercises
2.4 R startup
Every time R starts a number of things happen. It can be useful to understand this startup process, so you
can make R work the way you want it, fast. This section explains how.
The arguments passed to the R startup command (typically simply R from a shell environment) determine
what happens. The following arguments are particularly important from an efficiency perspective:
• --no-environ tells R not to read the site and user environment (.Renviron) files at startup. (Do not worry if
you don’t understand what this means at present: it will become clear later in the section.)
• --no-restore tells R not to load any .RData files knocking around in the current working directory.
• --no-save tells R not to ask the user if they want to save objects saved in RAM when the session is
ended with q().
Adding each of these will make R load slightly faster, and mean that slightly less user input is needed when
you quit. R’s default setting of loading data from the last session automatically is potentially problematic in
this context. See An Introduction to R, Appendix B, for more startup arguments.
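To see which arguments the current session was started with, base R provides commandArgs(). The sketch below checks for the three flags discussed above; note that a shortcut such as --vanilla, or a front end such as Rscript, may pass a different set of flags.

```r
# All arguments passed to the running R process
args <- commandArgs()
args

# Which of the flags discussed above were given explicitly?
startup_flags <- c("--no-environ", "--no-restore", "--no-save")
used <- startup_flags %in% args
names(used) <- startup_flags
used
```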
Some of R’s startup arguments can be controlled interactively in RStudio. See the online help file
Customizing RStudio for more on this.
There are two special files, .Renviron and .Rprofile, which determine how R performs for the duration of
the session. These are summarised in the bullet points below; we go into more detail on each in the subsequent
sections.
• The primary purpose of .Renviron is to set environment variables. These are settings that relate to the
operating system, telling R where to find external programs, and hold the contents of user-specific variables
that other users should not have access to, such as API keys: small text strings used to verify the user
when interacting with web services.
• .Rprofile is a plain text file (which is always called .Rprofile, hence its name) that simply runs
lines of R code every time R starts. If you want R to check for package updates each time it starts (as
explained in the previous section), you simply add the relevant line somewhere in this file.
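As a minimal sketch of how a variable set in .Renviron is read from within R: the name MY_API_KEY and its value are hypothetical, and Sys.setenv() is used here only to simulate, for the current session, what a line in .Renviron would do at startup.

```r
# A line in .Renviron is plain text of the form NAME=value, for example:
#   MY_API_KEY=abc123        (hypothetical name and value)
# Simulate that setting for the current session only:
Sys.setenv(MY_API_KEY = "abc123")

# Read the value back; Sys.getenv() returns "" if the variable is unset
api_key <- Sys.getenv("MY_API_KEY")
nzchar(api_key)  # TRUE when the variable has been set
```

Keeping such values in .Renviron rather than in scripts means they do not need to appear in code that is shared with others.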
When R starts (unless it was launched with --no-environ) it first searches for .Renviron and then .Rprofile,
in that order. Although .Renviron is searched for first, we will look at .Rprofile first as it is simpler and
for many set-up tasks more frequently useful. Both files can exist in three directories on your computer.
Confusingly, multiple versions of these files can exist on the same computer, only one of which will be used
per session. Note also that these files should only be changed with caution and if you know what you are
doing. This is because they can make your R version behave differently to other R installations, potentially
reducing the reproducibility of your code.
Files in three folders are important in this process:
• R_HOME, the directory in which R is installed. The etc sub-directory can contain start-up files read
early on in the start-up process. Find out where your R_HOME is with the R.home() command.
• HOME, the user’s home directory. Typically this is /home/username on Unix machines or
C:\Users\username on Windows (since Windows 7). Ask R where your home directory is with
Sys.getenv("HOME").
• R’s current working directory. This is reported by getwd().
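The three locations can be collected in one step. This is only an illustrative sketch; the vector name startup_dirs is an assumption, not something R defines.

```r
# Gather the three directories in which R may look for start-up files
startup_dirs <- c(
  R_HOME = R.home(),            # where R is installed
  HOME   = Sys.getenv("HOME"),  # the user's home directory
  cwd    = getwd()              # the current working (project) directory
)
startup_dirs
```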
It is important to know the location of the .Rprofile and .Renviron set-up files that are being used out of
these three options. R only uses one .Rprofile and one .Renviron in any session: if you have a .Rprofile
file in your current project, R will ignore .Rprofile in R_HOME and HOME. Likewise, .Rprofile in HOME
overrides .Rprofile in R_HOME. The same applies to .Renviron: you should remember that adding project
specific environment variables with .Renviron will de-activate other .Renviron files.
To create a project-specific start-up script, simply create a .Rprofile file in the project’s root directory and
start adding R code, e.g. via file.edit(".Rprofile"). Remember that this will make .Rprofile in the
home directory be ignored. The following commands will open your .Rprofile from within an R editor:
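A sketch of such commands follows. The variable names are illustrative, and the calls to file.edit() are wrapped in interactive() as a precaution so that the code is harmless when run as a script.

```r
# Paths to the user-level and project-level .Rprofile files
home_profile    <- path.expand(file.path("~", ".Rprofile"))
project_profile <- file.path(getwd(), ".Rprofile")

# Check which of the two currently exist
file.exists(c(home_profile, project_profile))

# Open the files in an editor (only sensible in an interactive session)
if (interactive()) {
  file.edit(home_profile)     # .Rprofile in HOME
  file.edit(project_profile)  # project-specific .Rprofile
}
```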
Note that editing the .Renviron file in the same locations will have the same effect. The following code will
create a user specific .Renviron file (where API keys and other cross-project environment variables can be
stored), without overwriting any existing file.
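One way to write this is sketched below. The helper create_if_missing() is a hypothetical name introduced here for illustration; it creates an empty file only when nothing exists at the given path.

```r
# Create an empty file unless one already exists at `path`.
# Returns TRUE if a file was created and FALSE otherwise.
create_if_missing <- function(path) {
  if (file.exists(path)) {
    return(invisible(FALSE))
  }
  invisible(file.create(path))
}

user_renviron <- path.expand(file.path("~", ".Renviron"))

if (interactive()) {
  create_if_missing(user_renviron)  # leaves any existing file untouched
  file.edit(user_renviron)          # use another text editor if this fails
}
```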
The pathological package can help find where .Rprofile and .Renviron files are located on your
system, thanks to the os_path() function. The output of example(Startup) is also instructive.
The location, contents and uses of each are outlined in more detail below.
By default, R looks for and runs .Rprofile files in the three locations described above, in a specific order.
.Rprofile files are simply R scripts that run each time R runs and they can be found within R_HOME, HOME
and the project’s home directory, found with getwd(). To check if you have a site-wide .Rprofile, which
will run for all users on start-up, run:
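A minimal version of this check, assuming the conventional location of the site-wide file in the etc sub-directory of R_HOME, is:

```r
# Build the path to the site-wide start-up file and test whether it exists
site_path <- R.home(component = "home")
fname <- file.path(site_path, "etc", "Rprofile.site")
file.exists(fname)
```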
The above code checks for the presence of Rprofile.site in that directory. As outlined above, the .Rprofile
located in your home directory is user-specific. Again, we can test whether this file exists using
file.exists("~/.Rprofile")
We can use R to create and edit .Rprofile (warning: do not overwrite your previous .Rprofile - we suggest
you try project-specific .Rprofile first):
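In keeping with the warning above, one cautious approach is to copy any existing file before editing it. The sketch below does this for a project-specific .Rprofile; the helper backup_file() and the .bak suffix are assumptions made for illustration, not part of R.

```r
# Copy `path` to `path.bak` if it exists; an earlier backup is never overwritten.
# Returns TRUE if a backup copy was written and FALSE otherwise.
backup_file <- function(path) {
  if (!file.exists(path)) {
    return(invisible(FALSE))
  }
  invisible(file.copy(path, paste0(path, ".bak"), overwrite = FALSE))
}

project_profile <- file.path(getwd(), ".Rprofile")

if (interactive()) {
  backup_file(project_profile)  # keep a copy of the previous version, if any
  file.edit(project_profile)    # then open the file for editing
}
```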
The example below provides a taster of what goes into .Rprofile. Note that this is simply a usual R script,
but with an unusual name. The best way to understand what is going on is to create this same script, save it
as .Rprofile in your current working directory and then restart your R session to observe what changes.
To restart your R session from within RStudio you can click Session > Restart R or use the keyboard
shortcut Ctrl+Shift+F10.
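A short script of this kind is sketched here. The wording of the message and the prompt text "R4geo> " are illustrative choices. Note also that since R 4.0.0 stringsAsFactors is already FALSE by default, so the last line mainly matters for older versions of R.

```r
# A welcome message printed in the console at the start of each session
message("Hi, welcome to R")

# Customise the prompt that prefixes every command
options(prompt = "R4geo> ")

# Avoid creating factors when reading data with read.table()-derived functions
options(stringsAsFactors = FALSE)
```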
To quickly explain each line of code: the first simply prints a message in the console each time a new R
session is started. The latter two modify options used to change R’s behavior, first to change the prompt in
the console (set to > by default) and second to ensure that unwanted factor variables are not created when
read.csv and other functions derived from read.table are used to load external data into R. Note that
simply adding more lines to the .Rprofile will set more features. An important aspect of .Rprofile (and
.Renviron) is that each line is run once and only once for each R session. That means that the options set
within .Rprofile can easily be changed during the session. The following command run mid-session, for
example, will return the default prompt:
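A one-line sketch of that command, using R's default prompt of a greater-than sign followed by a space:

```r
# Restore R's default prompt for the rest of the session
options(prompt = "> ")
```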
More details on these, and other potentially useful .Rprofile options are described subsequently. For more
suggestions of useful startup settings, see Examples in help("Startup") and online resources such as those
at statmethods.net. The help pages for R options (accessible with ?options) are also worth a read before
writing your own .Rprofile.
Ever been frustrated by unwanted + symbols that prevent copied and pasted multi-line functions from
working? These potentially annoying +s can be eradicated by adding options(continue = " ") to your
.Rprofile.
The function options, used above, contains a number of default settings. Typing options() provides a
good indication of what can be configured. Because options() are often related to personal preference (with
few implications for reproducibility) and you will want them for many of your R sessions, .Rprofile in your home
directory or in your project’s folder are sensible places to set them. Other illustrative options are shown
below:
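The settings in this sketch are examples chosen for illustration rather than recommendations; all three are standard options documented in ?options.

```r
# Further options that could be placed in .Rprofile
options(
  digits = 4,                 # print numbers with 4 significant digits
  show.signif.stars = FALSE,  # omit significance stars from model summaries
  continue = "  "             # replace the "+ " continuation prompt with spaces
)

# Inspect the current value of a single option
getOption("digits")
```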
Battle of the Aisne
..... Along a large part of the front, the Battle of the Aisne is thus taking on the character
of fortress warfare, similar to the operations in Manchuria.
It may be added that the exceptional power of the artillery on both sides (German heavy
artillery and French 75 mm guns) gives particular value to the field fortifications that the
two adversaries have built. The task is therefore to capture successive lines of trenches,
all of them preceded by auxiliary defences, notably barbed-wire entanglements with machine
guns in caponiers.
Under these conditions progress can only be slow: very often the attacks advance only
500 metres to 1 kilometre a day.
Official communiqué of 25 September.
26 September 1914.
For the Winter Campaign
Among all the sorrows of this war there nevertheless lies hidden one joy: the bond that
now unites us with the French.
There were days when, during the rapid German advance, we feared that the French armies
were far too inferior to their adversaries, when we believed that Germany would be beaten
only at sea and on its eastern frontier, and that after the war France would survive as a
power only thanks to the help of its allies.
For having had this fear, we must now ask her forgiveness.....
Article published in The Times and read in the lycées and collèges on the day they
reopened, at the invitation of the vice-rector of the Académie de Paris.
13 October 1914.
On our left wing, the front is extending further and further. Very large masses of German
cavalry are reported in the vicinity of Lille, preceding enemy elements that are moving in
the region north of the Tourcoing-Armentières line.
Official communiqué of 6 October.
On our left wing, the two cavalries are still operating north of Lille and La Bassée.
Official communiqué of 9 October.
The Belgian Government moves to Le Havre after the occupation of Antwerp by German troops
Mr President,
Battle of the Yser
[1] Manifesto of the 93 German intellectuals, concerning the common-law crimes committed by
German troops.
2 November 1914.
..... The ever-growing number of posts entrusted, in recent weeks, to German officers, the
receipt of arms and ammunition from Germany, and the welcome given to the Gœben and the
Breslau had rightly alarmed the Government of the Republic
On the Firing Line
Under enemy fire, a trusting closeness grows up between the officers and the men which, far
from impairing discipline, ennobles it further through an enlightened awareness of
solidarity in devotion and sacrifice.
..... And when, within range of the projectiles, before a horizon that bursting shells
shroud in smoke or tear with flashes, one sees peasants calmly driving their ploughs and
sowing their land, one understands better still how imperishable, on our old soil of France,
are the reserves of energy and vitality.
Letter addressed by the President of the Republic to the Minister of War, after a visit to
the front.
15 November 1914.
Battle of Flanders
..... In the hard weeks you have just been through you have consolidated and extended, by
the defence of Flanders, the brilliant victory of the Marne, and, thanks to the happy
impetus you have been able to give to those around you, everything has conspired to assure
you of new successes: perfect unity of view in the command, active solidarity among the
allied armies, judicious use of formations, rational coordination of the different arms; but
what has most particularly served your noble designs is that incomparable moral energy which
emanates from the French soul and sets all the springs of the army in motion.
Speech delivered by the President of the Republic on 11 November, on the occasion of the
presentation of the Médaille militaire to General Joffre.
30 November 1914.
Development of the defensive organisation
Reopening of the Stock Exchange
Stock Exchange prices on 7 December.
8 December 1914.
The Free Seas