R Syntax Comparison::: Cheat Sheet

The document compares the syntax used for common tasks in base R, formulas, and the tidyverse/ggplot2. It shows examples of calculating summary statistics, making plots, and wrangling data using each syntax. While the syntaxes differ, they can all accomplish the same analysis tasks.

R Syntax Comparison : : CHEAT SHEET

Dollar sign syntax: goal(data$x, data$y)
Formula syntax: goal(y~x|z, data=data, group=w)
Tidyverse syntax: data %>% goal(x)
SUMMARY STATISTICS:

one continuous variable:
  Dollar sign syntax:
    mean(mtcars$mpg)
  Formula syntax:
    mosaic::mean(~mpg, data=mtcars)
  Tidyverse syntax:
    mtcars %>% dplyr::summarize(mean(mpg))

one categorical variable:
  Dollar sign syntax:
    table(mtcars$cyl)
  Formula syntax:
    mosaic::tally(~cyl, data=mtcars)
  Tidyverse syntax:
    mtcars %>% dplyr::group_by(cyl) %>%
      dplyr::summarize(n())

two categorical variables:
  Dollar sign syntax:
    table(mtcars$cyl, mtcars$am)
  Formula syntax:
    mosaic::tally(cyl~am, data=mtcars)
  Tidyverse syntax:
    mtcars %>% dplyr::group_by(cyl, am) %>%
      dplyr::summarize(n())

one continuous, one categorical:
  Dollar sign syntax:
    mean(mtcars$mpg[mtcars$cyl==4])
    mean(mtcars$mpg[mtcars$cyl==6])
    mean(mtcars$mpg[mtcars$cyl==8])
  Formula syntax:
    mosaic::mean(mpg~cyl, data=mtcars)
  Tidyverse syntax:
    mtcars %>% dplyr::group_by(cyl) %>%
      dplyr::summarize(mean(mpg))

Callouts: in the formula syntax, ~ is the tilde; in the tidyverse syntax, %>% is the pipe.
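
The three separate dollar sign calls for the group means can also be collapsed into a single call. The sketch below is an illustration, not part of the original cheat sheet: it uses only base R, with tapply() for the dollar sign style and the formula method of aggregate() for the formula style, and checks that the approaches agree. The object names (by_hand, by_tapply, by_formula) are made up for the example.

```r
# Dollar sign syntax: one subset-and-mean call per group, as on the cheat sheet
by_hand <- c(
  "4" = mean(mtcars$mpg[mtcars$cyl == 4]),
  "6" = mean(mtcars$mpg[mtcars$cyl == 6]),
  "8" = mean(mtcars$mpg[mtcars$cyl == 8])
)

# Dollar sign syntax, all groups in one call
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Formula syntax in base R: aggregate() has a formula method
by_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Both comparisons should report TRUE: the group means are the same
all.equal(unname(by_hand), as.vector(by_tapply))
all.equal(unname(by_hand), by_formula$mpg)
```

The single-call versions scale to any number of groups, which is one reason the formula and tidyverse columns need only one line for this task.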

PLOTTING:

one continuous variable:
  Dollar sign syntax:
    hist(mtcars$disp)
    boxplot(mtcars$disp)
  Formula syntax:
    lattice::histogram(~disp, data=mtcars)
    lattice::bwplot(~disp, data=mtcars)
  Tidyverse syntax:
    ggplot2::qplot(x=disp, data=mtcars, geom="histogram")
    ggplot2::qplot(y=disp, x=1, data=mtcars, geom="boxplot")

one categorical variable:
  Dollar sign syntax:
    barplot(table(mtcars$cyl))
  Formula syntax:
    mosaic::bargraph(~cyl, data=mtcars)
  Tidyverse syntax:
    ggplot2::qplot(x=cyl, data=mtcars, geom="bar")

two continuous variables:
  Dollar sign syntax:
    plot(mtcars$disp, mtcars$mpg)
  Formula syntax:
    lattice::xyplot(mpg~disp, data=mtcars)
  Tidyverse syntax:
    ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")

two categorical variables:
  Dollar sign syntax:
    mosaicplot(table(mtcars$am, mtcars$cyl))
  Formula syntax:
    mosaic::bargraph(~am, data=mtcars, group=cyl)
  Tidyverse syntax:
    ggplot2::qplot(x=factor(cyl), data=mtcars, geom="bar") +
      facet_grid(.~am)

one continuous, one categorical:
  Dollar sign syntax:
    hist(mtcars$disp[mtcars$cyl==4])
    hist(mtcars$disp[mtcars$cyl==6])
    hist(mtcars$disp[mtcars$cyl==8])
    boxplot(mtcars$disp[mtcars$cyl==4])
    boxplot(mtcars$disp[mtcars$cyl==6])
    boxplot(mtcars$disp[mtcars$cyl==8])
  Formula syntax:
    lattice::histogram(~disp|cyl, data=mtcars)
    lattice::bwplot(cyl~disp, data=mtcars)
  Tidyverse syntax:
    ggplot2::qplot(x=disp, data=mtcars, geom="histogram") +
      facet_grid(.~cyl)
    ggplot2::qplot(y=disp, x=factor(cyl), data=mtcars,
      geom="boxplot")

WRANGLING:

subsetting:
  Dollar sign syntax:
    mtcars[mtcars$mpg>30, ]
  Tidyverse syntax:
    mtcars %>% dplyr::filter(mpg>30)

making a new variable:
  Dollar sign syntax:
    mtcars$efficient[mtcars$mpg>30] <- TRUE
    mtcars$efficient[mtcars$mpg<=30] <- FALSE
  Tidyverse syntax:
    mtcars <- mtcars %>%
      dplyr::mutate(efficient = if_else(mpg>30, TRUE, FALSE))

The variety of R syntaxes gives you many ways to “say” the same thing. Read across the cheatsheet to see how different syntaxes approach the same problem.
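
The wrangling tasks above have further base R spellings. The following is a minimal sketch under stated assumptions, not something taken from the cheat sheet: it works on a copy named cars (a made-up name), uses base R's subset() as another way to filter rows, and substitutes the native pipe |> (available in R 4.1 or later) for magrittr's %>% so that no package has to be installed.

```r
cars <- mtcars  # copy, so the built-in dataset stays untouched

# Two-step assignment in dollar sign syntax
cars$efficient[cars$mpg > 30] <- TRUE
cars$efficient[cars$mpg <= 30] <- FALSE

# One-step alternative: the comparison already yields TRUE/FALSE
one_step <- cars$mpg > 30
identical(cars$efficient, one_step)

# Subsetting: square brackets versus base R's subset()
by_bracket <- cars[cars$mpg > 30, ]
by_subset  <- subset(cars, mpg > 30)
all.equal(by_bracket, by_subset)

# Data-first functions chain with a pipe: x |> f(y) is f(x, y).
# The native |> stands in for %>% here (assumption: R >= 4.1).
piped <- cars |> subset(mpg > 30) |> nrow()
piped == nrow(by_bracket)
```

Each comparison is expected to come back TRUE, which is the point of the cheat sheet: the spellings differ, the result does not.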
RStudio® is a trademark of RStudio, Inc. • CC BY Amelia McNamara • [email protected] • @AmeliaMN • science.smith.edu/~amcnamara/ • Updated: 2018-01
Syntax is the set of rules that govern what code works and doesn’t work in a programming language. Most programming languages offer one standardized syntax, but R allows package developers to specify their own syntax. As a result, there is a large variety of (equally valid) R syntaxes.

The three most prevalent R syntaxes are:

1. The dollar sign syntax, sometimes called base R syntax, expected by most base R functions. It is characterized by the use of dataset$variablename, and is also associated with square bracket subsetting, as in dataset[1,2]. Almost all R functions will accept things passed to them in dollar sign syntax.
2. The formula syntax, used by modeling functions like lm(), lattice graphics, and mosaic summary statistics. It uses the tilde (~) to connect a response variable and one (or many) predictors. Many base R functions will accept formula syntax.
3. The tidyverse syntax used by dplyr, tidyr, and more. These functions expect data to be the first argument, which allows them to work with the “pipe” (%>%) from the magrittr package. Typically, ggplot2 is thought of as part of the tidyverse, although it has its own flavor of the syntax using plus signs (+) to string pieces together. ggplot2 author Hadley Wickham has said the package would have had different syntax if he had written it after learning about the pipe.

Educators often try to teach within one unified syntax, but most R programmers use some combination of all the syntaxes.

Internet research tip:

If you are searching on Google, StackOverflow, or another favorite online source and see code in a syntax you don’t recognize:
• Check to see if the code is using one of the three common syntaxes listed on this cheatsheet
• Try your search again, using a keyword from the syntax name (“tidyverse”) or a relevant package (“mosaic”)

! Sometimes particular syntaxes work, but are considered dangerous to use, because they are so easy to get wrong. For example, passing variable names without assigning them to a named argument.

Even more ways to say the same thing

Even within one syntax, there are often variations that are equally valid. As a case study, let’s look at the ggplot2 syntax. ggplot2 is the plotting package that lives within the tidyverse. If you read down this column, all the code here produces the same graphic.

Read down this column for many pieces of code in one syntax that look different but produce the same graphic.

quickplot

qplot() stands for quickplot, and allows you to make quick plots. It doesn’t have the full power of ggplot2, and it uses a slightly different syntax than the rest of the package.

  ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")

  ggplot2::qplot(x=disp, y=mpg, data=mtcars)   !

  ggplot2::qplot(disp, mpg, data=mtcars)   ! !

ggplot

To unlock the power of ggplot2, you need to use the ggplot() function (which sets up a plotting region) and add geoms to the plot. The plus sign adds layers.

  ggplot2::ggplot(mtcars) +
    geom_point(aes(x=disp, y=mpg))

  ggplot2::ggplot(data=mtcars) +
    geom_point(mapping=aes(x=disp, y=mpg))

  ggplot2::ggplot(mtcars, aes(x=disp, y=mpg)) +
    geom_point()

  ggplot2::ggplot(mtcars, aes(x=disp)) +
    geom_point(aes(y=mpg))

ggformula

The “third and a half way” to use the formula syntax, but get ggplot2-style graphics.

  ggformula::gf_point(mpg~disp, data=mtcars)

formulas in base plots

Base R plots will also take the formula syntax, although it's not as commonly used.

  plot(mpg~disp, data=mtcars)
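
The warning marked with ! can be made concrete for base R plotting. The sketch below is an illustration under stated assumptions rather than part of the cheat sheet: instead of drawing anything, it calls xy.coords() from grDevices, the helper that plot() uses to interpret its x and y arguments, so the effect of argument order can be inspected directly. The object names are made up for the example.

```r
# Named arguments: the order in which they are written does not matter
named_a <- xy.coords(x = mtcars$disp, y = mtcars$mpg)
named_b <- xy.coords(y = mtcars$mpg, x = mtcars$disp)
identical(named_a$x, named_b$x)

# Unnamed arguments are matched by position, so writing them in the
# wrong order runs without any error and silently swaps the axes
swapped <- xy.coords(mtcars$mpg, mtcars$disp)
identical(swapped$x, named_a$y)
```

Both checks are expected to return TRUE: the named calls agree with each other, while the unnamed call puts mpg where disp was intended. Note also that the formula form is written response first (mpg~disp), the reverse of the positional order in plot(mtcars$disp, mtcars$mpg).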
