Data Transformation with dplyr
dplyr functions work with pipes and expect tidy data. In tidy data, each variable is in its own column and each observation, or case, is in its own row. With pipes, x %>% f(y) becomes f(x, y).

Summarise Cases

Apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).

summarise(.data, …) Compute table of summaries.
summarise(mtcars, avg = mean(mpg))

count(.data, …, wt = NULL, sort = FALSE, name = NULL) Count number of rows in each group defined by the variables in … Also tally().
count(mtcars, cyl)

Group Cases

Use group_by(.data, …, .add = FALSE, .drop = TRUE) to create a "grouped" copy of a table grouped by columns in … dplyr functions will manipulate each "group" separately and combine the results.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))

Use rowwise(.data, …) to group data into individual rows. dplyr functions will compute results for each row. Also apply functions to list-columns. See tidyr cheat sheet for list-column workflow.
starwars %>%
  rowwise() %>%
  mutate(film_count = length(films))

ungroup(x, …) Returns ungrouped copy of table.
g_mtcars <- mtcars %>% group_by(cyl)
ungroup(g_mtcars)

Manipulate Cases

EXTRACT CASES
Row functions return a subset of rows as a new table.

filter(.data, …, .preserve = FALSE) Extract rows that meet logical criteria.
filter(mtcars, mpg > 20)

distinct(.data, …, .keep_all = FALSE) Remove rows with duplicate values.
distinct(mtcars, gear)

slice(.data, …, .preserve = FALSE) Select rows by position.
slice(mtcars, 10:15)

slice_sample(.data, …, n, prop, weight_by = NULL, replace = FALSE) Randomly select rows. Use n to select a number of rows and prop to select a fraction of rows.
slice_sample(mtcars, n = 5, replace = TRUE)

slice_min(.data, order_by, …, n, prop, with_ties = TRUE) and slice_max() Select rows with the lowest and highest values.
slice_min(mtcars, mpg, prop = 0.25)

slice_head(.data, …, n, prop) and slice_tail() Select the first or last rows.
slice_head(mtcars, n = 5)

Logical and boolean operators to use with filter()
==   <   <=   is.na()    %in%   |   xor()
!=   >   >=   !is.na()   !      &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES
arrange(.data, …, .by_group = FALSE) Order rows by values of a column or columns (low to high). Use with desc() to order from high to low.
arrange(mtcars, mpg)
arrange(mtcars, desc(mpg))

ADD CASES
add_row(.data, …, .before = NULL, .after = NULL) Add one or more rows to a table.
add_row(cars, speed = 1, dist = 1)

Manipulate Variables

EXTRACT VARIABLES
Column functions return a set of columns as a new vector or table.

pull(.data, var = -1, name = NULL, …) Extract column values as a vector, by name or index.
pull(mtcars, wt)

select(.data, …) Extract columns as a table.
select(mtcars, mpg, wt)

relocate(.data, …, .before = NULL, .after = NULL) Move columns to new position.
relocate(mtcars, mpg, cyl, .after = last_col())

Use these helpers with select() and across(), e.g. select(mtcars, mpg:cyl)
contains(match)      num_range(prefix, range)       :, e.g. mpg:cyl
ends_with(match)     all_of(x)/any_of(x, …, vars)   -, e.g. -gear
starts_with(match)   matches(match)                 everything()

MANIPULATE MULTIPLE VARIABLES AT ONCE
across(.cols, .funs, …, .names = NULL) Summarise or mutate multiple columns in the same way.
summarise(mtcars, across(everything(), mean))

c_across(.cols) Compute across columns in row-wise data.
transmute(rowwise(UKgas), total = sum(c_across(1:2)))

MAKE NEW VARIABLES
Apply vectorized functions to columns. Vectorized functions take vectors as input and return vectors of the same length as output (see back).

mutate(.data, …, .keep = "all", .before = NULL, .after = NULL) Compute new column(s). Also add_column(), add_count(), and add_tally().
mutate(mtcars, gpm = 1 / mpg)

transmute(.data, …) Compute new column(s), drop others.

rename(.data, …) Rename columns. Use rename_with() to rename with a function.
rename(cars, distance = dist)
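
A rough sketch of how the verbs on this page combine in one pipeline, using the built-in mtcars data. The object names same, by_cyl, and means are illustrative only and do not come from the cheat sheet; .groups = "drop" is an assumption about the desired output (an ungrouped result).

```r
library(dplyr)

# The pipe is shorthand: x %>% f(y) is evaluated as f(x, y).
same <- identical(mtcars %>% filter(mpg > 20), filter(mtcars, mpg > 20))

# Extract cases, group them, summarise each group, then order the result.
by_cyl <- mtcars %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(n = n(), avg_mpg = mean(mpg), .groups = "drop") %>%
  arrange(desc(avg_mpg))

# across() applies one summary function to several selected columns.
means <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, wt), mean), .groups = "drop")
```

Each step returns a new table, so any prefix of the pipeline can be run on its own to inspect the intermediate result.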
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at dplyr.tidyverse.org • dplyr 1.0.7 • Updated: 2021-07
Vectorized Functions

TO USE WITH MUTATE()
mutate() and transmute() apply vectorized functions to columns to create new columns.

Summary Functions

TO USE WITH SUMMARISE()
summarise() applies summary functions to columns to create a new table.

Combine Tables

COMBINE VARIABLES

COMBINE CASES
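
The difference between the two kinds of function can be seen in a minimal sketch, assuming the built-in mtcars data. The names ranked and overall are illustrative, and min_rank() and mean() are just two convenient representatives of each kind.

```r
library(dplyr)

# Vectorized function inside mutate(): one output value per input row,
# so the result has as many rows as mtcars.
ranked <- mutate(mtcars, mpg_rank = min_rank(desc(mpg)))

# Summary functions inside summarise(): each collapses a column to one
# value, so the result is a one-row table.
overall <- summarise(mtcars, avg_mpg = mean(mpg), max_mpg = max(mpg))
```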