Data & Variable Transformation: Recode and Transform Variables, Summarise Variables and Cases, Descriptives and Summaries

sjmisc is an R package that complements dplyr and helps with data transformation tasks and recoding variables. It provides functions for recoding and transforming variables, summarizing variables and cases, and descriptive statistics. The functions are designed to work seamlessly with dplyr and pipes. They follow tidyverse principles by making the data argument first and returning an object of the same type.
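The "data argument first, same type returned" principle can be illustrated with a minimal sketch. It assumes sjmisc is installed and behaves as in version 2.7.0 or later; `recode_rule`, `carb_vec` and `carb_df` are hypothetical names chosen for this example only.

```r
# Sketch only: assumes the sjmisc package is installed.
# mtcars ships with base R.
library(sjmisc)

# Recode pattern: 1 and 2 become 1; 3 and 4 become 2; everything else becomes 3.
recode_rule <- "1,2=1; 3,4=2; else=3"

# Vector in, vector out.
carb_vec <- rec(mtcars$carb, rec = recode_rule)
is.numeric(carb_vec)

# Data frame in, data frame out. With append = TRUE the original columns
# are kept and the recoded column is added with the "_r" suffix.
carb_df <- rec(mtcars, carb, rec = recode_rule, append = TRUE)
is.data.frame(carb_df)
"carb_r" %in% names(carb_df)
```

Passing `append = TRUE` explicitly, rather than relying on the default, keeps the sketch independent of which default a given sjmisc version uses.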


Data & Variable Transformation with sjmisc Cheat Sheet

sjmisc complements dplyr, and helps with data transformation tasks and recoding variables.

sjmisc works together seamlessly with dplyr and pipes. All functions are designed to support labelled data.

Design Philosophy

The design of sjmisc functions follows the tidyverse-approach: first argument is always the data (either a data frame or vector), followed by variable names to be processed by the functions.

The returned object for each function equals the type of the data-argument.

Vector input
• If the data-argument is a vector, functions return a vector.

    rec(mtcars$carb, rec = "1,2=1; 3,4=2; else=3")

Data frame input
• If the data-argument is a data frame, functions return a data frame.

    rec(mtcars, carb, rec = "1,2=1; 3,4=2; else=3")

The ...-ellipses Argument

Apply functions to a single variable, selected variables or to a complete data frame. Variable selection is powered by select(): separate variables with comma, or use select-helpers to select variables, e.g. ?rec:

    rec(mtcars, one_of(c("gear", "carb")), rec = "min:3=1; 4:max=2")
    rec(mtcars, gear, carb, rec = "min:3=1; 4:max=2")

Descriptives and Summaries

Most of the sjmisc functions (including recode-functions) also work on grouped data frames:

    library(dplyr)
    efc %>%
      group_by(e16sex, c172code) %>%
      frq(e42dep)

Frequency Tables

frq(x, ..., sort.frq = c("none", "asc", "desc"), weight.by = NULL, auto.grp)
Print frequency tables of (labelled) vectors. Uses variable labels as table header.

    data(efc); frq(efc, e42dep, c161sex)

Use this data set in examples!

flat_table(data, ..., margin = c("counts", "cell", "row", "col"), digits = 2, show.values = FALSE)
Print contingency tables of (labelled) vectors. Uses value labels.

    flat_table(efc, e42dep, c172code, e16sex)

count_na(x, ...)
Print frequency table of tagged NA values.

    library(haven); x <- labelled(c(1:3, tagged_na("a", "a", "z")), labels = c("Refused" = tagged_na("a"), "N/A" = tagged_na("z")))
    count_na(x)

Descriptive Summary

descr(x, ..., max.length = NULL)
Descriptive summary of data frames, including variable labels in output.

    descr(efc, contains("cop"), max.length = 20)

Finding Variables in a Data Frame

Use find_var() to search for variables by names, value or variable labels. Returns vector/data frame.

    find_var(efc, pattern = "cop", out = "df")
    # variables with "level" in names and value labels
    find_var(efc, "level", search = "name_value")

Recode and Transform Variables

Recode functions add a suffix to new variables, so original variables are preserved. By default, the original input data frame and the newly created variables are returned. Use append = FALSE to return the recoded variables only.

rec(x, ..., rec, as.num = TRUE, var.label = NULL, val.labels = NULL, append = TRUE, suffix = "_r")
Recode values, return result as numeric, character or categorical (factor).

    rec(mtcars, carb, rec = "1,2=1; 3,4=2; else=3")

dicho(x, ..., dich.by = "median", as.num = FALSE, var.label = NULL, val.labels = NULL, append = TRUE, suffix = "_d")
Dichotomise variable by median, mean or specific value.

    dicho(mtcars, disp)

split_var(x, ..., n, as.num = FALSE, val.labels = NULL, var.label = NULL, inclusive = FALSE, append = TRUE, suffix = "_g")
Split variable into equal sized groups. Unlike dplyr::ntile(), does not split original categories into different values (see examples in ?split_var).

    split_var(mtcars, mpg, disp, n = 3)

group_var(x, ..., size = 5, as.num = TRUE, right.interval = FALSE, n = 30, append = TRUE, suffix = "_gr")
Split variable into groups with equal value range, or into a max. # of groups (value range per group is adjusted to match # of groups).

    group_var(mtcars, mpg, disp, size = 5)
    group_var(mtcars, mpg, size = "auto", n = 4)

std(x, ..., robust = "sd", include.fac = FALSE, append = TRUE, suffix = "_z")
Z-standardise variables. Also center().

    std(efc, e17age, c160age)

recode_to(x, ..., lowest = 0, highest = -1, append = TRUE, suffix = "_r0")

    recode_to(mtcars$gear)

Summarise Variables and Cases

The summary functions mostly mimic base R equivalents, but are designed to work together with pipes and dplyr.

row_sums(x, ..., na.rm = TRUE, var = "rowsums", append = FALSE)
Row sums of data frames.

    row_sums(efc, c82cop1:c90cop9)

row_means(x, ..., n, var = "rowmeans", append = FALSE)
Row means, for at least n valid (non-NA) values.

    row_means(efc, c82cop1:c90cop9, n = 7)

row_count(x, ..., count, var = "rowcount", append = FALSE)
Row-wise count # of values in data frames. Also col_count().

    row_count(efc, c82cop1:c90cop9, count = 2)

Other Useful Functions

add_columns() and replace_columns() to combine data frames, but either replace or preserve existing columns.
set_na() and replace_na() to convert regular into missing values, or vice versa. replace_na() also replaces specific tagged NA values only.
remove_var() and var_rename() to remove variables from data frames, or rename variables.
group_str() to group similar string values. Useful for variables with similar, but not identical, string values.
merge_df() to full join data frames and preserve value and variable labels.
to_long() to gather multiple columns in data frames from wide into long format.

Use with %>% and dplyr

    # use sjmisc-functions in pipes
    mtcars %>% select(gear, carb) %>%
      rec(rec = "min:3=1; 4:max=2")
    # use sjmisc-function inside mutate
    mtcars %>% select(gear, carb) %>% mutate(
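The pipe and mutate usage described in this cheat sheet can be sketched end to end as follows. This is an illustration under assumptions, not the author's original example: it assumes sjmisc and dplyr are installed, and the names `piped`, `inside_mutate` and `carb_grp` are hypothetical.

```r
# Sketch only: assumes sjmisc and dplyr are installed.
library(sjmisc)
library(dplyr)

# sjmisc functions in a pipe: the piped data frame is the first argument.
# rec() adds gear_r and carb_r; dicho() adds disp_d (median split).
piped <- mtcars %>%
  select(gear, carb, disp) %>%
  rec(gear, carb, rec = "min:3=1; 4:max=2", append = TRUE) %>%
  dicho(disp, dich.by = "median", append = TRUE)

names(piped)

# sjmisc function inside mutate(): a vector goes in, a vector comes out.
# "carb_grp" is a hypothetical column name used for illustration.
inside_mutate <- mtcars %>%
  select(gear, carb) %>%
  mutate(carb_grp = rec(carb, rec = "min:3=1; 4:max=2"))

head(inside_mutate)
```

Inside `mutate()` the function receives a single column, so the vector form of `rec()` applies and no suffix is added; the new column takes whatever name is given on the left-hand side.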
CC BY Daniel Lüdecke, github.com/strengejacke. Learn more with browseVignettes("sjmisc"). sjmisc 2.7.0, 02/18
