Advanced R Programming Tidyverse Notes

The document outlines the process of data wrangling, including steps such as discovering, structuring, cleaning, enriching, and validating data using the tidyverse package in R. It provides examples of filtering, selecting, and summarizing data from the 'diamonds' dataset, demonstrating various techniques for data manipulation and analysis. Key operations include filtering by cut and price, selecting specific columns, reordering, and summarizing data based on different criteria.

Data Wrangling / Munging

# Data Wrangling
# 1) Discovering
# 2) Structuring
# 3) Cleaning
# 4) Enriching
# 5) Validation
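
As a rough sketch, the five steps might map onto tidyverse calls as follows. The mapping of steps to functions and the name `wrangled` are illustrative assumptions, not part of the original notes.

```r
library(tidyverse)

# 1) Discovering: inspect column types and a few values
glimpse(diamonds)

# 2) Structuring: keep only the columns needed
# 3) Cleaning: drop rows with missing values or a non-positive carat weight
# 4) Enriching: derive a new column
wrangled <- diamonds %>%
  select(carat, cut, price) %>%
  drop_na() %>%
  filter(carat > 0) %>%
  mutate(price_per_carat = price / carat)

# 5) Validation: stop with an error if an assumption does not hold
stopifnot(all(wrangled$price_per_carat > 0))
```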
## Data wrangling with the tidyverse package
library(tidyverse)
diamonds
View(diamonds)
# Filter rows to get a subset of the data
diamonds_sm <- filter(diamonds, cut == "Ideal")
diamonds_sm
View(diamonds_sm)

# Multiple conditions are combined with AND
diamonds_sm <- filter(diamonds, cut == "Ideal", price > 10000)
diamonds_sm
View(diamonds_sm)
# Check for missing values: is.na() returns a logical matrix, TRUE where a value is NA
print(is.na(diamonds_sm))
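
Printing the full logical matrix is hard to read on a large table. A minimal sketch of more compact checks follows. The small table `demo`, with one price deliberately set to NA, is a hypothetical example built only for illustration.

```r
library(tidyverse)

# Illustrative data: first five rows of diamonds with one price blanked out
demo <- diamonds %>%
  slice_head(n = 5) %>%
  mutate(price = replace(price, 2, NA))

# Number of missing values in each column
na_counts <- colSums(is.na(demo))
na_counts

# Rows that contain at least one missing value
rows_with_na <- demo %>% filter(if_any(everything(), is.na))
rows_with_na

# Rows with no missing values at all
complete_rows <- demo %>% drop_na()
complete_rows
```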
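# Subset by column
# With base R, the resulting columns are named diamonds.cut and diamonds.color
diamonds_sm <- data.frame(diamonds$cut, diamonds$color)
diamonds_sm
# With select(), keep the first four columns by position
diamonds_sm <- select(diamonds, 1:4)
View(diamonds_sm)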

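# Keep columns whose names contain "c" (carat, cut, color, clarity, price)
diamonds_c <- select(diamonds, contains("c"))
diamonds_c
View(diamonds_c)

# Move price, table and depth to the front; everything() keeps the remaining columns
diamonds_E <- select(diamonds, price, table, depth, everything())
View(diamonds_E)

# Drop price, depth and table
diamonds_N <- select(diamonds, -c(price, depth, table))
diamonds_N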

diamonds_sm <- diamonds %>% select(-price)
diamonds_sm

# Sort rows by color, then by carat (ascending)
diamonds_arr <- diamonds %>% arrange(color, carat)
diamonds_arr
View(diamonds_arr)
# Sort rows by carat in descending order
diamonds_arr <- diamonds %>% arrange(desc(carat))
View(diamonds_arr)
# Add or modify columns
diamonds_new <- diamonds %>%
  mutate(mass_g = 0.2 * carat,            # 1 carat = 0.2 g
         price_per_carat = price / carat,
         cut = tolower(cut),              # replaces the ordered factor with a character vector
         expensive = price > 10000)
diamonds_new
View(diamonds_new)
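
Note that `tolower(cut)` turns the ordered factor `cut` into plain character data. If the level order should be kept, one possible alternative is `fct_relabel()` from forcats (loaded with the tidyverse). The name `diamonds_lower` is introduced here only for illustration.

```r
library(tidyverse)

# tolower() coerces the factor to character, so the level order is lost
class(tolower(diamonds$cut))

# fct_relabel() applies a function to the level labels only
diamonds_lower <- diamonds %>% mutate(cut = fct_relabel(cut, tolower))
levels(diamonds_lower$cut)
is.ordered(diamonds_lower$cut)
```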

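# Summarize the data

# Mean price per cut
diamonds %>% group_by(cut) %>% summarize(avg_price = mean(price))

# Mean, standard deviation and row count per cut and color
diamonds %>%
  group_by(cut, color) %>%
  summarize(avg_price = mean(price),
            sd_price = sd(price),
            count = n())

# Number of rows per combination of cut, color and clarity
diamonds %>% count(cut, color, clarity)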
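
`count()` is shorthand for grouping and then summarizing with `n()`. The sketch below compares the two on a single grouping column and shows the `sort` argument. The names `by_count`, `by_summarize` and `top_cuts` are illustrative.

```r
library(tidyverse)

by_count <- diamonds %>% count(cut)
by_summarize <- diamonds %>% group_by(cut) %>% summarize(n = n())

# Both approaches give the same counts
identical(by_count$n, by_summarize$n)

# sort = TRUE puts the most frequent group first
top_cuts <- diamonds %>% count(cut, sort = TRUE)
top_cuts
```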

# Summarize the data for expensive (price > 10000) and non-expensive diamonds
diamonds %>%
  group_by(price > 10000) %>%
  summarize(avg_price = mean(price),
            sd_price = sd(price),
            count = n())
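
Grouping by an unnamed expression produces a column literally called `price > 10000`. A small variation, assuming the label `expensive` is wanted for that column, names the expression inside `group_by()`. The name `price_summary` is illustrative.

```r
library(tidyverse)

price_summary <- diamonds %>%
  group_by(expensive = price > 10000) %>%
  summarize(avg_price = mean(price),
            sd_price = sd(price),
            count = n())
price_summary
```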
