Recasting and Joining of Dataframes: NPTEL NOC18-CS28
In this lecture
Recasting
Need to recast dataframes
Recast in 2 steps
◦ Melt
◦ Cast
Recast in 1 step – recast()
Joining of two dataframes
◦ Left join, Right join, Inner join
Recasting dataframes
Dataframe – “pd”
Recasting is the process of manipulating a data frame in terms of its variables.
Reshaping the same data can bring out different insights.
Code
# Data frame example 2
pd = data.frame("Name" = c("Senthil", "Senthil", "Sam", "Sam"),
                "Month" = c("Jan", "Feb", "Jan", "Feb"),
                "BS" = c(141.2, 139.3, 135.2, 160.1),
                "BP" = c(90, 78, 80, 81))
print(pd)
Recasting and combining dataframes NPTEL NOC18-CS28 4
Data science for Engineers
Step 1: Melt
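Load the package ‘reshape2’ using the library() command
melt(data, id.vars, measure.vars, variable.name = "variable", value.name = "value")
library(reshape2)
# "Name" and "Month" identify each row; "BS" and "BP" are the measurements
Df = melt(pd, id.vars = c("Name", "Month"), measure.vars = c("BS", "BP"))
print(Df)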
Figure: melt keeps the identifier variables as columns and stacks the measurement variables into the "variable" and "value" columns.
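As a minimal sketch of the melt step, the block below rebuilds "pd" from the example and melts it. Treating "Name" and "Month" as identifier variables and "BS" and "BP" as measurement variables is an assumption based on the labels on the slide.

```r
library(reshape2)

# Data frame from the example
pd <- data.frame(
  Name  = c("Senthil", "Senthil", "Sam", "Sam"),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BS    = c(141.2, 139.3, 135.2, 160.1),
  BP    = c(90, 78, 80, 81),
  stringsAsFactors = FALSE
)

# Assumed split: Name and Month identify a row, BS and BP are measurements
Df <- melt(pd,
           id.vars       = c("Name", "Month"),
           measure.vars  = c("BS", "BP"),
           variable.name = "variable",
           value.name    = "value")

# One row per (Name, Month, variable) combination: 4 rows x 2 measures = 8 rows
print(Df)
```

The molten data frame has the columns Name, Month, variable and value, which is the long format that the cast step expects.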
Step 2: cast
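• Apply the dcast() function to the molten data frame
• dcast(data, formula, value.var = "name of the column holding the values")
• In the formula, variables to the left of ~ identify the rows; the variable to the right of ~ is spread into new columns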
# Column names are case-sensitive: use "Month", as defined in "pd"
Df2 = dcast(Df, variable + Month ~ Name, value.var = "value")
print(Df2)
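The two steps can be sketched end to end as follows. This is an illustrative run on the "pd" example, again assuming "Name" and "Month" are the identifier variables. The formula variable + Month ~ Name keeps one row per measurement and month and creates one column per person.

```r
library(reshape2)

pd <- data.frame(
  Name  = c("Senthil", "Senthil", "Sam", "Sam"),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BS    = c(141.2, 139.3, 135.2, 160.1),
  BP    = c(90, 78, 80, 81),
  stringsAsFactors = FALSE
)

# Step 1: melt to long format (assumed id/measure split)
Df <- melt(pd, id.vars = c("Name", "Month"), measure.vars = c("BS", "BP"))

# Step 2: cast so that each Name becomes its own column
Df2 <- dcast(Df, variable + Month ~ Name, value.var = "value")

# Four rows (2 variables x 2 months) with columns variable, Month, Sam, Senthil
print(Df2)
```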
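The outline also lists recasting in one step. A hedged sketch using recast() from reshape2, which melts and casts in a single call, could look like this. Note that recast() takes id.var (singular), and the identifier columns chosen here are the same assumption as in the two-step version.

```r
library(reshape2)

pd <- data.frame(
  Name  = c("Senthil", "Senthil", "Sam", "Sam"),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BS    = c(141.2, 139.3, 135.2, 160.1),
  BP    = c(90, 78, 80, 81),
  stringsAsFactors = FALSE
)

# Melt and cast in one call; the remaining columns (BS, BP) are
# treated as measurement variables
Df3 <- recast(pd, variable + Month ~ Name, id.var = c("Name", "Month"))

print(Df3)
```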
Joining of two dataframes
Join functions in the ‘dplyr’ package: left_join(), right_join(), inner_join(), full_join(), semi_join(), anti_join()
left_join()
• Joins the matching rows of “dataframe2” to “dataframe1” based on the “id.variable”
• In the example, only “Sam” and “Senthil” from the id.variable “Name” are present in “pd”, which is dataframe1
• Only these two IDs and their corresponding values in “pd_new” are merged with “pd”
• The variable “Department” from “pd_new” is added as a new column to “pd”, the ‘left’ data frame
left_join()
Use dataframes ‘pd’ and ‘pd_new’
dataframe1 : pd
dataframe2 : pd_new
Code
# using left_join()
# to combine two dataframes
# Continue from example
library(dplyr)
pd_left_join1 <- left_join(pd, pd_new, by = "Name")
print(pd_left_join1)
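The contents of "pd_new" are not shown in this section, so the sketch below uses a hypothetical version with a "Name" and a "Department" column and one extra name that does not occur in "pd". The rows and department values are assumptions made only to illustrate the behaviour of left_join().

```r
library(dplyr)

pd <- data.frame(
  Name  = c("Senthil", "Senthil", "Sam", "Sam"),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BS    = c(141.2, 139.3, 135.2, 160.1),
  BP    = c(90, 78, 80, 81),
  stringsAsFactors = FALSE
)

# Hypothetical pd_new: the rows and departments are illustrative assumptions
pd_new <- data.frame(
  Name       = c("Senthil", "Ramesh", "Sam"),
  Department = c("PSE", "Data Analytics", "PSE"),
  stringsAsFactors = FALSE
)

# Keep every row of pd and attach Department where the Name matches
pd_left_join1 <- left_join(pd, pd_new, by = "Name")

# "Ramesh" is dropped because that name does not occur in pd
print(pd_left_join1)
```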
right_join()
Joins the matching rows of “dataframe1” to “dataframe2” based on the “id.variable”
dataframe1 : pd
dataframe2 : pd_new
Code
# using right_join()
# to combine two data frames
# Continue from example
pd_right_join1 <- right_join(pd, pd_new, by = "Name")
print(pd_right_join1)
right_join()
Joins the matching rows of “dataframe1” to “dataframe2” based on the “id.variable”
dataframe1 : pd_new
dataframe2 : pd
Code
# using right_join()
# to combine two data frames
# Continue from example
pd_right_join2 <- right_join(pd_new, pd, by = "Name")
print(pd_right_join2)
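The effect of swapping the two arguments can be sketched with the same hypothetical "pd_new" (its rows are an assumption, not taken from the slides). right_join() keeps every row of its second argument, so the order of the data frames decides which IDs survive.

```r
library(dplyr)

pd <- data.frame(
  Name  = c("Senthil", "Senthil", "Sam", "Sam"),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BS    = c(141.2, 139.3, 135.2, 160.1),
  BP    = c(90, 78, 80, 81),
  stringsAsFactors = FALSE
)

# Hypothetical pd_new: the rows and departments are illustrative assumptions
pd_new <- data.frame(
  Name       = c("Senthil", "Ramesh", "Sam"),
  Department = c("PSE", "Data Analytics", "PSE"),
  stringsAsFactors = FALSE
)

# All rows of pd_new are kept; "Ramesh" has no match in pd, so Month, BS, BP are NA
pd_right_join1 <- right_join(pd, pd_new, by = "Name")

# All rows of pd are kept; "Ramesh" is dropped because pd has no such Name
pd_right_join2 <- right_join(pd_new, pd, by = "Name")

print(pd_right_join1)
print(pd_right_join2)
```

With this data the first call returns five rows and the second returns four, which shows that right_join(pd_new, pd) retains the same IDs as left_join(pd, pd_new).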
inner_join()
Merges and retains only those rows with IDs present in both dataframes
dataframe1 : pd_new
dataframe2 : pd
Code
# using inner_join()
# to combine two data frames
# Continue from example
library(dplyr)
pd_inner_join1 <- inner_join(pd_new, pd, by = "Name")
print(pd_inner_join1)
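A small sketch of inner_join(), once more with a hypothetical "pd_new" whose rows are assumed for illustration, shows that only IDs found in both data frames are kept.

```r
library(dplyr)

pd <- data.frame(
  Name  = c("Senthil", "Senthil", "Sam", "Sam"),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BS    = c(141.2, 139.3, 135.2, 160.1),
  BP    = c(90, 78, 80, 81),
  stringsAsFactors = FALSE
)

# Hypothetical pd_new: the rows and departments are illustrative assumptions
pd_new <- data.frame(
  Name       = c("Senthil", "Ramesh", "Sam"),
  Department = c("PSE", "Data Analytics", "PSE"),
  stringsAsFactors = FALSE
)

# Only Names present in both pd_new and pd are retained
pd_inner_join1 <- inner_join(pd_new, pd, by = "Name")

# Columns of pd_new come first because it is the first argument
print(pd_inner_join1)
```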