Recasting and Joining of Dataframes: Nptel Noc18-Cs28

This document discusses recasting and joining dataframes in R. It explains that recasting involves reshaping a dataframe by manipulating its variables. This can be done in two steps of melt and cast or in a single step using recast. Joining combines two dataframes based on a common identifier variable. Specific joining methods like left_join(), right_join() are demonstrated to combine two sample dataframes based on the name variable.

Uploaded by

Shashank Gautam

Data science for Engineers

Recasting and joining of dataframes

Recasting and combining dataframes NPTEL NOC18-CS28 1



In this lecture
• Recasting
• Need to recast dataframes
• Recast in 2 steps
  ◦ Melt
  ◦ Cast
• Recast in 1 step: recast
• Joining of two dataframes
  ◦ Left join, Right join, Inner join


Recasting dataframes
• Recasting is the process of manipulating a data frame in terms of its variables
• Reshaping the data can bring out new insights
[Figure: example data frame "pd"]


Recast in two steps: Example


• Create the following example data frame 'pd'

Code
# Data frame example 2
pd = data.frame("Name" = c("Senthil", "Senthil", "Sam", "Sam"),
                "Month" = c("Jan", "Feb", "Jan", "Feb"),
                "BS" = c(141.2, 139.3, 135.2, 160.1),
                "BP" = c(90, 78, 80, 81))
print(pd)

Console Output
     Name Month    BS BP
1 Senthil   Jan 141.2 90
2 Senthil   Feb 139.3 78
3     Sam   Jan 135.2 80
4     Sam   Feb 160.1 81

Recast in two steps: Example

• Two steps
  ◦ Melt
  ◦ Cast
• Identifier variables (discrete type variables)
• Measurement variables (numeric variables)
• Categorical and date variables cannot be measurements
• In 'pd', "Name" and "Month" are the identifier variables; "BS" and "BP" are the measurement variables


Step 1: Melt
Load the 'reshape2' package using the library() command
melt(data, id.vars, measure.vars, variable.name = "variable", value.name = "value")

Code
# Data frame example 3
# melt operation sample code
library(reshape2)
Df = melt(pd, id.vars = c("Name", "Month"),
          measure.vars = c("BS", "BP"))
print(Df)
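The melt() signature above also lists variable.name and value.name, which the slide's code leaves at their defaults. The following is a minimal sketch of how they could be used; the column names "Measure" and "Reading" are hypothetical and chosen only for illustration.

```r
library(reshape2)

# Same example data frame 'pd' as in the lecture
pd <- data.frame(Name  = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS    = c(141.2, 139.3, 135.2, 160.1),
                 BP    = c(90, 78, 80, 81))

# "Measure" and "Reading" are hypothetical names, not from the lecture
Df_named <- melt(pd,
                 id.vars       = c("Name", "Month"),
                 measure.vars  = c("BS", "BP"),
                 variable.name = "Measure",
                 value.name    = "Reading")

# One row per (Name, Month, measurement): 4 rows x 2 measurements = 8 rows
print(Df_named)
```

Naming the melted columns up front can make a later cast formula easier to read, at the cost of having to pass value.var = "Reading" to dcast().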


Step 1: melt
[Figure: melt applied to 'pd', with the identifier variables and measurement variables marked]


Step 2: cast
• Apply the dcast() function
• dcast(data, formula, value.var = column with values)

Code
# cast operation sample code
# continued from previous code
# we use dcast as we are working on a data frame
Df2 = dcast(Df,
            variable + Month ~ Name,
            value.var = "value")   # column of Df from which the values are taken
print(Df2)

• Columns "variable" and "Month" remain as they are.
• Categories in column "Name" become new variables.
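To see how the formula controls the result, the sketch below casts the same melted data with a different formula. This is an illustrative variation, not part of the lecture: putting the identifiers on the left of ~ and "variable" on the right should give back the original wide layout.

```r
library(reshape2)

# Same example data frame 'pd' as in the lecture
pd <- data.frame(Name  = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS    = c(141.2, 139.3, 135.2, 160.1),
                 BP    = c(90, 78, 80, 81))

# Step 1: melt, as in the lecture
Df <- melt(pd, id.vars = c("Name", "Month"), measure.vars = c("BS", "BP"))

# Step 2: cast with the identifiers on the left and "variable" on the right.
# Each level of "variable" (BS, BP) becomes its own column again.
pd_wide <- dcast(Df, Name + Month ~ variable, value.var = "value")
print(pd_wide)
```

The rows may come back in a different order from 'pd', since dcast() orders the output by the identifier columns.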


Step 2: cast
Df2 = dcast(Df, variable + Month ~ Name, value.var = "value")
[Figure: result of the cast step]


Recasting in single step


• The recast() function performs melt and cast in one command
• recast(data, formula, ..., id.var, measure.var)
• The "formula" parameter refers to the "cast" section of the command
• The "id.var" and "measure.var" parameters refer to the "melt" section of the command
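The command shown on this slide did not survive extraction. The following is a sketch of what a single-step recast could look like for 'pd', assuming the same formula as the two-step example; it is not necessarily the exact command used in the lecture.

```r
library(reshape2)

# Same example data frame 'pd' as in the lecture
pd <- data.frame(Name  = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS    = c(141.2, 139.3, 135.2, 160.1),
                 BP    = c(90, 78, 80, 81))

# id.var drives the melt step; the formula drives the cast step.
# measure.var is omitted, so every non-identifier column (BS, BP) is melted.
Df_recast <- recast(pd, variable + Month ~ Name,
                    id.var = c("Name", "Month"))
print(Df_recast)
```

Note that recast() takes id.var and measure.var (singular), whereas melt() takes id.vars and measure.vars.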


recast(): melt and cast together
[Figure: the melt step (identifier variables and measurement variables) followed by the cast step]


Add new variable to dataframe based on existing ones


• Load the 'dplyr' package using the library() command
• The mutate() command adds extra variable columns based on existing ones.

Code
# Adding new variables
# continued from the data frame 'pd' created earlier
library(dplyr)
pd2 <- mutate(pd, log_BP = log(BP))
print(pd2)

• The original data frame 'pd' is the first argument
• Multiple variables can be created as transformations of old variables
• Here, the new variable column is "log_BP", which is the log of the variable column "BP"
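The point about creating multiple variables can be sketched as follows. The second column, "BS_to_BP", is a hypothetical name and ratio used only to show two new columns being added in a single mutate() call.

```r
library(dplyr)

# Same example data frame 'pd' as in the lecture
pd <- data.frame(Name  = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS    = c(141.2, 139.3, 135.2, 160.1),
                 BP    = c(90, 78, 80, 81))

# Two new columns in one call; 'pd' itself is left unchanged.
# "BS_to_BP" is a hypothetical illustration, not from the lecture.
pd3 <- mutate(pd,
              log_BP   = log(BP),
              BS_to_BP = BS / BP)
print(pd3)
```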

Joining of two frames


Combining two dataframes – dplyr package


The common syntax for "dplyr" functions used to combine dataframes:
"function(dataframe1, dataframe2, by = id.variable)"
• The "id.variable" is common to both dataframes
• This variable provides the identifiers for combining the two dataframes
• The nature of the combination depends on the function used
• Illustration example: a possible combination

dataframe1            dataframe2          Combined
ID  Name  Age         ID  Gender          ID  Name  Age  Gender
1   Jack  10     +    2   Girl       =    1   Jack  10   Boy
2   Jill  12          1   Boy             2   Jill  12   Girl

The id.variable "ID" is used to combine both dataframes column-wise
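The illustration above can be reproduced with a short sketch. The slide does not say which join function produces the combination; inner_join() is assumed here, and since both IDs appear in both dataframes, the other joins covered in this lecture would give the same rows. The names df1 and df2 are hypothetical.

```r
library(dplyr)

# The two small dataframes from the illustration (df1, df2 are hypothetical names)
df1 <- data.frame(ID = c(1, 2), Name = c("Jack", "Jill"), Age = c(10, 12))
df2 <- data.frame(ID = c(2, 1), Gender = c("Girl", "Boy"))

# Rows are matched on "ID", not on position, so the different
# row order in df2 does not matter.
combined <- inner_join(df1, df2, by = "ID")
print(combined)
```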


Combining two dataframes

• Load the 'dplyr' package using the library() command

• The following functions are used to combine datasets:

left_join()     full_join()
right_join()    semi_join()
inner_join()    anti_join()


Example: create first dataframe


Create the data frame 'pd'

Code
# Data frame example 2
pd = data.frame("Name" = c("Senthil", "Senthil", "Sam", "Sam"),
                "Month" = c("Jan", "Feb", "Jan", "Feb"),
                "BS" = c(141.2, 139.3, 135.2, 160.1),
                "BP" = c(90, 78, 80, 81))
print(pd)

Create another dataframe


Create another data frame: 'pd_new'

Code
# Data frame example 3
pd_new = data.frame("Name" = c("Senthil", "Ramesh", "Sam"),
                    "Department" = c("PSE", "Data Analytics", "PSE"))
print(pd_new)


left_join()
• Joins the matching rows of "dataframe2" to "dataframe1" based on the "id.variable"; every row of "dataframe1" is kept
• In the example, only "Sam" and "Senthil" from the id.variable "Name" are present in "pd", which is dataframe1.
• Only these two IDs and their corresponding values in "pd_new" will be merged with "pd"
• The variable "Department" from "pd_new" is added to "pd", the 'left' dataframe

dataframe1 : pd    dataframe2 : pd_new


left_join()
Use the dataframes 'pd' (dataframe1) and 'pd_new' (dataframe2)

Code
# using left_join()
# to combine two dataframes
# continued from the example above
library(dplyr)
pd_left_join1 <- left_join(pd, pd_new, by = "Name")
print(pd_left_join1)


right_join()
Joins the matching rows of "dataframe1" to "dataframe2" based on the "id.variable"; every row of "dataframe2" is kept

dataframe1 : pd    dataframe2 : pd_new

Code
# using right_join()
# to combine two dataframes
# continued from the example above
pd_right_join1 <- right_join(pd, pd_new, by = "Name")
print(pd_right_join1)


right_join()
Joins the matching rows of "dataframe1" to "dataframe2" based on the "id.variable"; every row of "dataframe2" is kept

dataframe1 : pd_new    dataframe2 : pd

Code
# using right_join()
# to combine two dataframes
# continued from the example above
pd_right_join2 <- right_join(pd_new, pd, by = "Name")
print(pd_right_join2)
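The two right_join() calls differ only in argument order, and the console output for both was lost in extraction. The sketch below puts them side by side to show the effect on the number of rows; the variable names r1 and r2 are hypothetical.

```r
library(dplyr)

# Same example dataframes as in the lecture
pd <- data.frame(Name  = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS    = c(141.2, 139.3, 135.2, 160.1),
                 BP    = c(90, 78, 80, 81))
pd_new <- data.frame(Name       = c("Senthil", "Ramesh", "Sam"),
                     Department = c("PSE", "Data Analytics", "PSE"))

# Keeps every row of pd_new: "Ramesh" has no match in pd,
# so his Month, BS and BP are filled with NA.
r1 <- right_join(pd, pd_new, by = "Name")

# Keeps every row of pd: "Ramesh" is not in pd, so he is dropped.
r2 <- right_join(pd_new, pd, by = "Name")

print(nrow(r1))  # 5
print(nrow(r2))  # 4
```

The order of the rows in the result can vary between dplyr versions, so the sketch compares row counts rather than row positions.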


inner_join()
Merges the two dataframes and retains only those rows whose IDs are present in both

dataframe1 : pd_new    dataframe2 : pd

Code
# using inner_join()
# to combine two dataframes
# continued from the example above
library(dplyr)
pd_inner_join1 <- inner_join(pd_new, pd, by = "Name")
print(pd_inner_join1)


Combining two dataframes: summary

left_join()     full_join()
right_join()    semi_join()
inner_join()    anti_join()
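full_join(), semi_join() and anti_join() are listed in the summary but not demonstrated on the slides. The following is a minimal sketch of how they could behave on 'pd' and 'pd_new'; the result names pd_full, pd_semi and pd_anti are hypothetical.

```r
library(dplyr)

# Same example dataframes as in the lecture
pd <- data.frame(Name  = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS    = c(141.2, 139.3, 135.2, 160.1),
                 BP    = c(90, 78, 80, 81))
pd_new <- data.frame(Name       = c("Senthil", "Ramesh", "Sam"),
                     Department = c("PSE", "Data Analytics", "PSE"))

# full_join: keeps all rows from both dataframes, filling gaps with NA
pd_full <- full_join(pd, pd_new, by = "Name")

# semi_join: keeps rows of pd that have a match in pd_new,
# without adding any columns from pd_new
pd_semi <- semi_join(pd, pd_new, by = "Name")

# anti_join: keeps rows of pd_new that have NO match in pd
pd_anti <- anti_join(pd_new, pd, by = "Name")

print(pd_full)
print(pd_semi)
print(pd_anti)
```

Unlike the other four, semi_join() and anti_join() only filter rows of the first dataframe; they never add columns from the second.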
