EDP-3[1]

The document provides a comprehensive guide on merging and concatenating data frames in Pandas, including inner, left, right, and outer joins, as well as merging on multiple columns and different column names. It also covers concatenating data frames both vertically and horizontally, handling different indexes, and creating multi-indexes. Additionally, it discusses reshaping data using pivoting and melting techniques, along with handling missing data in pivot tables.

Uploaded by

ys304123

Merging database-style data frames

Inner Join (Default Merge)

import pandas as pd

# Creating two sample data frames

df1 = pd.DataFrame({

'ID': [1, 2, 3, 4],

'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 35, 40]

})

df2 = pd.DataFrame({

'ID': [1, 2, 3],

'Salary': [50000, 60000, 70000]

})

# Merging the data frames on the 'ID' column

merged_df = pd.merge(df1, df2, on='ID')

print(merged_df)

Output:

   ID     Name  Age  Salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000

Left Join
# Merging the data frames with a left join

merged_df_left = pd.merge(df1, df2, on='ID', how='left')

print(merged_df_left)

Output:

   ID     Name  Age   Salary
0   1    Alice   25  50000.0
1   2      Bob   30  60000.0
2   3  Charlie   35  70000.0
3   4    David   40      NaN
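Salary prints as 50000.0 in the left join because the unmatched row for David introduces NaN, which forces the column to float. The following is a minimal sketch of one way to get integers back, assuming 0 is an acceptable placeholder for a missing salary (that choice is an assumption for illustration, not something the original states):

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})
df2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Salary': [50000, 60000, 70000]
})

merged_df_left = pd.merge(df1, df2, on='ID', how='left')

# Assumption: a missing salary may be represented as 0
filled = merged_df_left.fillna({'Salary': 0}).astype({'Salary': 'int64'})
print(filled)
```

If 0 would be misleading, converting the column to the nullable 'Int64' dtype is another option that keeps the gap marked as missing.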

Right Join
# Merging the data frames with a right join

merged_df_right = pd.merge(df1, df2, on='ID', how='right')

print(merged_df_right)

Output:

   ID     Name  Age  Salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000

Outer Join

# Merging the data frames with an outer join

merged_df_outer = pd.merge(df1, df2, on='ID', how='outer')

print(merged_df_outer)

Output:

   ID     Name  Age   Salary
0   1    Alice   25  50000.0
1   2      Bob   30  60000.0
2   3  Charlie   35  70000.0
3   4    David   40      NaN
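With an outer join it is not always obvious which side each row came from. As a small sketch, the indicator=True option of pd.merge() adds a _merge column that labels every row as both, left_only, or right_only; the variable names merged and left_only_ids below are chosen for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})
df2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Salary': [50000, 60000, 70000]
})

# indicator=True adds a '_merge' column describing the origin of each row
merged = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(merged[['ID', '_merge']])

# IDs that exist only in the left data frame
left_only_ids = merged.loc[merged['_merge'] == 'left_only', 'ID'].tolist()
print(left_only_ids)
```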

Merging on Multiple Columns

# Creating two data frames with multiple common columns

df1 = pd.DataFrame({

'ID': [1, 2, 3],

'Department': ['HR', 'Finance', 'IT'],

'Employee': ['Alice', 'Bob', 'Charlie']

})

df2 = pd.DataFrame({

'ID': [1, 2, 3],

'Department': ['HR', 'Finance', 'IT'],

'Salary': [50000, 60000, 70000]

})

# Merging based on both 'ID' and 'Department'

merged_df_multi = pd.merge(df1, df2, on=['ID', 'Department'])

print(merged_df_multi)

Output:

   ID Department Employee  Salary
0   1         HR    Alice   50000
1   2    Finance      Bob   60000
2   3         IT  Charlie   70000
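A multi-column merge silently multiplies rows when a key combination is repeated on either side. The sketch below uses the validate argument of pd.merge() to guard against that; the employees and salaries frames, including the duplicated (1, 'HR') row, are hypothetical data made up for this illustration:

```python
import pandas as pd

employees = pd.DataFrame({
    'ID': [1, 2, 3],
    'Department': ['HR', 'Finance', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie']
})
# Hypothetical data: the key (1, 'HR') appears twice
salaries = pd.DataFrame({
    'ID': [1, 1, 2, 3],
    'Department': ['HR', 'HR', 'Finance', 'IT'],
    'Salary': [50000, 52000, 60000, 70000]
})

# Without validation the duplicated key produces an extra row
unchecked = pd.merge(employees, salaries, on=['ID', 'Department'])
print(len(unchecked))

# validate='one_to_one' raises MergeError when keys are not unique on both sides
try:
    pd.merge(employees, salaries, on=['ID', 'Department'], validate='one_to_one')
    keys_unique = True
except pd.errors.MergeError:
    keys_unique = False
print(keys_unique)
```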
Merging with Different Column Names

# Creating data frames with different column names for the merge

df1 = pd.DataFrame({

'EmployeeID': [1, 2, 3],

'EmployeeName': ['Alice', 'Bob', 'Charlie']

})

df2 = pd.DataFrame({

'ID': [1, 2, 3],

'Salary': [50000, 60000, 70000]

})

# Merging based on different column names

merged_df_diff_names = pd.merge(df1, df2, left_on='EmployeeID', right_on='ID')

print(merged_df_diff_names)

Output:

   EmployeeID EmployeeName  ID  Salary
0           1        Alice   1   50000
1           2          Bob   2   60000
2           3      Charlie   3   70000

Concatenating along an axis

Concatenating Vertically (Row-wise)

import pandas as pd

# Creating two data frames

df1 = pd.DataFrame({
'ID': [1, 2, 3],

'Name': ['Alice', 'Bob', 'Charlie']

})

df2 = pd.DataFrame({

'ID': [4, 5, 6],

'Name': ['David', 'Eve', 'Frank']

})

# Concatenating along axis=0 (row-wise)

concatenated_df = pd.concat([df1, df2], axis=0, ignore_index=True)

print(concatenated_df)

Output:

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David
4   5      Eve
5   6    Frank
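The ignore_index=True argument matters here because both inputs carry the default index 0, 1, 2. As a sketch of what happens without it, the example below keeps the original labels and then uses verify_integrity=True, which makes pd.concat() raise a ValueError on overlapping index values; the names kept and overlap_detected are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [4, 5, 6], 'Name': ['David', 'Eve', 'Frank']})

# Without ignore_index the original row labels are kept, so they repeat
kept = pd.concat([df1, df2], axis=0)
print(kept.index.tolist())

# verify_integrity=True refuses to build a result with duplicate labels
try:
    pd.concat([df1, df2], axis=0, verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```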

Concatenating Horizontally (Column-wise)

# Creating two data frames with the same index but different columns

df1 = pd.DataFrame({

'ID': [1, 2, 3],

'Name': ['Alice', 'Bob', 'Charlie']

})
df2 = pd.DataFrame({

'Age': [25, 30, 35],

'Salary': [50000, 60000, 70000]

})

# Concatenating along axis=1 (column-wise)

concatenated_df_horizontal = pd.concat([df1, df2], axis=1)

print(concatenated_df_horizontal)

Output:

   ID     Name  Age  Salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000

Concatenating with Different Indexes

# Creating two data frames with different indexes

df1 = pd.DataFrame({

'ID': [1, 2],

'Name': ['Alice', 'Bob']

}, index=[0, 1])

df2 = pd.DataFrame({

'Age': [25, 30],

'Salary': [50000, 60000]

}, index=[1, 2])

# Concatenating along axis=0 (row-wise), handling different indexes

concatenated_df_diff_index = pd.concat([df1, df2], axis=0, ignore_index=True)

print(concatenated_df_diff_index)

Output:

    ID   Name   Age   Salary
0  1.0  Alice   NaN      NaN
1  2.0    Bob   NaN      NaN
2  NaN    NaN  25.0  50000.0
3  NaN    NaN  30.0  60000.0
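Differing indexes have a more visible effect when concatenating column-wise, because pd.concat() then aligns rows by index label. The sketch below reuses the same two frames with axis=1 and compares the default outer alignment with join='inner'; the names aligned and overlap are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']}, index=[0, 1])
df2 = pd.DataFrame({'Age': [25, 30], 'Salary': [50000, 60000]}, index=[1, 2])

# Default (outer): every index label from either frame is kept, gaps become NaN
aligned = pd.concat([df1, df2], axis=1)
print(aligned)

# join='inner': only index labels present in both frames are kept
overlap = pd.concat([df1, df2], axis=1, join='inner')
print(overlap)
```

Only label 1 exists in both frames, so the inner result has a single row pairing Bob with the first row of df2.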

Concatenating with Keys (Creating a MultiIndex)

# Concatenating with keys to create a hierarchical index

concatenated_df_keys = pd.concat([df1, df2], axis=0, keys=['df1', 'df2'])

print(concatenated_df_keys)

Output:

        ID   Name   Age   Salary
df1 0  1.0  Alice   NaN      NaN
    1  2.0    Bob   NaN      NaN
df2 1  NaN    NaN  25.0  50000.0
    2  NaN    NaN  30.0  60000.0
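The keys become the outer level of the row index, which makes it possible to recover each original block afterwards. A brief sketch, assuming the same df1 and df2 as in the example above (the name from_df2 is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']}, index=[0, 1])
df2 = pd.DataFrame({'Age': [25, 30], 'Salary': [50000, 60000]}, index=[1, 2])

concatenated_df_keys = pd.concat([df1, df2], axis=0, keys=['df1', 'df2'])

# Select every row that came from the second frame via the outer key
from_df2 = concatenated_df_keys.loc['df2']
print(from_df2)

# Select a single cell with a (key, original index) tuple
print(concatenated_df_keys.loc[('df1', 0), 'Name'])
```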

Concatenating with Mismatched Columns

# Creating two data frames with mismatched columns

df1 = pd.DataFrame({

'ID': [1, 2],

'Name': ['Alice', 'Bob']

})

df2 = pd.DataFrame({

'Age': [25, 30],

'Salary': [50000, 60000]

})

# Concatenating along axis=1 (column-wise) with mismatched columns

concatenated_df_mismatched = pd.concat([df1, df2], axis=1)

print(concatenated_df_mismatched)

Output:

   ID   Name  Age  Salary
0   1  Alice   25   50000
1   2    Bob   30   60000

Merging on index
Simple Merge on Index

import pandas as pd

# Creating two data frames with meaningful indexes

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000, 70000],

'Department': ['HR', 'Finance', 'IT']

}, index=['a', 'b', 'c'])

# Merging the data frames on the index

merged_df = pd.merge(df1, df2, left_index=True, right_index=True)

print(merged_df)

Output:

      Name  Age  Salary Department
a    Alice   25   50000         HR
b      Bob   30   60000    Finance
c  Charlie   35   70000         IT
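For index-on-index merges, DataFrame.join() is a shorter spelling. It defaults to a left join on the index, which gives the same result as the inner merge here because both frames share the labels a, b and c. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Finance', 'IT']
}, index=['a', 'b', 'c'])

merged_df = pd.merge(df1, df2, left_index=True, right_index=True)

# join() aligns on the index by default
joined_df = df1.join(df2)
print(joined_df)
```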

Merge on Index with Different Column Names

# Creating two data frames with different column names but same index

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000, 70000],

'Department': ['HR', 'Finance', 'IT']

}, index=['a', 'b', 'c'])

# Merging the data frames on index

merged_df_diff_columns = pd.merge(df1, df2, left_index=True, right_index=True)

print(merged_df_diff_columns)

Output:

      Name  Age  Salary Department
a    Alice   25   50000         HR
b      Bob   30   60000    Finance
c  Charlie   35   70000         IT

Merge on Index with how Parameter

# Creating two data frames with different indexes

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000],

'Department': ['HR', 'Finance']

}, index=['a', 'b'])

# Merging with 'left' join on the index

merged_left = pd.merge(df1, df2, left_index=True, right_index=True, how='left')

print(merged_left)

Output:

      Name  Age   Salary Department
a    Alice   25  50000.0         HR
b      Bob   30  60000.0    Finance
c  Charlie   35      NaN        NaN

Merge with outer Join on Index

# Merging with an outer join on the index

merged_outer = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

print(merged_outer)

Output:

      Name  Age   Salary Department
a    Alice   25  50000.0         HR
b      Bob   30  60000.0    Finance
c  Charlie   35      NaN        NaN

Merge on Index and Column (Multi-key Merge)

# Creating two data frames with different columns and indexes

df1 = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({

'Salary': [50000, 60000, 70000],

'Department': ['HR', 'Finance', 'IT'],

'Age': [25, 30, 35]

}, index=['a', 'b', 'c'])

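# Merging on both the index and the 'Age' column. pd.merge() rejects on= combined with
# left_index/right_index (MergeError), so the index is moved into a column first

merged_df = pd.merge(df1.reset_index(), df2.reset_index(), on=['index', 'Age']).set_index('index').rename_axis(None)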

print(merged_df)

Output:

      Name  Age  Salary Department
a    Alice   25   50000         HR
b      Bob   30   60000    Finance
c  Charlie   35   70000         IT

Reshaping and pivoting

Pivoting Data to Wide Format

import pandas as pd

# Creating a sample data frame

df = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-01', '2025-03-02', '2025-03-02'],

'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],

'Temperature': [58, 70, 60, 72]

})

# Pivoting the data: Dates as rows, cities as columns, and Temperature as values

pivoted_df = df.pivot(index='Date', columns='City', values='Temperature')

print(pivoted_df)

Output:

City        Los Angeles  New York
Date
2025-03-01           70        58
2025-03-02           72        60
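pivot() only works when every (index, columns) pair occurs once. The sketch below shows the failure on a hypothetical data set with two New York readings for the same date (made-up values for illustration), and falls back to pivot_table(), which aggregates the duplicates:

```python
import pandas as pd

# Hypothetical data: New York has two readings on the same date
readings = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-01', '2025-03-01'],
    'City': ['New York', 'New York', 'Los Angeles'],
    'Temperature': [58, 62, 70]
})

# pivot() raises ValueError because ('2025-03-01', 'New York') is duplicated
try:
    readings.pivot(index='Date', columns='City', values='Temperature')
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)

# pivot_table() resolves the duplicates by aggregating them
averaged = readings.pivot_table(index='Date', columns='City',
                                values='Temperature', aggfunc='mean')
print(averaged)
```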

Melting DataFrames with melt()

# Creating a wide-format data frame

df_wide = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-02'],

'New York': [58, 60],

'Los Angeles': [70, 72]

})

# Melting the data: Convert cities from columns to a single "City" column

melted_df = pd.melt(df_wide, id_vars=['Date'], var_name='City', value_name='Temperature')

print(melted_df)

Output:

         Date         City  Temperature
0  2025-03-01     New York           58
1  2025-03-02     New York           60
2  2025-03-01  Los Angeles           70
3  2025-03-02  Los Angeles           72
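melt() and pivot() are inverse operations for data like this. As a sketch, the melted frame can be pivoted back into the original wide layout; the name wide_again and the final column reordering are illustrative choices:

```python
import pandas as pd

df_wide = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-02'],
    'New York': [58, 60],
    'Los Angeles': [70, 72]
})

melted_df = pd.melt(df_wide, id_vars=['Date'], var_name='City', value_name='Temperature')

# Pivot back to wide format, then restore 'Date' as a column and clear the columns name
wide_again = (
    melted_df.pivot(index='Date', columns='City', values='Temperature')
    .reset_index()
    .rename_axis(None, axis=1)
)

# pivot() sorts the city columns, so restore the original column order
wide_again = wide_again[df_wide.columns]
print(wide_again)
```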

Stacking DataFrame

# Creating a sample DataFrame

df = pd.DataFrame({

'City': ['New York', 'Los Angeles', 'Chicago'],

'Population': [8175133, 3792621, 2695598],

'Area': [789, 503, 589]

})

# Setting 'City' as the index

df.set_index('City', inplace=True)
# Stacking the DataFrame: Converts columns into a MultiIndex (rows)

stacked_df = df.stack()

print(stacked_df)

Output:

City
New York     Population    8175133
             Area              789
Los Angeles  Population    3792621
             Area              503
Chicago      Population    2695598
             Area              589
dtype: int64

stack(): converts columns into rows, producing a Series with a hierarchical (MultiIndex) index.

Unstacking DataFrame

# Unstacking the stacked data: Converts rows back to columns

unstacked_df = stacked_df.unstack()

print(unstacked_df)

Output:

             Population  Area
City
New York        8175133   789
Los Angeles     3792621   503
Chicago         2695598   589
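By default unstack() moves the innermost index level into the columns. A different level can be chosen by name or position; the sketch below unstacks the City level instead, so cities become columns and the measures become rows (the name by_city is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Population': [8175133, 3792621, 2695598],
    'Area': [789, 503, 589]
}).set_index('City')

stacked_df = df.stack()

# Unstack the outer 'City' level rather than the innermost level
by_city = stacked_df.unstack(level='City')
print(by_city)
```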

Reshaping with pivot_table()

# Creating a sample DataFrame

df = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-01', '2025-03-02', '2025-03-02'],

'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],

'Temperature': [58, 70, 60, 72],

'Humidity': [60, 50, 65, 55]

})

# Using pivot_table to reshape and calculate the average Temperature per Date and City

pivot_table_df = df.pivot_table(index='Date', columns='City', values='Temperature',

aggfunc='mean')

print(pivot_table_df)

Output:

City        Los Angeles  New York
Date
2025-03-01         70.0      58.0
2025-03-02         72.0      60.0
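The Humidity column in this data frame goes unused in the example above. pivot_table() also accepts a list for values, in which case the result gets a two-level column index of (value, City). A minimal sketch, with the name both chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-01', '2025-03-02', '2025-03-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [58, 70, 60, 72],
    'Humidity': [60, 50, 65, 55]
})

# Passing a list of value columns produces MultiIndex columns
both = df.pivot_table(index='Date', columns='City',
                      values=['Temperature', 'Humidity'], aggfunc='mean')
print(both)

# Select one measure, then one city
print(both['Humidity']['New York'])
```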

Handling Missing Data with pivot_table()

# Creating a sample DataFrame with missing values

df_with_missing = pd.DataFrame({

'Date': ['2025-03-01', '2025-03-01', '2025-03-02'],

'City': ['New York', 'Los Angeles', 'New York'],

'Temperature': [58, 70, None],

'Humidity': [60, 50, 65]

})

# Pivoting with missing data and using mean as aggregation function

pivot_table_missing = df_with_missing.pivot_table(index='Date', columns='City',

values='Temperature', aggfunc='mean')

print(pivot_table_missing)

Output:

City        Los Angeles  New York
Date
2025-03-01         70.0      58.0

The 2025-03-02 row does not appear because its only Temperature value is missing.
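How a date with only missing readings is treated depends on the dropna argument of pivot_table(), which defaults to True and discards entries that are entirely NaN. The sketch below compares the default with dropna=False on the same data; the names default_table and kept_table are illustrative:

```python
import pandas as pd

df_with_missing = pd.DataFrame({
    'Date': ['2025-03-01', '2025-03-01', '2025-03-02'],
    'City': ['New York', 'Los Angeles', 'New York'],
    'Temperature': [58, 70, None],
    'Humidity': [60, 50, 65]
})

# Default dropna=True: the date whose only reading is missing is left out
default_table = df_with_missing.pivot_table(index='Date', columns='City',
                                            values='Temperature', aggfunc='mean')
print(default_table)

# dropna=False keeps that date, with NaN in every city column
kept_table = df_with_missing.pivot_table(index='Date', columns='City',
                                         values='Temperature', aggfunc='mean',
                                         dropna=False)
print(kept_table)
```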
