2.1 Combining Data Frames
execs_df:
• Need to filter missing data
• Need to clean name
Road Map
• Column-wise operations
• Filtering rows
• Joining or combining tables
Columnwise Operations
Data Cleaning
# Single brackets: 1 column as a Series
exec_df['name']
0 Julie_Sweet
1 Kumar_Birla
2 Shantanu_Narayen
3 Garo_H._Armen
Computing Over a Series with a Function
It’s best NOT to iterate over the elements and modify them
• Instead: call apply with a function!
# *apply* to each element, returning a new Series
exec_df['name'].apply(lambda x: x.replace('_', ' '))
0 Julie Sweet
1 Kumar Birla
2 Shantanu Narayen
Inserting the Results Back into the Dataframe
# Let's clean the name!
exec_df['clean_name'] = exec_df['name'].apply(lambda x: x.replace('_', ' '))
   name              page                                             born        clean_name
0  Julie_Sweet       https://en.wikipedia.org/wiki/Julie_Sweet       NaT         Julie Sweet
1  Kumar_Birla       https://en.wikipedia.org/wiki/Kumar_Birla       1967-06-14  Kumar Birla
2  Shantanu_Narayen  https://en.wikipedia.org/wiki/Shantanu_Narayen  NaT         Shantanu Narayen
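The cleaning step above can be tried end to end with a minimal sketch. The three rows below are a hand-built sample that mirrors the slide's table, not the real dataset, and the `str.replace` line is an alternative added here for comparison rather than something the slides use.

```python
import pandas as pd

# Hand-built sample mirroring the slide's table (an assumption, not the real data).
exec_df = pd.DataFrame({
    'name': ['Julie_Sweet', 'Kumar_Birla', 'Shantanu_Narayen'],
    'born': pd.to_datetime([None, '1967-06-14', None]),
})

# apply calls the function once per element and returns a new Series;
# assigning that Series to a new key adds a column to the dataframe.
exec_df['clean_name'] = exec_df['name'].apply(lambda x: x.replace('_', ' '))

# The vectorized string accessor produces the same values without a lambda.
same_values = exec_df['name'].str.replace('_', ' ', regex=False)

print(exec_df[['name', 'clean_name']])
```

Either form avoids an explicit Python loop over the elements.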
Can I Change Column Names?
exec_df.rename(columns={'name': 'old_name'})
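One detail worth seeing in a small sketch: `rename` returns a new dataframe and leaves the original untouched, so the result has to be assigned to keep it. The two-row `exec_df` and the name `renamed_df` are illustrative only.

```python
import pandas as pd

# Tiny illustrative dataframe (hypothetical sample).
exec_df = pd.DataFrame({'name': ['Julie_Sweet', 'Kumar_Birla']})

# rename returns a new DataFrame; exec_df keeps its original column names.
renamed_df = exec_df.rename(columns={'name': 'old_name'})

print(list(exec_df.columns))
print(list(renamed_df.columns))
```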
Another Variation of the Same Idea
name clean_name
0 Julie_Sweet Julie Sweet
1 Kumar_Birla Kumar Birla
2 Meghan Meghan
The Story So Far
• We’ve seen how to extract single columns and subsets of columns via
projection
• The apply operation allows us to do computation over a field or the
contents of a row
• We can assign the results back to columns in a dataframe
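The three points above can be sketched together. The sample rows and the column name `page_matches_name` are made up for illustration; the row-wise check is just one possible computation over the contents of a row.

```python
import pandas as pd

# Hypothetical two-row sample with the same column names as the slides.
exec_df = pd.DataFrame({
    'name': ['Julie_Sweet', 'Kumar_Birla'],
    'page': ['https://en.wikipedia.org/wiki/Julie_Sweet',
             'https://en.wikipedia.org/wiki/Kumar_Birla'],
})

# Projection: single brackets give a Series, double brackets give a DataFrame.
name_series = exec_df['name']
subset_df = exec_df[['name', 'page']]

# apply with axis=1 passes each row (as a Series) to the function,
# so the function can read several fields of the same row.
exec_df['page_matches_name'] = exec_df.apply(
    lambda row: row['page'].endswith(row['name']), axis=1
)

print(exec_df)
```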
[Figure: company_ceos_df joined with execs_df, showing the join output]
Let’s Try the Join Again as a Left Outer Join with an Indicator
result_df[result_df['_merge'] != 'both']
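A minimal sketch of how `result_df` might be produced before that filter. The column names (`company`, `ceo`, `clean_name`) and the sample rows are assumptions, since the slides do not show the schema of `company_ceos_df`; the "Acme" row is deliberately fictional so that it has no match.

```python
import pandas as pd

# Hypothetical left table; the real company_ceos_df columns are not shown.
company_ceos_df = pd.DataFrame({
    'company': ['Accenture', 'Adobe', 'Acme'],
    'ceo': ['Julie Sweet', 'Shantanu Narayen', 'Jane Doe'],
})

# Hypothetical right table using the cleaned names.
execs_df = pd.DataFrame({
    'clean_name': ['Julie Sweet', 'Kumar Birla', 'Shantanu Narayen'],
    'born': pd.to_datetime([None, '1967-06-14', None]),
})

# how='left' keeps every left row; indicator=True adds a '_merge' column
# whose value is 'both', 'left_only', or 'right_only'.
result_df = company_ceos_df.merge(
    execs_df, how='left', left_on='ceo', right_on='clean_name', indicator=True
)

# Rows that did not find a partner on the right side.
unmatched_df = result_df[result_df['_merge'] != 'both']
print(unmatched_df)
```

Inspecting the unmatched rows is a quick way to spot keys that still need cleaning.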
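Exact-Match (Inner) Joins vs Outer Joins
An inner join keeps only rows that match on both sides. An outer join
also keeps “partial” rows: when a row on one side (e.g., the left) has
no match on the other side (e.g., the right), it is retained and the
missing columns are filled with NaN.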
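The difference shows up in the row counts. The sketch below uses two small made-up tables (`left_df`, `right_df`) that share a `name` key; only the `how` argument changes between the three merges.

```python
import pandas as pd

# Made-up tables: 'Julie Sweet' appears on both sides, the others on one side only.
left_df = pd.DataFrame({
    'name': ['Julie Sweet', 'Jane Doe'],
    'company': ['Accenture', 'Acme'],
})
right_df = pd.DataFrame({
    'name': ['Julie Sweet', 'Kumar Birla'],
    'page': ['https://en.wikipedia.org/wiki/Julie_Sweet',
             'https://en.wikipedia.org/wiki/Kumar_Birla'],
})

# Inner join: only exact matches survive.
inner_df = left_df.merge(right_df, on='name', how='inner')

# Left outer join: every left row survives; unmatched rows get NaN for 'page'.
left_outer_df = left_df.merge(right_df, on='name', how='left')

# Full outer join: unmatched rows from both sides survive.
full_outer_df = left_df.merge(right_df, on='name', how='outer')

print(len(inner_df), len(left_outer_df), len(full_outer_df))
```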
Composition! Joining a Joined Result
name is ambiguous, so we need to give a table variable to company_data
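The slide's wording suggests a table alias, as one would write in SQL. In pandas, one analogous way to keep two `name` columns apart when joining an already-joined result is the `suffixes` argument. The sketch below is an assumption about how the idea translates: the schemas, the `ticker` column, and the dataframe name `company_data_df` are all hypothetical.

```python
import pandas as pd

# Hypothetical schemas; the slides do not show the real columns.
company_ceos_df = pd.DataFrame({
    'company': ['Accenture', 'Adobe'],
    'ceo': ['Julie Sweet', 'Shantanu Narayen'],
})
execs_df = pd.DataFrame({
    'name': ['Julie Sweet', 'Shantanu Narayen'],   # person name
    'page': ['https://en.wikipedia.org/wiki/Julie_Sweet',
             'https://en.wikipedia.org/wiki/Shantanu_Narayen'],
})
company_data_df = pd.DataFrame({
    'name': ['Accenture', 'Adobe'],                # company name
    'ticker': ['ACN', 'ADBE'],
})

# First join: companies to their executives.
ceo_exec_df = company_ceos_df.merge(execs_df, left_on='ceo', right_on='name')

# Second join: the joined result to company data. Both inputs now have a
# 'name' column, so suffixes label which table each one came from.
final_df = ceo_exec_df.merge(
    company_data_df, left_on='company', right_on='name',
    suffixes=('_exec', '_company'),
)

print(list(final_df.columns))
```

Renaming one of the clashing columns before the merge would work equally well; the suffix approach just keeps both originals visible.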