Pandas Cheat Sheet

The document discusses pandas, a Python library for data analysis and manipulation. It provides a cheat sheet of pandas syntax and methods for working with DataFrames. Key points covered include:
- Creating and manipulating DataFrames
- Reshaping data through operations like melt, pivot, and concatenation
- Filtering and subsetting DataFrames
- Grouping and aggregating data
- Handling missing data
- Visualizing data through plotting methods

Uploaded by

shan halder


Data Wrangling with pandas Cheat Sheet
http://pandas.pydata.org

Syntax – Creating DataFrames

     a  b   c
1    4  7  10
2    5  8  11
3    6  9  12

df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=[1, 2, 3])
Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10], [5, 8, 11], [6, 9, 12]],
    index=[1, 2, 3], columns=['a', 'b', 'c'])
Specify values for each row.

n  v    a  b   c
d  1    4  7  10
   2    5  8  11
e  2    6  9  12

df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
Create DataFrame with a MultiIndex.

Tidy Data – A foundation for wrangling in pandas
In a tidy data set, each variable is saved in its own column and each observation is saved in its own row.
Tidy data complements pandas’s vectorized operations. pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.
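
A minimal sketch of why a tidy layout suits vectorized operations, reusing the first example DataFrame. The `total` column and the threshold of 21 are illustrative assumptions, not part of the original sheet.

```python
import pandas as pd

# Tidy layout: one row per observation, one column per variable.
df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=[1, 2, 3])

# A vectorized operation works on whole columns at once. The result is
# aligned on the index, so each value stays with its observation.
df["total"] = df["a"] + df["b"] + df["c"]

# Filtering rows afterwards keeps the new column matched to its row.
big = df[df["total"] > 21]
print(big)
```

Because the arithmetic is aligned on the index labels 1, 2 and 3, no explicit loop over rows is needed.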

Method Chaining
Most pandas methods return a DataFrame so that another pandas method can be applied to the result. This improves readability of code.
df = (pd.melt(df)
        .rename(columns={
            'variable': 'var',
            'value': 'val'})
        .query('val >= 200')
     )
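
A runnable sketch of the same chain on the small example DataFrame from the syntax section. The threshold is lowered to 8 here (an assumption for illustration) so that the filter keeps some rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=[1, 2, 3])

# Each step returns a new DataFrame, so the calls can be chained.
tidy = (
    pd.melt(df)                                         # columns -> rows
      .rename(columns={"variable": "var", "value": "val"})
      .query("val >= 8")                                # keep rows with val >= 8
)
print(tidy)
```

Wrapping the chain in parentheses lets each method sit on its own line without backslashes.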
Subset Observations (Rows)
df[df.Length > 7]
Extract rows that meet logical criteria.
df.drop_duplicates()
Remove duplicate rows (only considers columns).
df.head(n)
Select first n rows.
df.tail(n)
Select last n rows.
df.sample(frac=0.5)
Randomly select fraction of rows.
df.sample(n=10)
Randomly select n rows.
df.iloc[10:20]
Select rows by position.
df.nlargest(n, 'value')
Select and order top n entries.
df.nsmallest(n, 'value')
Select and order bottom n entries.

Subset Variables (Columns)
df[['width','length','species']]
Select multiple columns with specific names.
df['width'] or df.width
Select single column with specific name.
df.filter(regex='regex')
Select columns whose name matches regular expression regex.
df.loc[:,'x2':'x4']
Select all columns between x2 and x4 (inclusive).
df.iloc[:,[1,2,5]]
Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[df['a'] > 10, ['a','c']]
Select rows meeting logical condition, and only the specific columns.

Logic in Python (and pandas)
<    Less than                 !=                          Not equal to
>    Greater than              df.column.isin(values)      Group membership
==   Equals                    pd.isnull(obj)              Is NaN
<=   Less than or equals       pd.notnull(obj)             Is not NaN
>=   Greater than or equals    &,|,~,^,df.any(),df.all()   Logical and, or, not, xor, any, all

regex (Regular Expressions) Examples
'\.'                Matches strings containing a period '.'
'Length$'           Matches strings ending with word 'Length'
'^Sepal'            Matches strings beginning with the word 'Sepal'
'^x[1-5]$'          Matches strings beginning with 'x' and ending with 1,2,3,4,5
'^(?!Species$).*'   Matches strings except the string 'Species'

http://pandas.pydata.org/
This cheat sheet inspired by Rstudio Data Wrangling Cheatsheet (https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf). Written by Irv Lustig, Princeton Consultants.
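
A short sketch combining several of the row and column selectors with the logic operators and regex patterns listed on this page. The DataFrame and its column names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical measurements, used only for illustration.
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica", "virginica"],
    "Sepal.Length": [5.1, 4.9, 6.3, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.3, 3.0],
    "Petal.Length": [1.4, 1.4, 6.0, 5.9],
})

# Rows: combine boolean masks with & (parenthesize each comparison).
long_sepal = df[(df["Sepal.Length"] > 5) & (df["Species"] != "setosa")]

# Columns: the regular expression is matched against column names.
lengths = df.filter(regex=r"Length$")
not_species = df.filter(regex=r"^(?!Species$).*")

# Rows and columns together with .loc.
subset = df.loc[df["Petal.Length"] > 5, ["Species", "Petal.Length"]]

# Top entries by value, ordered from largest to smallest.
top2 = df.nlargest(2, "Sepal.Length")
```

Python's `and`/`or` do not work element-wise on Series, which is why the masks are combined with `&`, `|` and `~`.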

Reshaping Data – Change the layout of a data set
pd.melt(df)
Gather columns into rows.
df.pivot(columns='var', values='val')
Spread rows into columns.
pd.concat([df1,df2])
Append rows of DataFrames.
pd.concat([df1,df2], axis=1)
Append columns of DataFrames.
df.sort_values('mpg')
Order rows by values of a column (low to high).
df.sort_values('mpg',ascending=False)
Order rows by values of a column (high to low).
df.rename(columns = {'y':'year'})
Rename the columns of a DataFrame.
df.sort_index()
Sort the index of a DataFrame.
df.reset_index()
Reset index of DataFrame to row numbers, moving index to columns.
df.drop(columns=['Length','Height'])
Drop columns from DataFrame.
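
A small sketch of a melt and pivot round trip plus both forms of concatenation. The `wide` DataFrame and the `id` key column are assumptions made for the example; `id_vars` and `index` are passed so that each row keeps its identity.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Gather the x and y columns into (var, val) rows, keeping id as a key.
long = pd.melt(wide, id_vars="id", var_name="var", value_name="val")

# Spread the rows back into columns; index="id" gives one row per id.
back = long.pivot(index="id", columns="var", values="val")

# Append rows, then rebuild a clean 0..n-1 index.
stacked = pd.concat([wide, wide]).reset_index(drop=True)

# Append columns side by side (aligned on the index).
side = pd.concat([wide[["id"]], wide[["x", "y"]]], axis=1)
```

Without `reset_index(drop=True)`, the stacked frame would carry the repeated index labels 0, 1, 0, 1 from its inputs.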
Summarize Data
df['w'].value_counts()
Count number of rows with each unique value of variable.
len(df)
# of rows in DataFrame.
df['w'].nunique()
# of distinct values in a column.
df.describe()
Basic descriptive statistics for each column (or GroupBy).

pandas provides a large set of summary functions that operate on different kinds of pandas objects (DataFrame columns, Series, GroupBy, Expanding and Rolling (see below)) and produce single values for each of the groups. When applied to a DataFrame, the result is returned as a pandas Series for each column. Examples:
sum()
Sum values of each object.
count()
Count non-NA/null values of each object.
median()
Median value of each object.
quantile([0.25,0.75])
Quantiles of each object.
apply(function)
Apply function to each object.
min()
Minimum value in each object.
max()
Maximum value in each object.
mean()
Mean value of each object.
var()
Variance of each object.
std()
Standard deviation of each object.

Handling Missing Data
df.dropna()
Drop rows with any column having NA/null data.
df.fillna(value)
Replace all NA/null data with value.

Make New Columns
df.assign(Area=lambda df: df.Length*df.Height)
Compute and append one or more new columns.
df['Volume'] = df.Length*df.Height*df.Depth
Add single column.
pd.qcut(df.col, n, labels=False)
Bin column into n buckets.

pandas provides a large set of vector functions that operate on all columns of a DataFrame or a single selected column (a pandas Series). These functions produce vectors of values for each of the columns, or a single Series for the individual Series. Examples:
max(axis=1)
Element-wise max.
min(axis=1)
Element-wise min.
clip(lower=-10,upper=10)
Trim values at input thresholds.
abs()
Absolute value.

Group Data
df.groupby(by="col")
Return a GroupBy object, grouped by values in column named "col".
df.groupby(level="ind")
Return a GroupBy object, grouped by values in index level named "ind".

All of the summary functions listed above can be applied to a group. Additional GroupBy functions:
size()
Size of each group.
agg(function)
Aggregate group using function.

The examples below can also be applied to groups. In this case, the function is applied on a per-group basis, and the returned vectors are of the length of the original DataFrame.
shift(1)
Copy with values shifted by 1.
shift(-1)
Copy with values lagged by 1.
rank(method='dense')
Ranks with no gaps.
rank(method='min')
Ranks. Ties get min rank.
rank(pct=True)
Ranks rescaled to interval [0, 1].
rank(method='first')
Ranks. Ties go to first value.
cumsum()
Cumulative sum.
cummax()
Cumulative max.
cummin()
Cumulative min.
cumprod()
Cumulative product.

Windows
df.expanding()
Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)
Return a Rolling object allowing summary functions to be applied to windows of length n.

Plotting
df.plot.hist()
Histogram for each column.
df.plot.scatter(x='w',y='h')
Scatter chart using pairs of points.

Combine Data Sets
adf          bdf
x1  x2       x1  x3
A   1        A   T
B   2        B   F
C   3        D   T

Standard Joins
pd.merge(adf, bdf, how='left', on='x1')
Join matching rows from bdf to adf.
x1  x2   x3
A   1    T
B   2    F
C   3    NaN

pd.merge(adf, bdf, how='right', on='x1')
Join matching rows from adf to bdf.
x1  x2   x3
A   1.0  T
B   2.0  F
D   NaN  T

pd.merge(adf, bdf, how='inner', on='x1')
Join data. Retain only rows in both sets.
x1  x2   x3
A   1    T
B   2    F

pd.merge(adf, bdf, how='outer', on='x1')
Join data. Retain all values, all rows.
x1  x2   x3
A   1    T
B   2    F
C   3    NaN
D   NaN  T

Filtering Joins
adf[adf.x1.isin(bdf.x1)]
All rows in adf that have a match in bdf.
x1  x2
A   1
B   2

adf[~adf.x1.isin(bdf.x1)]
All rows in adf that do not have a match in bdf.
x1  x2
C   3

Set-like Operations
ydf          zdf
x1  x2       x1  x2
A   1        B   2
B   2        C   3
C   3        D   4

pd.merge(ydf, zdf)
Rows that appear in both ydf and zdf (Intersection).
x1  x2
B   2
C   3

pd.merge(ydf, zdf, how='outer')
Rows that appear in either or both ydf and zdf (Union).
x1  x2
A   1
B   2
C   3
D   4

pd.merge(ydf, zdf, how='outer', indicator=True)
    .query('_merge == "left_only"')
    .drop(columns=['_merge'])
Rows that appear in ydf but not zdf (Setdiff).
x1  x2
A   1
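
A runnable sketch of the joins above, building `adf`, `bdf`, `ydf` and `zdf` from the example tables on this page. The result variable names (`left`, `outer`, `semi`, `anti`, `setdiff`) are illustrative only.

```python
import pandas as pd

adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": ["T", "F", "T"]})

# Standard joins on the shared key column x1.
left = pd.merge(adf, bdf, how="left", on="x1")
outer = pd.merge(adf, bdf, how="outer", on="x1")

# Filtering joins: keep or drop rows of adf by key membership in bdf.
semi = adf[adf.x1.isin(bdf.x1)]
anti = adf[~adf.x1.isin(bdf.x1)]

ydf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
zdf = pd.DataFrame({"x1": ["B", "C", "D"], "x2": [2, 3, 4]})

# Set difference: the indicator column records where each row came from.
setdiff = (
    pd.merge(ydf, zdf, how="outer", indicator=True)
      .query('_merge == "left_only"')
      .drop(columns=["_merge"])
)
```

With no `on` argument, `pd.merge(ydf, zdf, ...)` joins on all columns the two frames share, which is what makes it behave like a set operation on whole rows.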

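A sketch tying together the missing-data, group and window entries from this page. The DataFrame, the fill value of 0 and the window length of 2 are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "col": ["a", "a", "b", "b", "b"],
    "val": [1.0, 3.0, 2.0, None, 6.0],
})

# Missing data: replace NA in val with 0 (an arbitrary choice here).
filled = df.fillna({"val": 0})

grouped = filled.groupby(by="col")

# Summary functions return one value per group.
sums = grouped["val"].sum()
sizes = grouped.size()

# Vector functions return one value per original row.
running = grouped["val"].cumsum()
ranks = grouped["val"].rank(method="dense")

# Window objects apply summary functions over a sliding or growing window.
roll = filled["val"].rolling(2).mean()
expand = filled["val"].expanding().max()
```

The first entry of `roll` is NaN because a window of length 2 is not yet full at the first row.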