Pandas Cheat Sheet
Each variable is saved in its own column
df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
Create DataFrame with a MultiIndex.
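A minimal sketch of how rows of the MultiIndex DataFrame above can be addressed. The selections shown (.loc on the outer level, .xs on the inner level, reset_index) are illustrative choices, not part of the original sheet.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [("d", 1), ("d", 2), ("e", 2)],
        names=["n", "v"]))

# All rows whose outer level 'n' equals 'd'; the result is indexed by 'v' only.
rows_d = df.loc["d"]

# A single row addressed by the full (n, v) key; the result is a Series.
row_e2 = df.loc[("e", 2)]

# Cross-section on the inner level: every row with v == 2.
v2 = df.xs(2, level="v")

# Turn both index levels back into ordinary columns.
flat = df.reset_index()
```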
Method Chaining
Most pandas methods return a DataFrame so that another pandas method can be
applied to the result. This improves readability of code.
df = (pd.melt(df)
        .rename(columns={'variable': 'var', 'value': 'val'})
        .query('val >= 200'))
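The chain above can be tried on a small table. This is a sketch with hypothetical data: the frame name wide, its columns, and its numbers are made up for illustration, and the final reset_index is an addition that only tidies the row labels.

```python
import pandas as pd

# Hypothetical wide table; names and numbers are illustrative only.
wide = pd.DataFrame({"x": [100, 250], "y": [300, 50]})

long = (
    pd.melt(wide)                                          # columns: variable, value
      .rename(columns={"variable": "var", "value": "val"})
      .query("val >= 200")                                 # keep rows with val >= 200
      .reset_index(drop=True)                              # renumber the remaining rows
)
```

Each step receives the DataFrame returned by the previous one, so no intermediate variables are needed.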
df[df.Length > 7]
Extract rows that meet logical criteria.
df.drop_duplicates()
Remove duplicate rows (only considers columns).
df.sample(frac=0.5)
Randomly select fraction of rows.
df.sample(n=10)
Randomly select n rows.
df.iloc[10:20]
Select rows by position.
df.head(n)
Select first n rows.
df.tail(n)
Select last n rows.
df.nlargest(n, 'value')
Select and order top n entries.
df.nsmallest(n, 'value')
Select and order bottom n entries.

df[['width','length','species']]
Select multiple columns with specific names.
df['width'] or df.width
Select single column with specific name.
df.filter(regex='regex')
Select columns whose name matches regular expression regex.
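A short sketch of the row and column selections listed above, run on hypothetical data. The column values are made up, and random_state is added to sample only so that the result is repeatable.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "width": [1, 2, 2, 8, 9],
    "length": [3, 7, 7, 10, 12],
    "species": ["a", "b", "b", "c", "c"],
})

long_rows = df[df["length"] > 7]             # rows meeting a logical condition
unique_rows = df.drop_duplicates()           # drops the repeated (2, 7, 'b') row
top2 = df.nlargest(2, "length")              # two largest lengths, largest first
bottom2 = df.nsmallest(2, "width")           # two smallest widths, smallest first
middle = df.iloc[1:3]                        # rows at positions 1 and 2
some = df.sample(frac=0.4, random_state=0)   # 40% of the rows, chosen at random
two_cols = df[["width", "species"]]          # several columns by name
one_col = df["width"]                        # a single column as a Series
```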
df.loc[:,'x2':'x4']
Select all columns between x2 and x4 (inclusive).
df.iloc[:,[1,2,5]]
Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[df['a'] > 10, ['a','c']]
Select rows meeting logical condition, and only the specific columns.

Logic in Python (and pandas)
<     Less than                 !=                              Not equal to
>     Greater than              df.column.isin(values)          Group membership
==    Equals                    pd.isnull(obj)                  Is NaN
<=    Less than or equals       pd.notnull(obj)                 Is not NaN
>=    Greater than or equals    &,|,~,^,df.any(),df.all()       Logical and, or, not, xor, any, all

regex (Regular Expressions) Examples
'\.'                 Matches strings containing a period '.'
'Length$'            Matches strings ending with word 'Length'
'^Sepal'             Matches strings beginning with the word 'Sepal'
'^x[1-5]$'           Matches strings beginning with 'x' and ending with 1,2,3,4,5
'^(?!Species$).*'    Matches strings except the string 'Species'

http://pandas.pydata.org/
This cheat sheet was inspired by the RStudio Data Wrangling Cheatsheet
(https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf).
Written by Irv Lustig, Princeton Consultants.
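The comparison operators, isin, and the regex patterns listed earlier can be combined as in the sketch below. The data is hypothetical; the column names were chosen only so that the patterns 'Length$', '^Sepal' and '^(?!Species$).*' have something to match.

```python
import pandas as pd

# Hypothetical data; column names follow the regex examples.
df = pd.DataFrame({
    "Sepal.Length": [5.1, 7.0, 6.3],
    "Sepal.Width": [3.5, 3.2, 3.3],
    "Petal.Length": [1.4, 4.7, 6.0],
    "Species": ["setosa", "versicolor", "virginica"],
})

# & (and), | (or), ~ (not) combine boolean Series; parenthesize each comparison.
mask = (df["Sepal.Length"] > 6) & ~df["Species"].isin(["virginica"])
picked = df[mask]

# Rows by condition and columns by label in a single .loc call.
subset = df.loc[df["Petal.Length"] >= 4.7, ["Species", "Petal.Length"]]

# Column selection by regular expression.
ends_with_length = df.filter(regex=r"Length$")
starts_with_sepal = df.filter(regex=r"^Sepal")
not_species = df.filter(regex=r"^(?!Species$).*")
```

Python's `and`, `or` and `not` do not work element-wise on a Series, which is why the sheet lists `&`, `|` and `~` instead.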
Plotting
df.plot.hist()
Histogram for each column
df.assign(Area=lambda df: df.Length*df.Height)
Compute and append one or more new columns.
df['Volume'] = df.Length*df.Height*df.Depth
Add single column.
pd.qcut(df.col, n, labels=False)
Bin column into n buckets.

pandas provides a large set of vector functions that operate on all
columns of a DataFrame or a single selected column (a pandas
Series). These functions produce vectors of values for each of the columns, or a single
Series for the individual Series. Examples include abs(), clip(), and the
element-wise min(axis=1) and max(axis=1).

min()
Minimum value in each object.
max()
Maximum value in each object.
mean()
Mean value of each object.
var()
Variance of each object.
std()
Standard deviation of each object.

adf          bdf
x1  x2       x1  x3
A   1        A   T
B   2        B   F
C   3        D   T

Standard Joins
pd.merge(adf, bdf, how='left', on='x1')
Join matching rows from bdf to adf.
x1  x2   x3
A   1    T
B   2    F
C   3    NaN

pd.merge(adf, bdf, how='right', on='x1')
Join matching rows from adf to bdf.
x1  x2   x3
A   1.0  T
B   2.0  F
D   NaN  T

pd.merge(adf, bdf, how='inner', on='x1')
Join data. Retain only rows in both sets.
x1  x2   x3
A   1    T
B   2    F

pd.merge(adf, bdf, how='outer', on='x1')
Join data. Retain all values, all rows.
x1  x2   x3
A   1    T
B   2    F
C   3    NaN
D   NaN  T

Filtering Joins
x1  x2
A   1
B   2

x1  x2
C   3
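The joins can be reproduced with the small adf and bdf tables shown above. This is a sketch: storing the T/F values of x3 as strings is an assumption, and the isin-based masks are one way to obtain the two Filtering Joins result tables, not necessarily the code the sheet originally paired with them.

```python
import pandas as pd

adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
# Assumption: x3 is stored as the strings "T" and "F".
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": ["T", "F", "T"]})

left = pd.merge(adf, bdf, how="left", on="x1")     # keys A, B, C; x3 missing for C
right = pd.merge(adf, bdf, how="right", on="x1")   # keys A, B, D; x2 missing for D
inner = pd.merge(adf, bdf, how="inner", on="x1")   # keys A, B
outer = pd.merge(adf, bdf, how="outer", on="x1")   # keys A, B, C, D

# Filtering joins return rows of adf only; no columns from bdf are added.
semi = adf[adf.x1.isin(bdf.x1)]     # rows of adf with a match in bdf
anti = adf[~adf.x1.isin(bdf.x1)]    # rows of adf without a match in bdf
```

Missing matches are filled with NaN, which is why x2 shows as 1.0 and 2.0 in the right join: an integer column that gains a NaN is converted to float.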
max(axis=1)
Element-wise max.
min(axis=1)
Element-wise min.
clip(lower=-10,upper=10)
Trim values at input thresholds.
abs()
Absolute value.

The examples below can also be applied to groups. In this case, the function is applied
on a per-group basis, and the returned vectors are of the length of the original
DataFrame.
shift(1)
Copy with values shifted by 1.
rank(method='dense')
Ranks with no gaps.
rank(method='min')
Ranks. Ties get min rank.
rank(pct=True)
Ranks rescaled to interval [0, 1].
rank(method='first')
Ranks. Ties go to first value.

Windows
df.expanding()
Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)
Return a Rolling object allowing summary functions to be applied to windows of length n.
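A sketch of the element-wise, per-group, and window functions on hypothetical data. The grouping column g and all values are made up for illustration; grouping with groupby is assumed as the way to apply shift and rank per group.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "g": ["a", "a", "b", "b"],
    "x": [1, 3, -20, 5],
    "y": [2, 2, 4, 30],
})

nums = df[["x", "y"]]
row_max = nums.max(axis=1)                   # largest value in each row
clipped = nums.clip(lower=-10, upper=10)     # values limited to [-10, 10]

# Per group: each result has the same length as the original DataFrame.
prev_x = df.groupby("g")["x"].shift(1)       # previous x within each group
dense = df.groupby("g")["y"].rank(method="dense")

# Windows: cumulative and fixed-length summaries.
running_sum = df["x"].expanding().sum()      # sum of all values so far
pair_mean = df["x"].rolling(2).mean()        # mean of each 2-row window; first is NaN
```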