
Data Wrangling with pandas Cheat Sheet
http://pandas.pydata.org
Pandas API Reference | Pandas User Guide

Tidy Data – A foundation for wrangling in pandas

In a tidy data set:
* Each variable is saved in its own column
* Each observation is saved in its own row

Tidy data complements pandas's vectorized operations. pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.
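The tidy layout can be seen on a small example. The table below is made up for illustration (the `name`/`treatment_*` columns are not from the cheat sheet):

```python
import pandas as pd

# A "messy" wide table: one column per treatment (hypothetical data).
messy = pd.DataFrame({
    "name": ["John", "Mary"],
    "treatment_a": [16, 3],
    "treatment_b": [2, 11],
})

# Melt into tidy form: each variable in its own column,
# each observation (person x treatment) in its own row.
tidy = messy.melt(id_vars="name", var_name="treatment", value_name="result")
print(tidy)
```

Each row of `tidy` is now one observation, which is the shape the rest of this sheet assumes.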
Creating DataFrames

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])
Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])
Specify values for each row.

Both produce the same DataFrame:

   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12

Reshaping Data – Change layout, sorting, reindexing, renaming

df.sort_values('mpg')
    Order rows by values of a column (low to high).
df.sort_values('mpg', ascending=False)
    Order rows by values of a column (high to low).
df.rename(columns={'y': 'year'})
    Rename the columns of a DataFrame.
df.sort_index()
    Sort the index of a DataFrame.
df.reset_index()
    Reset index of DataFrame to row numbers, moving index to columns.
df.drop(columns=['Length', 'Height'])
    Drop columns from DataFrame.
pd.melt(df)
    Gather columns into rows.
df.pivot(columns='var', values='val')
    Spread rows into columns.
pd.concat([df1, df2])
    Append rows of DataFrames.
pd.concat([df1, df2], axis=1)
    Append columns of DataFrames.
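A quick check that the constructor and reshaping entries above compose, using the same toy values as the sheet:

```python
import pandas as pd

# The dict constructor from the section above.
df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])

# Sort high-to-low by column "a", then move the index into a column.
out = df.sort_values("a", ascending=False).reset_index()
print(out)
```

`reset_index()` turns the old labels `[3, 2, 1]` into an ordinary `index` column.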
df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
Create DataFrame with a MultiIndex:

     a  b   c
n v
d 1  4  7  10
  2  5  8  11
e 2  6  9  12

Subset Observations – rows

df[df.Length > 7]
    Extract rows that meet logical criteria.
df.drop_duplicates()
    Remove duplicate rows (only considers columns).
df.sample(frac=0.5)
    Randomly select fraction of rows.
df.sample(n=10)
    Randomly select n rows.
df.nlargest(n, 'value')
    Select and order top n entries.
df.nsmallest(n, 'value')
    Select and order bottom n entries.
df.head(n)
    Select first n rows.
df.tail(n)
    Select last n rows.

Using query

query() allows Boolean expressions for filtering rows.
df.query('Length > 7')
df.query('Length > 7 and Width < 8')
df.query('Name.str.startswith("abc")', engine="python")

Subset Variables – columns

df[['width', 'length', 'species']]
    Select multiple columns with specific names.
df['width'] or df.width
    Select single column with specific name.
df.filter(regex='regex')
    Select columns whose name matches regular expression regex.

Subsets – rows and columns

Use df.loc[] and df.iloc[] to select only rows, only columns or both. Use df.at[] and df.iat[] to access a single value by row and column. First index selects rows, second index columns.

df.iloc[10:20]
    Select rows in positions 10-19 (the end of the slice is exclusive).
df.iloc[:, [1, 2, 5]]
    Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[:, 'x2':'x4']
    Select all columns between x2 and x4 (inclusive).
df.loc[df['a'] > 10, ['a', 'c']]
    Select rows meeting logical condition, and only the specific columns.
df.iat[1, 2]
    Access single value by index.
df.at[4, 'A']
    Access single value by label.
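The Boolean-mask, query(), and loc selections above all express the same filter. A small sketch with made-up Length/Width data:

```python
import pandas as pd

# Toy data for illustration (not from the cheat sheet).
df = pd.DataFrame({"Length": [5, 8, 9], "Width": [2, 4, 7]})

# Boolean mask and the equivalent query() expression select the same rows.
by_mask = df[df.Length > 7]
by_query = df.query("Length > 7")
assert by_mask.equals(by_query)

# loc combines a row condition with named columns; iloc is purely positional.
sub = df.loc[df["Length"] > 7, ["Length"]]
first_width = df.iloc[0, 1]  # row position 0, column position 1
print(sub, first_width)
```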
Method Chaining

Most pandas methods return a DataFrame, so another pandas method can be applied to the result. This improves readability of code.

df = (pd.melt(df)
        .rename(columns={
            'variable': 'var',
            'value': 'val'})
        .query('val >= 200'))

Logic in Python (and pandas)

<     Less than
>     Greater than
==    Equals
<=    Less than or equals
>=    Greater than or equals
!=    Not equal to
df.column.isin(values)    Group membership
pd.isnull(obj)            Is NaN
pd.notnull(obj)           Is not NaN
&, |, ~, ^, df.any(), df.all()    Logical and, or, not, xor, any, all

regex (Regular Expressions) Examples

'\.'               Matches strings containing a period '.'
'Length$'          Matches strings ending with word 'Length'
'^Sepal'           Matches strings beginning with the word 'Sepal'
'^x[1-5]$'         Matches strings beginning with 'x' and ending with 1, 2, 3, 4 or 5
'^(?!Species$).*'  Matches strings except the string 'Species'
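The melt-rename-query chain can be run end to end on toy data (the values below are invented for the sketch):

```python
import pandas as pd

df = pd.DataFrame({"x1": [100, 250], "x2": [300, 50]})

# Each step returns a DataFrame, so the next method applies to the result.
out = (pd.melt(df)
         .rename(columns={"variable": "var", "value": "val"})
         .query("val >= 200"))
print(out)
```

Wrapping the chain in parentheses lets each method sit on its own line without backslashes.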
Cheatsheet for pandas (http://pandas.pydata.org/), originally written by Irv Lustig, Princeton Consultants, inspired by the RStudio Data Wrangling Cheatsheet.
Summarize Data

df['w'].value_counts()
    Count number of rows with each unique value of variable.
len(df)
    # of rows in DataFrame.
df.shape
    Tuple of # of rows, # of columns in DataFrame.
df['w'].nunique()
    # of distinct values in a column.
df.describe()
    Basic descriptive statistics for each column (or GroupBy).

pandas provides a large set of summary functions that operate on different kinds of pandas objects (DataFrame columns, Series, GroupBy, Expanding and Rolling (see below)) and produce single values for each of the groups. When applied to a DataFrame, the result is returned as a pandas Series for each column. Examples:

sum()                     Sum values of each object.
count()                   Count non-NA/null values of each object.
median()                  Median value of each object.
quantile([0.25, 0.75])    Quantiles of each object.
apply(function)           Apply function to each object.
min()                     Minimum value in each object.
max()                     Maximum value in each object.
mean()                    Mean value of each object.
var()                     Variance of each object.
std()                     Standard deviation of each object.

Handling Missing Data

df.dropna()
    Drop rows with any column having NA/null data.
df.fillna(value)
    Replace all NA/null data with value.

Make New Columns

df.assign(Area=lambda df: df.Length * df.Height)
    Compute and append one or more new columns.
df['Volume'] = df.Length * df.Height * df.Depth
    Add single column.
pd.qcut(df.col, n, labels=False)
    Bin column into n buckets.

pandas provides a large set of vector functions that operate on all columns of a DataFrame or a single selected column (a pandas Series). These functions produce vectors of values for each of the columns, or a single Series for the individual Series. Examples:

max(axis=1)                  Element-wise max.
min(axis=1)                  Element-wise min.
clip(lower=-10, upper=10)    Trim values at input thresholds.
abs()                        Absolute value.
shift(1)                     Copy with values shifted by 1.
shift(-1)                    Copy with values lagged by 1.
rank(method='dense')         Ranks with no gaps.
rank(method='min')           Ranks. Ties get min rank.
rank(pct=True)               Ranks rescaled to interval [0, 1].
rank(method='first')         Ranks. Ties go to first value.
cumsum()                     Cumulative sum.
cummax()                     Cumulative max.
cummin()                     Cumulative min.
cumprod()                    Cumulative product.

Group Data

df.groupby(by="col")
    Return a GroupBy object, grouped by values in column named "col".
df.groupby(level="ind")
    Return a GroupBy object, grouped by values in index level named "ind".

All of the summary functions listed above can be applied to a group. Additional GroupBy functions:

size()            Size of each group.
agg(function)     Aggregate group using function.

The vector functions listed above can also be applied to groups. In this case, the function is applied on a per-group basis, and the returned vectors are of the length of the original DataFrame.

Windows

df.expanding()
    Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)
    Return a Rolling object allowing summary functions to be applied to windows of length n.

Plotting

df.plot.hist()
    Histogram for each column.
df.plot.scatter(x='w', y='h')
    Scatter chart using pairs of points.

Combine Data Sets

adf             bdf
x1  x2          x1  x3
A   1           A   T
B   2           B   F
C   3           D   T

Standard Joins

pd.merge(adf, bdf, how='left', on='x1')
    Join matching rows from bdf to adf.
    x1  x2  x3
    A   1   T
    B   2   F
    C   3   NaN

pd.merge(adf, bdf, how='right', on='x1')
    Join matching rows from adf to bdf.
    x1  x2   x3
    A   1.0  T
    B   2.0  F
    D   NaN  T

pd.merge(adf, bdf, how='inner', on='x1')
    Join data. Retain only rows in both sets.
    x1  x2  x3
    A   1   T
    B   2   F

pd.merge(adf, bdf, how='outer', on='x1')
    Join data. Retain all values, all rows.
    x1  x2   x3
    A   1    T
    B   2    F
    C   3    NaN
    D   NaN  T

Filtering Joins

adf[adf.x1.isin(bdf.x1)]
    All rows in adf that have a match in bdf.
    x1  x2
    A   1
    B   2

adf[~adf.x1.isin(bdf.x1)]
    All rows in adf that do not have a match in bdf.
    x1  x2
    C   3

Set-like Operations

ydf             zdf
x1  x2          x1  x2
A   1           B   2
B   2           C   3
C   3           D   4

pd.merge(ydf, zdf)
    Rows that appear in both ydf and zdf (Intersection).
    x1  x2
    B   2
    C   3

pd.merge(ydf, zdf, how='outer')
    Rows that appear in either or both ydf and zdf (Union).
    x1  x2
    A   1
    B   2
    C   3
    D   4

(pd.merge(ydf, zdf, how='outer', indicator=True)
   .query('_merge == "left_only"')
   .drop(columns=['_merge']))
    Rows that appear in ydf but not zdf (Setdiff).
    x1  x2
    A   1
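The join entries above can be checked directly with the sheet's own adf/bdf tables:

```python
import pandas as pd

# The adf/bdf example tables from the Combine Data Sets section.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": ["T", "F", "T"]})

left = pd.merge(adf, bdf, how="left", on="x1")    # keep all adf rows
inner = pd.merge(adf, bdf, how="inner", on="x1")  # only rows in both
outer = pd.merge(adf, bdf, how="outer", on="x1")  # all rows from both

# Filtering (anti) join: rows of adf with no match in bdf.
anti = adf[~adf.x1.isin(bdf.x1)]
print(anti)
```

The left join keeps C with NaN in x3, the inner join drops it, and the outer join additionally pulls in D from bdf.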