Programming for Data Analysis
Week 3
Dr. Ferdin Joe John Joseph
Faculty of Information Technology
Thai – Nichi Institute of Technology, Bangkok
Today’s lesson
• Pivoting
• Binning
• Replacing and Renaming
• Laboratory
Pivoting in pandas
pandas.DataFrame.pivot_table
Syntax:
DataFrame.pivot_table(values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False, dropna=True,
margins_name='All')
Parameters
Output
A pivoted table, returned as a DataFrame
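As a minimal sketch of the call above, assuming a small hypothetical sales table (the column names `region`, `product` and `sales` are illustrative and not from the course data), `index` supplies the rows, `columns` supplies the column headers, and `aggfunc` is applied to `values` in each cell:

```python
import pandas as pd

# Hypothetical records; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["North", "South", "South", "South"],
    "product": ["A", "A", "A", "B"],
    "sales": [10, 30, 50, 40],
})

# Rows come from `index`, columns from `columns`; each cell is the mean of `sales`.
# fill_value=0 replaces combinations that never occur (here North/B).
table = df.pivot_table(values="sales", index="region", columns="product",
                       aggfunc="mean", fill_value=0)
print(table)
```

The two South/A rows are averaged into a single cell, which is the point of passing an aggregation function.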
Binning
• When dealing with continuous numeric data, it is often helpful to bin
the data into multiple buckets for further analysis.
• There are several different terms for binning including bucketing,
discrete binning, discretization or quantization.
• Pandas supports these approaches using the cut and qcut functions.
• A histogram is the usual way to visualize binned data
Binning
• pandas to process the data
• NumPy for array calculations
• seaborn to visualize the histogram
Qcut
• qcut divides the data into quantile-based bins, for example four
quartiles with q=4
• When you ask for quintiles (q=5) with qcut, the bin edges are chosen so
that each bin holds the same number of records. With 30 records, each
bin should hold 6. The breakpoints depend on the data, so they differ
from one random draw to the next.
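The quintile case can be sketched as follows, assuming 30 values drawn from a standard normal distribution with a fixed seed (both the distribution and the seed are assumptions made here for repeatability):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)             # fixed seed, chosen only for repeatability
values = pd.Series(rng.normal(size=30))    # 30 records from a random draw

quintiles = pd.qcut(values, q=5)           # five quantile-based bins
counts = quintiles.value_counts().sort_index()
print(counts)                              # every bin holds 6 of the 30 records
```

Changing the seed moves the bin edges but not the counts, because qcut places the edges at the sample quantiles.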
Cut
• cut chooses bins that are evenly spaced over the range of the values
themselves, regardless of how many values fall into each bin.
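A minimal sketch of this behaviour, assuming a small hypothetical and deliberately skewed series: the four bins have equal width, so most values pile up in the first one and one bin stays empty.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 50, 100])   # hypothetical, skewed values

# bins=4 splits the range from 1 to 100 into four intervals of equal width.
equal_width = pd.cut(values, bins=4)
counts = equal_width.value_counts().sort_index()
print(counts)   # six of the eight values land in the first interval
```

Running qcut on the same series would instead move the edges so that the bins hold equal counts.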
Binning – Read Data
Binning quantized in variable
Naming Bins
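One way to name bins in pandas is the `labels` argument of `cut` (and `qcut`). The sketch below assumes hypothetical humidity readings and arbitrary cut points of 50 and 70 percent; neither comes from the course data.

```python
import pandas as pd

humidity = pd.Series([35, 48, 52, 61, 77, 90])   # hypothetical readings, in percent

# `labels` needs exactly one name per bin; three bins here, so three names.
named = pd.cut(humidity, bins=[0, 50, 70, 100],
               labels=["low", "medium", "high"])
print(named.tolist())
```

The result is a categorical series, so the names keep their low-to-high order when sorted or plotted.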
Binning – Other applications
• Image histograms
Statistical Data Binning
• Statistical data binning is a way to group more or less continuous
numeric values into a smaller number of "bins".
• For example, if you have data about a group of people, you might
want to arrange their ages into a smaller number of age intervals (for
example, grouping every five years together).
Methods to divide Bins
• Equal frequency binning
• Equal width binning
Equal frequency binning
• Each bin holds the same number of observations, as nearly as the data allow
Equal Width Binning
• Each bin spans an interval of the same width
Code
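A minimal sketch of the two methods, assuming a small hypothetical array of ages and k = 3 bins. The edges are computed directly with NumPy here, which is one possible approach and not necessarily the one used in the course.

```python
import numpy as np

ages = np.array([18, 19, 21, 22, 25, 31, 40, 58, 63])   # hypothetical ages
k = 3                                                    # number of bins

# Equal width: k intervals of identical length between min and max.
width_edges = np.linspace(ages.min(), ages.max(), k + 1)

# Equal frequency: edges at the 0, 1/k, 2/k, ..., 1 quantiles of the data.
freq_edges = np.quantile(ages, np.linspace(0, 1, k + 1))

# Count how many ages fall between each pair of edges.
width_counts, _ = np.histogram(ages, bins=width_edges)
freq_counts, _ = np.histogram(ages, bins=freq_edges)

print(width_edges, width_counts)   # equal spacing, unequal counts
print(freq_edges, freq_counts)     # unequal spacing, equal counts
```

In pandas, `cut` and `qcut` apply these two ideas to a Series and return the bin of each value.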
Advantages
• Binning makes it easy to identify outliers, as well as invalid and
missing values, in numerical variables.
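As an illustration of this point, the sketch below assumes hypothetical humidity readings that should lie between 0 and 100 percent. With explicit bin edges, `pd.cut` returns NaN for any reading that is missing or falls outside every bin, so those rows are easy to pull out.

```python
import numpy as np
import pandas as pd

# Hypothetical percent readings, including one missing and two impossible values.
humidity = pd.Series([45.0, 52.0, np.nan, 61.0, -5.0, 140.0])

# Valid humidity lies in [0, 100]; anything else falls outside every bin.
binned = pd.cut(humidity, bins=[0, 25, 50, 75, 100], include_lowest=True)

# Missing and out-of-range readings both come back as NaN after binning.
suspect = humidity[binned.isna()]
print(suspect)
```

The flagged rows still need a manual look to decide whether each one is a sensor fault, a data-entry error, or a genuine extreme value.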
DSA 207 - Binning
• Create a pivot table to find the month-wise average of internal and
external temperature, humidity and carbon monoxide levels in the fish data
• Visualize the binning of humidity levels in the fish data at a particular
time of day in a month. Do it with each of the following:
• 1. qcut
• 2. cut
• 3. Naming bins
