Unit-1-DEV
Steps in EDA
Data Types
❖ Numerical Data
❖ Categorical Data
Measurement Scales
❖ Nominal
❖ Ordinal
❖ Interval
❖ Ratio
Exploratory vs Confirmatory Data Analysis
• EDA: No hypothesis at first
• CDA: Start with hypothesis
STEPS OF EDA
Classification of EDA
EXAMPLE 1 – Professional Sports
8. In-Game Analytics:
   1. Real-time EDA during games can provide coaches and analysts with immediate insights.
   2. Data on player performance, shot selection, and opponent behavior can be analyzed to make in-game adjustments.
9. Performance Tracking Wearables:
   1. Many athletes wear devices that track their movements and physiological data. EDA can be used to interpret this data and make real-time performance assessments.
10. Revenue Optimization:
   1. EDA can be used to analyze revenue sources, such as ticket sales, merchandise, and broadcasting deals, to optimize revenue streams.
11. Player Development:
   1. Coaches and trainers can use EDA to track player development over time and make adjustments to training and practice routines.
12. Fantasy Sports and Betting:
   1. EDA is also used by analysts, enthusiasts, and sports gamblers to gain insights for fantasy sports and betting purposes.
EXAMPLE 2 - Healthcare
Any guesses?
EXAMPLE 3 - Marketing
1. Customer Segmentation:
   1. EDA can identify customer segments based on demographics, behavior, and purchase history.
   2. It helps in tailoring marketing strategies to specific customer groups.
2. Product Analysis:
   1. EDA can help analyze product performance, identifying top-selling products, underperforming items, and opportunities for product development.
3. Pricing Strategies:
   1. Analyzing price elasticity and consumer demand through EDA can help optimize pricing strategies.
4. Market Basket Analysis:
   1. EDA can uncover patterns of products that are frequently purchased together, aiding in cross-selling and recommendation systems.
5. Customer Churn Analysis:
   1. EDA can identify factors contributing to customer churn and assist in designing retention strategies.
6. Campaign Effectiveness:
   1. Analyzing marketing campaign data helps determine the effectiveness of various channels, messages, and timing.
   2. EDA can reveal which campaigns generate the highest ROI.
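The market basket analysis item above can be made concrete with a small sketch. It counts how often each pair of products appears in the same basket and reports the pair's support (the share of baskets containing both). The transactions and the function names `pair_counts` and `support` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def support(pair, baskets):
    """Fraction of baskets that contain both products in the pair."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)

counts = pair_counts(transactions)
for pair, n in counts.most_common():
    print(pair, n, support(pair, transactions))
```

Pairs with high support are candidates for cross-selling. A fuller analysis would typically also look at confidence and lift, which this sketch leaves out.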
EXAMPLE 4*
https://www.biostat.wisc.edu/~lindstro/2.EDA.9.10.pdf
Making Sense of Data
Quantitative/Numerical Data
Discrete Data
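• Discrete data is information that can only take certain fixed values, such as a count of items.
• Continuous data can take any value within a range and can be further broken down into two categories: interval data and ratio data.
• Interval data is numerical data measured on a scale with no true zero: zero is an arbitrary reference point, not the absence of the quantity.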
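As a rough illustration of the discrete/continuous distinction, the sketch below applies a simple heuristic: a numeric column whose values are all whole numbers is treated as discrete, anything else as continuous. The function name `looks_discrete` and the sample columns are hypothetical, and the heuristic is an assumption, not a formal test.

```python
def looks_discrete(values):
    """Heuristic: treat a numeric column as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)

# Hypothetical sample columns.
siblings = [0, 2, 1, 3, 1]                  # counts: only whole numbers are possible
heights_cm = [162.5, 170.2, 158.0, 181.3]   # measurements: any value within a range

for name, column in [("siblings", siblings), ("heights_cm", heights_cm)]:
    kind = "discrete" if looks_discrete(column) else "continuous"
    print(f"{name}: {kind}")
```

The heuristic can mislead: heights recorded to the nearest centimetre would look discrete even though height itself is continuous, so the decision should rest on what is being measured, not only on the stored values.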
Two-way Table
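A two-way table cross-tabulates two categorical variables, with one variable's categories as rows and the other's as columns. The sketch below builds one with the standard library; the survey responses and the helper name `two_way_table` are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: (grade level, has a pet).
responses = [
    ("freshman", "yes"), ("freshman", "no"), ("sophomore", "yes"),
    ("sophomore", "yes"), ("freshman", "yes"), ("sophomore", "no"),
]

def two_way_table(pairs):
    """Return nested counts table[row][col] plus the sorted row and column labels."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols

table, rows, cols = two_way_table(responses)

# Print the table with a row total on the right.
print(f"{'':<10}" + "".join(f"{c:>6}" for c in cols) + f"{'total':>7}")
for r in rows:
    cells = "".join(f"{table[r][c]:>6}" for c in cols)
    print(f"{r:<10}{cells}{sum(table[r].values()):>7}")
```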
Nominal Data & Ordinal Data
Problem-1
Question
1. How old are you?
2. Where do you live? Give the name of your city
3. How many siblings do you have?
4. What is your height?
5. What is your birth date?
6. Do you have a pet?
7. What grade level are you in?
Problem-1 : Solution
Problem-2
Exercise-1
Exercise-2
Measurement Scales
1. Nominal Scale:
   1. Represents categorical data without any inherent order or ranking.
   2. Examples: Gender, colors, categories.
   3. Differences: Different categories are distinct but not ordered. No quantitative relationship exists between categories.
2. Ordinal Scale:
   1. Represents data with ordered categories or ranks but without precise differences between them.
   2. Examples: Rankings (1st, 2nd, 3rd), Likert scales (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree).
   3. Differences: Ordered categories, but the exact difference between ranks may not be defined.
3. Interval Scale:
   1. Represents ordered values with precise, equal intervals between them.
   2. Examples: Temperature in Celsius or Fahrenheit, calendar dates.
   3. Differences: Equal intervals between points on the scale, but there is no true zero point (zero does not indicate the absence of the attribute being measured).
4. Ratio Scale:
   1. Represents ordered values with equal intervals and a true zero point.
   2. Examples: Height, weight, duration, income.
   3. Differences: Possesses all the properties of the interval scale but also has a true zero, so ratios are meaningful (10 kg is twice 5 kg) and multiplication and division apply.
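The four scales differ in which operations give meaningful answers. The sketch below illustrates this with hypothetical data: a mode and a median rank for Likert answers (ordinal), a difference but not a ratio for Celsius temperatures (interval), and a ratio for weights (ratio scale). All variable names are chosen for illustration only.

```python
from statistics import mode

# Ordinal: Likert responses have an order but no defined distance between ranks.
LIKERT = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
rank = {label: i for i, label in enumerate(LIKERT)}

answers = ["Agree", "Neutral", "Strongly Agree", "Agree", "Disagree"]
most_common = mode(answers)                      # the mode is valid even for nominal data
ranks = sorted(rank[a] for a in answers)
median_answer = LIKERT[ranks[len(ranks) // 2]]   # middle rank (odd number of answers)

# Interval: differences are meaningful, ratios are not.
c_low, c_high = 10.0, 20.0
diff_c = c_high - c_low                          # a 10-degree difference is meaningful
naive_ratio = c_high / c_low                     # 2.0, but 20 °C is not "twice as hot"
kelvin_ratio = (c_high + 273.15) / (c_low + 273.15)  # ratio on a scale with a true zero

# Ratio: a true zero makes ratios meaningful.
weight_ratio = 10.0 / 5.0                        # 10 kg really is twice 5 kg

print(most_common, median_answer)
print(diff_c, naive_ratio, round(kelvin_ratio, 4))
print(weight_ratio)
```

Converting to Kelvin shows why the Celsius ratio misleads: on a scale with a true zero the two temperatures differ by only a few percent.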
Example: Identify the measurement scale?
Structured data - This is data that is in an organized form (e.g., rows and columns) and can be easily used by a computer program. Relationships exist between entities of data, such as classes and their objects. Data stored in databases is an example of structured data.
Semi-structured data - This is data that does not conform to a data model but has some structure. It is still not in a form that a computer program can use easily. Examples include emails, XML, and markup languages such as HTML.
Unstructured data - This is data that does not conform to a data model or is not in a form that a computer program can use easily. About 80%-90% of an organization's data is in this format. Examples include memos, chat rooms, PowerPoint presentations, images, videos, and letters.
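To make the three categories concrete, the sketch below handles one small hypothetical sample of each using only the standard library: a CSV table (structured), a JSON list whose records carry different fields (semi-structured), and a free-text memo (unstructured). The sample values are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
cities = [row["city"] for row in rows]       # safe: every row has a "city" column

# Semi-structured: fields are labelled, but records need not share the same fields.
json_text = '[{"id": 1, "name": "Asha", "tags": ["vip"]}, {"id": 2, "email": "ravi@example.com"}]'
records = json.loads(json_text)
names = [rec.get("name") for rec in records]  # .get() because "name" may be missing

# Unstructured: free text with no fields at all; only generic processing applies.
memo = "Met Asha from Pune today; she asked about the new pricing plan."
word_count = len(memo.split())

print(cities)
print(names)
print(word_count)
```

The structured rows can be queried by column name directly, the semi-structured records need a check for missing fields, and the memo offers nothing to query without further text processing.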
Approximate Distribution of Digital Data
• Databases such as Oracle, DB2, Teradata, MySQL, PostgreSQL, etc.
• OLTP systems
Ease with Structured Data
• Input / Update / Delete
• Security
• Scalability
• Transaction Processing (ACID properties)
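The transaction-processing point can be illustrated with a minimal sketch of atomicity, the "A" in ACID, using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical. The sketch assumes a CHECK constraint is how an invalid state gets rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was undone too.
        return False

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to bob runs first and is then rolled back when the debit violates the constraint, so the table never shows a half-completed transfer.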
Semi-structured Data
Characteristics of Semi-structured Data
• Inconsistent structure
• Self-describing (label/value pairs)
• Schema information is often blended with data values
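These characteristics show up directly in a format such as JSON, where each record carries its own labels and no separate schema says which fields exist. The sketch below, using hypothetical contact records, recovers the set of observed fields and counts how many records carry each one. The helper names `observed_fields` and `field_coverage` are illustrative.

```python
import json

# Hypothetical records: each one describes itself with label/value pairs,
# and no two records have exactly the same fields.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phone": "555-0100", "address": {"city": "Delhi"}},
  {"name": "Meena"}
]
"""
records = json.loads(raw)

def observed_fields(items):
    """Union of the top-level labels seen across all records."""
    fields = set()
    for item in items:
        fields.update(item.keys())
    return sorted(fields)

def field_coverage(items):
    """For each observed field, the number of records that contain it."""
    return {f: sum(f in item for item in items) for f in observed_fields(items)}

print(observed_fields(records))
print(field_coverage(records))
```

Because the schema lives inside the data, it has to be inferred by scanning the records, and fields with low coverage signal the inconsistent structure described above.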
Unstructured data
• Images
• Free-form text
• Audio
• Videos
• Body of email
• Text messages
• Chats
• Social media data
• Word documents
Issues with terminology – Unstructured Data
▪ Data Mining
  • Association Rule Mining
  • Regression Analysis
  • Collaborative Filtering
▪ Part-of-speech tagging