Data Science Lecture No 03

Lecture No. 03
SEN – 5th

Course: Data Science

Instructor: Dr. Maryum Nisar

12/15/2024
Data Science

Lecture Contents
 Steps in EDA

Steps in EDA
 The various steps involved in data analysis are:
 Problem definition
 Data preparation
 Data analysis
 Development and representation of the results

Problem definition
 It is essential to define the business problem to be solved. The
problem definition works as the driving force for a data analysis plan
execution.
The main tasks involved are:
 Defining the main objective of the analysis
 Defining the main deliverables
 Outlining the main roles and responsibilities
 Obtaining the current status of the data
 Defining the timetable
 Performing a cost/benefit analysis
Data preparation
This step involves methods for preparing the dataset before the actual
analysis.
In this step, we:
 Define the sources of data
 Define data schemas and tables
 Understand the main characteristics of the data
 Clean the dataset
 Delete non-relevant datasets
 Transform the data
 Divide the data into the required chunks for analysis
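The preparation steps above can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed method: the rows, the column names, and the `prepare` and `chunks` helpers are all hypothetical.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_rows = [
    {"id": "1", "weight_kg": "70.5", "gender": " Male ", "note": "x"},
    {"id": "2", "weight_kg": "", "gender": "female", "note": "y"},
    {"id": "3", "weight_kg": "82.0", "gender": "Female", "note": "z"},
    {"id": "4", "weight_kg": "64.2", "gender": "male", "note": "w"},
]


def prepare(rows, keep=("id", "weight_kg", "gender")):
    """Clean, filter, and transform raw rows into analysis-ready records."""
    prepared = []
    for row in rows:
        # Clean: skip rows whose weight is missing.
        if not row["weight_kg"].strip():
            continue
        # Delete non-relevant fields by keeping only the listed columns.
        record = {key: row[key] for key in keep}
        # Transform: convert types and normalize the text labels.
        record["id"] = int(record["id"])
        record["weight_kg"] = float(record["weight_kg"])
        record["gender"] = record["gender"].strip().lower()
        prepared.append(record)
    return prepared


def chunks(records, size):
    """Divide the records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


clean_rows = prepare(raw_rows)
parts = chunks(clean_rows, 2)
print(clean_rows)
print(len(parts), "chunks")
```

Dropping rows with missing values is only one possible cleaning rule; imputing a value instead would be an equally valid assumption for a sketch like this.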
Data analysis
This is one of the most crucial steps. It deals with descriptive statistics and analysis of the data.
The main tasks involve:
 Summarizing the data
 Finding the hidden correlation and relationships among the data
 Developing predictive models
 Evaluating the models
 Calculating the accuracies
Some of the techniques used for data summarization are:
 Summary tables
 Graphs
 Descriptive statistics
 Inferential statistics
 Correlation statistics
 Searching
 Grouping
 Mathematical models
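A few of the summarization techniques listed above (descriptive statistics, correlation, and grouping) can be sketched as follows. The observations are hypothetical, and the `pearson` helper is written out from the textbook definition rather than taken from any particular library.

```python
import statistics
from collections import defaultdict

# Hypothetical observations: (group label, height in cm, weight in kg).
observations = [
    ("A", 160.0, 55.0),
    ("A", 170.0, 65.0),
    ("B", 180.0, 75.0),
    ("B", 190.0, 85.0),
]

heights = [h for _, h, _ in observations]
weights = [w for _, _, w in observations]

# Descriptive statistics for a single variable.
summary = {
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
    "stdev": statistics.stdev(heights),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


r = pearson(heights, weights)

# Grouping: mean weight per group label.
by_group = defaultdict(list)
for label, _, weight in observations:
    by_group[label].append(weight)
group_means = {label: statistics.mean(ws) for label, ws in by_group.items()}

print(summary)
print(round(r, 3), group_means)
```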
Development and representation of the results
This step involves:
 Presenting the dataset to the target audience in the form of:
o Graphs
o Summary tables
o Maps and diagrams
 This is also an essential step, as the results of the analysis should be
interpretable by the business stakeholders, which is one of the major goals of EDA.
 Commonly used graphical analysis techniques include:
o Scatter plots
o Character plots
o Histograms
o Box plots
o Residual plots and mean plots
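As a rough sketch of what sits behind two of these plots, the snippet below computes the bin counts a histogram would draw and the five-number summary a box plot would show. The values are hypothetical, and the text rendering merely stands in for a plotting library, which the lecture does not name.

```python
import statistics

# Hypothetical measurements, invented for illustration.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9]
BIN_WIDTH = 2


def histogram_counts(data, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = {}
    for value in data:
        start = (value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


counts = histogram_counts(values, BIN_WIDTH)

# Text rendering of the histogram: one '#' per observation in the bin.
for start, count in counts.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")

# Five-number summary, the basis of a box plot.
q1, q2, q3 = statistics.quantiles(values, n=4)
five_number = (min(values), q1, q2, q3, max(values))
print(five_number)
```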
Making sense of data
Different disciplines store different kinds of data for different purposes.
 For example, medical researchers store patients' data, universities store students'
and teachers' data, and real estate industries store house and building datasets.
 A dataset contains many observations about a particular object.
 For instance, a dataset about patients in a hospital can contain many observations.
o A patient can be described by a patient identifier (ID), name, address,
weight, date of birth, email, and gender. Each of these features that
describes a patient is a variable.
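One way to picture the relationship between observations and variables is a small record type. In the sketch below, each `Patient` instance is one observation and each field is one variable. The class, its field names, and the two records are hypothetical and are not taken from the lecture's own table.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Patient:
    """One observation; each field is a variable describing the patient."""
    patient_id: int
    name: str
    address: str
    weight_kg: float
    date_of_birth: date
    email: str
    gender: str


# Hypothetical records, invented purely for illustration.
dataset = [
    Patient(1, "Patient A", "Address A", 70.5, date(1990, 1, 15), "a@example.com", "Female"),
    Patient(2, "Patient B", "Address B", 82.0, date(1985, 6, 30), "b@example.com", "Male"),
]

variables = [f.name for f in fields(Patient)]
print(len(dataset), "observations,", len(variables), "variables")
```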

Making sense of data
 These datasets are stored in hospitals and are presented for analysis. Most of this data
is stored in some sort of database management system in tables/schema. An example
of a table for storing patient information is shown here:

Data Types
Numerical data
 This data has a sense of measurement involved in it; for example, a person's age,
height, weight, blood pressure, heart rate, temperature, number of teeth, number
of bones, and the number of family members.
 This data is often referred to as quantitative data in statistics.
 Numerical data can be either discrete or continuous.
 Discrete data:
o This is data that is countable and whose values can be listed out.
o For example, if we flip a coin 200 times, the number of heads can take
any value from 0 to 200, which is a finite set of cases.
o A variable that represents a discrete dataset is referred to as a discrete
variable. A discrete variable takes a fixed number of distinct values.
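The coin-flip example can be simulated in a few lines. This is a small sketch that assumes a fair coin and a fixed random seed (both choices made here for illustration); it shows that the number of heads always lands in a finite set of values that can be listed.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Flip a fair coin 200 times and count the heads.
flips = [rng.choice(["H", "T"]) for _ in range(200)]
heads = flips.count("H")

# The possible outcomes are finite and can be listed out: 0, 1, ..., 200.
possible_values = list(range(201))
print(heads, heads in possible_values)
```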
 Continuous data:
o A variable that can have an infinite number of numerical values within a
specific range is classified as continuous data.
o A variable describing continuous data is a continuous variable.
o For example, what is the temperature of your city today? Can the set of
possible values be finite?
Data Types
Categorical data
 This type of data represents the characteristics of an object; for example, gender, marital
status, type of address, or movie categories. This data is often referred to as
qualitative data in statistics. To make this clearer, here are some of the most common
types of categorical data you can find in datasets:
o Gender (Male, Female, Other, or Unknown)
o Marital Status (Annulled, Divorced, Interlocutory, Legally Separated, Married,
Polygamous, Never Married, Domestic Partner, Unmarried, Widowed, or Unknown)
o Movie genres (Action, Adventure, Comedy, Crime, Drama, Fantasy, Historical,
Horror, Mystery, etc.)
o Blood type (A, B, AB, or O)
o Types of drugs (Stimulants, Depressants, Hallucinogens, Dissociatives, Opioids,
Inhalants, or Cannabis)

Data Types
Categorical data
 A variable describing categorical data is referred to as a categorical variable.
These types of variables can have one of a limited number of values. Computer
science students may find it easier to think of categorical values as enumerated
types or enumerations of variables. There are different types of categorical variables:
o A binary categorical variable can take exactly two values and is also referred to
as a dichotomous variable. For example, when you run an experiment,
the result is either success or failure. Hence, the result can be understood as a
binary categorical variable.
o Polytomous variables are categorical variables that can take more than two
possible values. For example, marital status can have several values, such as
annulled, divorced, interlocutory, legally separated, married, polygamous,
never married, domestic partner, unmarried, widowed, and
unknown. Since marital status can take more than two possible values, it is a
polytomous variable.
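The enumeration analogy can be made concrete with Python's `enum` module. In this sketch the `Outcome` and `BloodType` classes and the `kind` helper are illustrative names chosen here; the blood-type values follow the list given earlier in the lecture.

```python
from enum import Enum


class Outcome(Enum):
    """Binary (dichotomous) variable: exactly two possible values."""
    SUCCESS = "success"
    FAILURE = "failure"


class BloodType(Enum):
    """Polytomous variable: more than two possible values."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"


def kind(enum_cls):
    """Classify a categorical variable by its number of possible values."""
    return "binary" if len(enum_cls) == 2 else "polytomous"


print(kind(Outcome), kind(BloodType))
```

An enumeration also rejects values outside the allowed set: `BloodType("C")` raises a `ValueError`, which mirrors the idea that a categorical variable takes one of a limited number of values.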
Measurement scales
There are four different types of measurement scales described in statistics:
 Nominal
 Ordinal
 Interval
 Ratio
Nominal
 These are used for labeling variables without any quantitative value. The scales are
generally referred to as labels. These scales are mutually exclusive and do not carry any
numerical importance.
 Let's see an example:
o What is your gender?
  o Male
  o Female
  o Third gender/Non-binary
  o I prefer not to answer
  o Other
Nominal
In the case of a nominal dataset, you can compute the
following:
 Frequency is the rate at which a label occurs over a period of time within the
dataset.
 Proportion can be calculated by dividing the frequency by the total number of
events.
 The percentage of each label can then be computed from its proportion.
 To visualize a nominal dataset, you can use either a pie chart or a bar
chart.
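Frequency, proportion, and percentage for a nominal variable can be computed as in the sketch below. The survey responses are hypothetical, and `collections.Counter` is just one convenient way to tally labels.

```python
from collections import Counter

# Hypothetical survey responses on a nominal scale.
responses = ["Male", "Female", "Female", "Other", "Female", "Male", "Female", "Male"]

# Frequency: how often each label occurs.
frequency = Counter(responses)
total = sum(frequency.values())

# Proportion: frequency divided by the total number of events.
proportion = {label: count / total for label, count in frequency.items()}

# Percentage: proportion scaled to 100.
percentage = {label: 100 * p for label, p in proportion.items()}

for label, count in frequency.most_common():
    print(f"{label:<8}{count:>3}{percentage[label]:>7.1f}%")
```

The same counts are what a bar chart or pie chart of the responses would display.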

Ordinal
The main difference between the ordinal and nominal scales is the order. In
ordinal scales, the order of the values is a significant factor. As with nominal
data, frequency is the rate at which a label occurs over a period of time within the
dataset.
 Have you heard about the Likert scale, which uses a variation of an ordinal scale?
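To show why order matters, the following sketch models a five-point Likert scale. The labels and responses are assumptions chosen for illustration. Because the values are ordered, sorting by rank and taking a median are meaningful, which would not be the case for a purely nominal variable.

```python
import statistics

# A five-point Likert scale; the position in the list defines the order.
LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
rank = {label: position for position, label in enumerate(LIKERT, start=1)}

# Hypothetical responses, invented for illustration.
responses = ["Agree", "Neutral", "Strongly agree", "Agree", "Disagree"]

# Sorting by rank respects the scale's order, unlike alphabetical sorting.
ordered = sorted(responses, key=rank.get)

# The median relies only on order, so it is meaningful for ordinal data.
median_rank = statistics.median_low([rank[r] for r in responses])
median_label = LIKERT[median_rank - 1]

print(ordered)
print(median_label)
```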

Measurement Scales
Interval
 In interval scales, both the order and exact differences between the values are significant.
Interval scales are widely used in statistics, for example, in measures of central tendency
(mean, median, and mode) and in the standard deviation.
 Examples include location in Cartesian coordinates and direction measured in degrees from
magnetic north. The mean, median, and mode are allowed on interval data.
Ratio
 Ratio scales contain order, exact values, and absolute zero, which makes them suitable for
use in descriptive and inferential statistics.
 These scales provide numerous possibilities for statistical analysis. Mathematical operations,
measures of central tendency, measures of dispersion, and the coefficient of variation
can also be computed from such scales.
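The practical difference between interval and ratio scales can be illustrated with temperature. This example is chosen here and is not taken from the slides: degrees Celsius form an interval scale, while kelvin has an absolute zero and forms a ratio scale. The weights used for the coefficient of variation are hypothetical.

```python
import statistics

# Hypothetical temperatures for two days, in degrees Celsius (interval scale).
celsius = [10.0, 20.0]
# The same temperatures in kelvin, which has an absolute zero (ratio scale).
kelvin = [c + 273.15 for c in celsius]

# Differences are meaningful on both scales and agree with each other.
diff_c = celsius[1] - celsius[0]
diff_k = kelvin[1] - kelvin[0]

# Ratios are only meaningful on the ratio scale:
# 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
ratio_c = celsius[1] / celsius[0]
ratio_k = kelvin[1] / kelvin[0]

# Coefficient of variation (standard deviation / mean) for ratio-scale data.
weights_kg = [60.0, 70.0, 80.0]
cv = statistics.stdev(weights_kg) / statistics.mean(weights_kg)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 4), round(cv, 4))
```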

Comparing EDA with classical and Bayesian
analysis
Classical data analysis:
 In the classical data analysis approach, the problem definition and data
collection steps are followed by model development, which is followed by
analysis and result communication.
Exploratory data analysis approach:
 The EDA approach follows the same steps as classical data analysis,
except that the model imposition and data analysis steps are swapped. The
main focus is on the data, its structure, outliers, models, and visualizations.
 Generally, in EDA, we do not impose any deterministic or probabilistic models
on the data.
Bayesian data analysis approach:
 The Bayesian approach incorporates prior probability distribution knowledge
into the analysis steps.
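As one simple illustration of incorporating prior knowledge, the sketch below uses a Beta prior on a coin's probability of heads and updates it with observed flips, the standard Beta-binomial conjugate update. The prior parameters, the data, and the helper names are assumptions made for this example and do not come from the lecture.

```python
# Beta-binomial update: a Beta(a, b) prior on the probability of heads,
# combined with observed flips, gives a Beta(a + heads, b + tails) posterior.


def update_beta(prior_a, prior_b, heads, tails):
    """Return the posterior Beta parameters after observing the flips."""
    return prior_a + heads, prior_b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief (roughly a fair coin) and hypothetical data.
prior = (2, 2)
heads, tails = 7, 3

posterior = update_beta(*prior, heads, tails)
prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)

print(prior_mean, round(posterior_mean, 4))
```

The posterior mean sits between the prior mean and the observed fraction of heads, which shows how the prior tempers what the data alone would suggest.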
Thank You !