Descriptive Analytics - Univariate and Bivariate


COE 102
Introductory Big Data
College of Engineering

Chapter 2
Descriptive Analytics - Univariate and Bivariate

Dr Heba Ismail

Learning Objectives

• Scale types
• Introduction to descriptive analytics
• Univariate and bivariate descriptive analytics
• Visualization

Let’s review a few facts from Chapter 1

• What is Data?
• Data, in the information age, are a large set of digital bits encoding numbers, texts,
images, sounds, videos, and so on.

• Can you give some examples of raw data? (Class Discussion)

• Can you make any decision based on raw data? (Class Discussion)

• What do we need to do with data to have meaningful insights?

• We need to produce some analytics!

Let’s Consider these Scenarios
• Can you study employees' behavior in ALL government
organizations by surveying ALL the employees?

• Can you study the purchasing behavior of ALL teenagers around the
globe by surveying ALL teenagers?

• Is it feasible?

• What would be a better alternative?

Statistical Concepts
Population
• A set of similar instances/objects or events that is of interest for some
question or experiment
• E.g. all students of my school, all nails produced by a machine
Sample
• A set of data collected and/or selected from a population by a defined
procedure
• E.g. a subset of the students of my school who answered a survey, a subset of
randomly selected nails produced by a machine

• Can you give more examples?

Statistical Concepts
Deduction
• Reasoning about a sample extracted from a population
• Deduction aims to study the probability of randomly extracting a
representative sample.
Induction
• Generalizing the knowledge obtained from a sample to the whole
population is called statistical inference (or induction)
• E.g. using the answers of the students of my school who responded to a survey
to draw conclusions about all students of the school, or using a subset of
randomly selected nails to judge all nails produced by a machine

Descriptive Statistics
• Descriptive statistics are methods/techniques to describe or
summarize samples in order to help humans understand them

Scale Types
• Qualitative scales
• Nominal: categorizes data without any order
• Operations: = and ≠
• E.g. a friend’s name and gender (e.g. Eve is a Female; Eve is not a Male)
• Ordinal: categorizes data in an ordered way
• Operations: =, ≠, <, >, ≤, and ≥
• E.g. company
• Let’s compare Andrew and Marcus on the attribute Company

Scale Types
• Quantitative scales
• Relative (Interval): does not have an absolute zero
• Operations: =, ≠, <, >, ≤, ≥, - and +
• E.g. temperature (in °C)
• Absolute (Ratio): has an absolute zero
• Operations: =, ≠, <, >, ≤, ≥, -, +, / and ×
• E.g. weight and height

When the attribute “height” is zero it means there is no height. This is
also true for weight. But for temperature, 0 °C does not mean there is no
temperature. When we talk about weight, we can say that Bernhard weighs
twice as much as Irene, but we cannot say that the maximum temperature last
week in Dennis’ home town was twice that in Eve’s.

Changing Data Scale

This all means that when we have data expressed on an absolute scale we can
convert it to any of the other scales. When we have data expressed on a
relative scale we can convert it to either of the two qualitative scale
types. When we have data expressed on an ordinal scale we can express it on
a nominal scale.
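
The conversion chain described above can be sketched in a few lines of Python. This is only an illustration: the category names and the thresholds (70 kg and 90 kg) are hypothetical and not taken from the slides, while the two weights come from the friends example table in this chapter.

```python
# Hypothetical ordered categories and thresholds, chosen only for illustration.
ORDINAL_LEVELS = ["light", "medium", "heavy"]  # order: light < medium < heavy


def weight_to_ordinal(weight_kg):
    """Map an absolute-scale weight (kg) to an ordinal category."""
    if weight_kg < 70:
        return "light"
    if weight_kg < 90:
        return "medium"
    return "heavy"


andrew_kg, marcus_kg = 77, 83  # weights from the friends example table

# Absolute scale: differences and ratios are meaningful.
difference_kg = marcus_kg - andrew_kg

# Ordinal scale: only the position in the order can be compared.
andrew_ord = weight_to_ordinal(andrew_kg)
marcus_ord = weight_to_ordinal(marcus_kg)
marcus_at_least_as_heavy = (
    ORDINAL_LEVELS.index(marcus_ord) >= ORDINAL_LEVELS.index(andrew_ord)
)

# Nominal scale: same labels, but the order is ignored; only = and != remain.
andrew_nom, marcus_nom = andrew_ord, marcus_ord
same_label = andrew_nom == marcus_nom

print(difference_kg, andrew_ord, marcus_ord, same_label)
```

With these assumed thresholds both friends fall into the same category, so after the conversion we can no longer tell who is heavier: every step down the chain discards information.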

Class Activity
• Weight is expressed as an absolute scale
• Can you change it into ordinal?
• Can you change it into nominal?
• What do you notice about the amount of information carried by the
weight attribute once it is expressed on a nominal scale?
• Can you still compare the weights of Andrew and Marcus after they
have been converted to nominal?

Textbook Answers – pg. 33.

Scales vs Data Types
• In software packages we must choose the data type for each attribute
• Common types are text, character, factor, integer, real, float,
timestamp, date and several others
• A scale and a data type are different, although related, concepts
• For instance, a quantitative scale implies the use of numeric data types
• However, an attribute can be expressed as a number while its scale type
is qualitative
• Think about IDs (e.g., student IDs, national IDs, shopper IDs, etc.)
• What kind of quantitative information does an ID carry?
• Can an ID with letters contain the same information?
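
A small Python sketch may help with the ID question. The ID values below are invented for the illustration; the point is only that a numeric data type permits arithmetic even when the scale is nominal.

```python
from statistics import mean

# Hypothetical IDs, invented for this illustration.
numeric_ids = [1001, 1002, 1006]
text_ids = ["S-1001", "S-1002", "S-1006"]

# The integer data type allows arithmetic, but the result describes no student.
average_id = mean(numeric_ids)
average_is_a_real_id = average_id in numeric_ids

# What an ID supports is the nominal operation: equal or not equal.
numeric_ids_distinct = len(set(numeric_ids)) == len(numeric_ids)
text_ids_distinct = len(set(text_ids)) == len(text_ids)

print(average_id, average_is_a_real_id, numeric_ids_distinct, text_ids_distinct)
```

Both encodings identify the students equally well, which suggests that the numeric form adds no quantitative information.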

Descriptive Univariate Analysis: Frequencies
• A frequency is basically a counter
• Absolute frequency counts how many times a value appears.
• Relative frequency is the percentage of times that value appears.
• The absolute cumulative frequency is the number of occurrences less
than or equal to a given value
• The relative cumulative frequency is the percentage of occurrences
less than or equal to a given value
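
The four kinds of frequency can be computed as sketched below. The heights are the ones listed in the friends example table of this chapter; the variable names are illustrative only.

```python
from collections import Counter

# Heights (cm) of the 14 friends in the example table of this chapter.
heights_cm = [175, 195, 172, 180, 168, 173, 180, 165, 158, 163, 190, 172, 185, 192]

n = len(heights_cm)
absolute = Counter(heights_cm)  # absolute frequency of each value

rows = []
cumulative = 0
for value in sorted(absolute):
    cumulative += absolute[value]  # occurrences <= value
    rows.append(
        (
            value,
            absolute[value],            # absolute frequency
            100 * absolute[value] / n,  # relative frequency (%)
            cumulative,                 # absolute cumulative frequency
            100 * cumulative / n,       # relative cumulative frequency (%)
        )
    )

for value, abs_f, rel_f, cum_f, rel_cum_f in rows:
    print(f"{value:>5} {abs_f:>3} {rel_f:6.2f}% {cum_f:>3} {rel_cum_f:6.2f}%")
```

The last row always has an absolute cumulative frequency equal to the number of values and a relative cumulative frequency of 100%.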

Example 1 – Company

7/14=50%

Example 2 – Height

Descriptive Univariate Analysis: data visualization
• Pie chart: typically used for nominal scales
• Pie charts are not advisable for scales where the notion of order exists
(in other words, ordinal and quantitative scales), although they can be used.

Descriptive Univariate Analysis: data visualization
• Bar chart: typically used for qualitative scales.
• Sometimes it can be used with quantitative scales with a limited number of
values.
• It is argued to be easier to read than a pie chart.

Descriptive Univariate Analysis: data visualization
• In a bar chart, we can also separate the distribution by the values of some
other attribute
• This is illustrated in the figure, where the frequencies of the target
attribute “company” are split by gender

Descriptive Univariate Analysis: data visualization
• Line chart: especially used to deal with the notion of time.
• Like area charts, line charts are used when the horizontal axis uses a
quantitative scale with equal lag between observations.
• They represent time series: graphs of values obtained over regular time
sequences.

Max Temp  Day
21        1
25        2
30        3
20        4
21        5

Descriptive Univariate Analysis: data visualization
• Area charts: especially used to compare time series and distribution
functions
• Understanding data distributions gives us strong insights about an
attribute. We are able to see, for instance, that data are more concentrated
around some values or that other values are rare.

Andrew              Eve
Max Temp  Day       Max Temp  Day
21        1         17        1
25        2         18        2
30        3         19        3
20        4         20        4
21        5         0         5

Descriptive Univariate Analysis: data visualization
• Histograms: used to represent empirical distributions for attributes with
a quantitative scale
• Histograms are characterized by grouping values in cells, reducing in this
way the sparsity that is common in quantitative scales.
• For quantitative attributes, a histogram is more informative than a bar
chart.

Descriptive Univariate Analysis: data visualization
• An important decision when drawing a histogram is the number of cells
• The most advisable value is problem dependent
• As a rule of thumb, you can use a number around the square root of the
number of values
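
The square-root rule of thumb can be tried on the weight attribute. The sketch below uses the 14 weights from the friends example table and equal-width cells; rounding the square root to the nearest integer is an assumption, since the slide only says "around".

```python
import math

# Weights (kg) of the 14 friends in the example table of this chapter.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

# Rule of thumb: about sqrt(number of values) cells.
n_cells = round(math.sqrt(len(weights_kg)))

low, high = min(weights_kg), max(weights_kg)
cell_width = (high - low) / n_cells

counts = [0] * n_cells
for w in weights_kg:
    # The maximum value is placed in the last cell instead of opening a new one.
    index = min(int((w - low) / cell_width), n_cells - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    start = low + i * cell_width
    print(f"[{start:.0f}, {start + cell_width:.0f}) {'#' * count}")
```

Trying a few neighbouring values of `n_cells` is a reasonable way to see how much the shape of the histogram depends on this choice.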

Descriptive Univariate Analysis: data visualization
• Empirical distributions are based on samples
• Probability distributions are about populations

Descriptive Univariate Analysis: statistics
• A statistic is a descriptor
• It describes numerically a characteristic of the sample or the population
• There are two main groups of univariate statistics:
• Location statistics
• Dispersion statistics

• Location statistics:
• Minimum: is the lowest value
• Maximum: is the largest value
• Mean: is the average value
• Mode: is the most frequent value
• The value that is larger than:
• 25% of all values is the 1st quartile
• 50% of all values is the median or 2nd quartile
• 75% of all values is the 3rd quartile

Example
• Let us use as an example the attribute weight from our data set

Graphical representation of the statistics

Location statistic       Weight (kg)
Min                      55.00
Max                      115.00
Mean or average          79.00
Mode                     75.00
1st quartile             65.75
2nd quartile or median   75.00
3rd quartile             87.50

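The values in the table can be reproduced with Python's standard `statistics` module, using the weights from the friends example table. Quartiles can be interpolated in several ways; the `"exclusive"` method used below is an assumption about how the slides computed them, chosen because it matches the table.

```python
from statistics import mean, median, mode, quantiles

# Weights (kg) of the 14 friends in the example table of this chapter.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

minimum = min(weights_kg)          # 55
maximum = max(weights_kg)          # 115
average = mean(weights_kg)         # 79
most_frequent = mode(weights_kg)   # 75
middle = median(weights_kg)        # 75.0

# Quartiles split the sorted values into four parts.
q1, q2, q3 = quantiles(weights_kg, n=4, method="exclusive")  # 65.75, 75.0, 87.5

print(minimum, maximum, average, most_frequent, q1, q2, q3)
```

Other interpolation methods (for example `method="inclusive"`) give slightly different quartiles for the same data, so it is worth stating which one is used.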
Descriptive Univariate Analysis: statistics
• Box-plots present the minimum, the 1st quartile, the median, the 3rd
quartile and the maximum statistics, in this order, bottom-up or from left
to right
• E.g. the attribute height
• Mean (or average), median and mode are known as measures of central
tendency, because they return a central value from a set of values

Location statistic   Nominal   Ordinal    Quantitative
Mean                 No        Possibly   Yes
Median               No        Yes        Yes
Mode                 Yes       Yes        Yes

Descriptive Univariate Analysis: statistics
• Box-plots can also be used to describe the symmetry/skewness of an
attribute
• The median and the mode are more robust central tendency statistics than
the mean in the presence of extreme values or strongly skewed distributions

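The robustness claim can be checked with a quick experiment: add one extreme value to the weights from the friends example table and see how far each statistic moves. The extreme value of 300 kg is hypothetical and was chosen only to make the effect visible.

```python
from statistics import mean, median

# Weights (kg) of the 14 friends in the example table of this chapter.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

# Hypothetical extreme value, added only for this illustration.
with_extreme = weights_kg + [300]

mean_shift = mean(with_extreme) - mean(weights_kg)
median_shift = median(with_extreme) - median(weights_kg)

print(f"mean moves by {mean_shift:.2f} kg, median moves by {median_shift:.2f} kg")
```

Here the mean moves by more than 14 kg while the median does not move at all.
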
Descriptive Univariate Analysis: statistics
• Can the mean be used with ordinal scales?
• This is strongly arguable, but there are examples of its use with numeric
ordinal scales such as the Likert scale
• The Likert scale uses an ordered scale, e.g., integers from 1 (strongest
disagreement) to 5 (strongest agreement)

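As a small illustration of why the mean is debatable here, the sketch below computes the three central tendency statistics for a set of Likert answers. The responses are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical answers on a 1..5 Likert scale, invented for this illustration.
responses = [1, 2, 2, 4, 5, 5, 5]

likert_mode = mode(responses)      # always valid: most frequent answer
likert_median = median(responses)  # valid for ordinal scales: middle answer
likert_mean = mean(responses)      # assumes equal distances between levels

print(likert_mode, likert_median, round(likert_mean, 2))
```

The mean treats the step from 1 to 2 as equal in size to the step from 4 to 5, which an ordinal scale does not guarantee; the median and the mode need no such assumption.
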
Descriptive Univariate Analysis: statistics
• Plots can also be combined
• An example with the attribute Height
• There is only one value for the mean of a population
• There is only one value for the mean of a sample, but several samples can
exist for a single population
• The population mean and the sample mean are calculated in the same way but
are represented differently:
• $\mu_x$ is the population mean of $x$
• $\bar{x}$ is a sample mean of $x$

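The difference between one population mean and many possible sample means can be sketched as follows. Treating the 14 weights of the friends example as the whole population is an assumption made only for this illustration, as are the sample size and the seed.

```python
import random
from statistics import mean

# Assumption for this sketch: these 14 weights (kg) are the whole population.
population = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

population_mean = mean(population)  # a single, fixed value

rng = random.Random(0)  # fixed seed so the run is repeatable
# Three different samples of 5 friends, each with its own sample mean.
sample_means = [mean(rng.sample(population, 5)) for _ in range(3)]

print(population_mean, sample_means)
```

Both kinds of mean use the same formula; only the set of values they are computed on differs.
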
Descriptive Univariate Analysis: statistics
• A dispersion statistic measures how distant the different values are
• Dispersion statistics:
• Amplitude (Range): is the difference between the maximum and the minimum
values
• Interquartile range: is the difference between the values of the 3rd and
1st quartiles
• Dispersion statistics (cont.):
• Mean absolute deviation: is a measure of the mean absolute distance
between the observations and the mean
• Its math formula for the population is:
$\mathrm{MAD}_x = \frac{1}{n}\sum_{i=1}^{n} |x_i - \mu_x|$
• Its math formula for a sample is:
$\overline{\mathrm{MAD}}_x = \frac{1}{n-1}\sum_{i=1}^{n} |x_i - \bar{x}|$
• Here $n$ is the number of values, $\mu_x$ the population mean and $\bar{x}$
the sample mean

Descriptive Univariate Analysis: statistics
• Dispersion statistics (cont.):
• Standard deviation: is another measure of the typical distance between the
observations and their mean
• Its math formula for the population is:
$\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_x)^2}$
• Its math formula for a sample is:
$s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
• The square of the standard deviation is named variance
• Using again the weight attribute as an example, the dispersion statistics
are as shown in the table

Dispersion statistic      Weight (kg)
Amplitude                 60.00
Interquartile range       21.75
Mean absolute deviation   14.31
Standard deviation (s)    17.38
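
The dispersion table can be reproduced with the sketch below, using the weights from the friends example table. Two choices are assumptions that happen to match the table values: the quartiles use the `"exclusive"` interpolation method, and the sample mean absolute deviation divides by n - 1.

```python
from statistics import mean, quantiles, stdev

# Weights (kg) of the 14 friends in the example table of this chapter.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]
n = len(weights_kg)

amplitude = max(weights_kg) - min(weights_kg)

q1, _, q3 = quantiles(weights_kg, n=4, method="exclusive")
interquartile_range = q3 - q1

x_bar = mean(weights_kg)
# Sample mean absolute deviation, dividing by n - 1 (assumption, see above).
mad_sample = sum(abs(x - x_bar) for x in weights_kg) / (n - 1)

s = stdev(weights_kg)  # sample standard deviation (divides by n - 1)
variance = s ** 2      # the variance is the square of the standard deviation

print(amplitude, interquartile_range, round(mad_sample, 2), round(s, 2))
```

For the population versions, `statistics.pstdev` divides by n instead of n - 1.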

Descriptive Univariate Analysis: common univariate probability distributions
• Many events in our life follow distributions that have already been studied
• E.g. the height of adult men, the value of a random number, or the number
of cars passing through a given highway toll
• We present two of these distributions:
• The Uniform distribution
• The Normal distribution, also known as the Gaussian distribution
• Both are continuous distributions and have known probability density
functions

Descriptive Univariate Analysis: common univariate probability distributions
• An attribute that follows the uniform distribution with parameters $a$ and
$b$ (its minimum and maximum values) has equal frequency of occurrence of
values in any interval of a given size between $a$ and $b$
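
The "equal frequency in any interval of a given size" property can be checked by simulation. In the sketch below the parameters a = 0 and b = 10, the seed and the two intervals are all hypothetical choices made for the illustration.

```python
import random

a, b = 0.0, 10.0       # hypothetical parameters of the uniform distribution
density = 1 / (b - a)  # constant probability density between a and b

rng = random.Random(42)  # fixed seed so the run is repeatable
values = [rng.uniform(a, b) for _ in range(100_000)]


def share_in(lo, hi):
    """Fraction of the simulated values that fall in [lo, hi)."""
    return sum(lo <= v < hi for v in values) / len(values)


# Two different intervals of the same size (2.0).
share_first = share_in(1.0, 3.0)
share_second = share_in(6.5, 8.5)
expected_share = 2.0 * density  # interval size times density = 0.2

print(share_first, share_second, expected_share)
```

Both shares should come out close to 0.2, whichever interval of width 2 is picked inside [a, b].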

Descriptive Univariate Analysis: common univariate probability distributions
• The Normal distribution
• Physical quantities that are expected to be the sum of many independent
factors (e.g., men's height) typically have approximately Normal
distributions
• The Normal distribution is a symmetric and continuous distribution with
two parameters:
• The mean locates the highest point of the bell-shaped distribution
• The standard deviation defines how narrow or wide the bell shape of the
distribution is
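
The role of the two parameters can be explored with `statistics.NormalDist` from the Python standard library. The mean of 175 and the standard deviations of 5 and 10 are hypothetical values picked for the illustration.

```python
from statistics import NormalDist

# Hypothetical parameters: same mean, different standard deviations.
narrow = NormalDist(mu=175, sigma=5)
wide = NormalDist(mu=175, sigma=10)

# The density is highest at the mean ...
peak_narrow = narrow.pdf(175)
peak_wide = wide.pdf(175)

# ... and symmetric around it.
left = narrow.pdf(170)
right = narrow.pdf(180)

# Probability of a value within one standard deviation of the mean.
within_one_sigma = narrow.cdf(180) - narrow.cdf(170)

print(peak_narrow, peak_wide, left, right, round(within_one_sigma, 4))
```

A smaller standard deviation gives a taller and narrower bell; in both cases about 68% of the probability lies within one standard deviation of the mean.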

Descriptive bivariate analysis
• When the two attributes of the pair are quantitative
• There are several visualization techniques able to show the distribution
of points with two quantitative attributes
• One of these techniques is the scatter plot

Descriptive bivariate analysis
• Pearson correlation
• Sample Pearson correlation

• Is scale independent: values are always in [-1, 1]
• If the points form:
• an increasing line, the Pearson correlation coefficient will be 1
• a decreasing line, its value will be -1
• a horizontal line or a cloud without increasing or decreasing tendency,
its value will be 0 (strictly, for a perfectly horizontal line the
coefficient is undefined, since one attribute has zero variance)
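
These cases can be verified with a small sketch of the sample Pearson coefficient, computed here as the sum of products of deviations divided by the square root of the product of the sums of squared deviations (the standard definition; the data points are invented for the illustration).

```python
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]  # hypothetical points

increasing = pearson(xs, [2 * x + 1 for x in xs])   # increasing line
decreasing = pearson(xs, [10 - 3 * x for x in xs])  # decreasing line
no_trend = pearson(xs, [1, 3, 2, 3, 1])             # no increasing/decreasing tendency

# Changing the unit of x (scale) does not change the coefficient.
rescaled = pearson([100 * x for x in xs], [2 * x + 1 for x in xs])

print(increasing, decreasing, no_trend, rescaled)
```

Multiplying an attribute by a positive constant leaves the coefficient unchanged, which is what "scale independent" means here.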

Descriptive bivariate analysis
• Spearman's rank correlation, as the name suggests, is based on rankings
• It compares how similar the ranking positions of the values of the two
attributes are

Example
• Pearson correlation
• Spearman's rank correlation

Friend     Weight (kg)   Height (cm)   Ranked weight   Ranked height
Andrew     77            175           9.0             8.0
Bernhard   110           195           13.0            14.0
Carolina   70            172           5.0             5.5
Dennis     85            180           11.0            9.5
Eve        65            168           3.0             4.0
Fred       75            173           7.5             7.0
Gwyneth    75            180           7.5             9.5
Hayden     63            165           2.0             3.0
Irene      55            158           1.0             1.0
James      66            163           4.0             2.0
Kevin      95            190           12.0            12.0
Lea        72            172           6.0             5.5
Marcus     83            185           10.0            11.0
Nigel      115           192           14.0            13.0

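Both coefficients can be computed for the friends example with the sketch below. Spearman's rank correlation is obtained here as the Pearson correlation of the ranks, with tied values sharing the average of their positions; the helper names are illustrative.

```python
from math import sqrt

# Weights (kg) and heights (cm) of the 14 friends, Andrew to Nigel.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]
heights_cm = [175, 195, 172, 180, 168, 173, 180, 165, 158, 163, 190, 172, 185, 192]


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1  # 1-based position of first tie
        ties = ordered.count(v)
        ranks.append(first_position + (ties - 1) / 2)
    return ranks


weight_ranks = average_ranks(weights_kg)
height_ranks = average_ranks(heights_cm)

pearson_r = pearson(weights_kg, heights_cm)
spearman_rho = pearson(weight_ranks, height_ranks)

print(round(pearson_r, 2), round(spearman_rho, 2))
```

Both values come out close to 1, reflecting that taller friends in this data set also tend to be heavier.
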
Reading
• Textbook: Chapter 2
• Moreira, João, André Carlos Ponce de Leon Ferreira de Carvalho, and Tomáš
Horváth. A General Introduction to Data Analytics. Wiley, 2019.
ISBN: 9781119296263.

© João Moreira - FEUP/UP
