Descriptive Analytics

The document discusses descriptive analytics and related concepts. Descriptive analytics involves summarizing and analyzing past data using simple queries to understand what has happened. It also discusses structured and unstructured data, different data types and scales, measures of central tendency and variation, data visualization techniques, and sampling.

Uploaded by

Vishnu P

Descriptive Analytics

Dr. Nikhil Ghag

SIOM Nashik

What is Descriptive Analytics?

Descriptive analytics is about finding “what has happened” by summarizing the data using innovative methods and analyzing past data using simple queries.

Examples: Walmart; John Snow’s analysis of the cholera outbreak in London.

Descriptive analytics is the starting point of an analytics-based solution to a problem. It helps in understanding the data and provides direction for predictive and prescriptive analytics. Business Intelligence (BI), which largely involves creating reports and business dashboards that lead to actionable insights, is essentially a descriptive analytics exercise.

Figure: John Snow’s spot map of the cholera outbreak in London, 1854.
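As a minimal sketch of the "simple queries" that descriptive analytics relies on, the snippet below summarizes hypothetical past sales records by store and by month. The records, their field layout, and the helper `total_by` are illustrative assumptions, not data or code from the slides.

```python
from collections import defaultdict

# Hypothetical historical transactions: (store, month, units_sold).
# These values are illustrative only.
sales = [
    ("Store A", "Jan", 120),
    ("Store A", "Feb", 135),
    ("Store B", "Jan", 90),
    ("Store B", "Feb", 70),
]


def total_by(records, key_index):
    """Sum units sold, grouped by the field at key_index (0 = store, 1 = month)."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


print(total_by(sales, 0))  # {'Store A': 255, 'Store B': 160}
print(total_by(sales, 1))  # {'Jan': 210, 'Feb': 205}
```

Each query only answers "what has happened"; forecasting next month's sales would move into predictive analytics.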

Structured and Unstructured Data

• Structured data means that the data is described in a matrix form with labelled rows and columns.
• Data that cannot be arranged in this form is unstructured. Machine-generated data such as images from satellites, magnetic resonance imaging (MRI), electrocardiograms (ECG) and thermography are a few examples of unstructured data.
• The importance of unstructured data in decision making has increased.

https://en.wikipedia.org/wiki/Clickstream
http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/
http://searchcrm.techtarget.com/definition/clickstream-analysis
https://www.qubole.com/blog/big-data/clickstream-data-analysis/

Cross-sectional, Time Series, and Panel Data

1. Cross-Sectional Data: Data collected on many variables of interest at the same point in time or over the same period is called cross-sectional data. For example, consider data on movies released during the year 2017, such as budget, box-office collection, actors, directors and genre.

2. Time Series Data: Data collected on a single variable, such as the demand for smartphones, over several time intervals (weekly, monthly, etc.) is called time-series data.

3. Panel Data: Data collected on several variables (multiple dimensions) over several time intervals is called panel data (also known as longitudinal data). An example of panel data is data on variables such as gross domestic product (GDP), Gini index and unemployment rate for several countries over several years.
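To make the three shapes concrete, the sketch below stores each kind of data in plain Python structures. All names and numbers (`panel`, "Country X", the helpers `country_series` and `year_cross_section`, every value) are hypothetical; the point is only that a panel can be sliced into either a time series or a cross-section.

```python
# Hypothetical values for illustration only; they are not real statistics.

# Cross-sectional: several variables observed for many units in one period
# (e.g., movies released during 2017).
cross_sectional = [
    {"movie": "M1", "budget": 50, "box_office": 120, "genre": "Drama"},
    {"movie": "M2", "budget": 80, "box_office": 60, "genre": "Action"},
]

# Time series: one variable observed over successive intervals
# (e.g., monthly demand for smartphones).
time_series = {"2017-01": 1000, "2017-02": 1100, "2017-03": 1050}

# Panel (longitudinal): several variables for several units over several
# periods, keyed here by (country, year).
panel = {
    ("Country X", 2016): {"gdp": 1.9, "gini": 0.35, "unemployment": 5.0},
    ("Country X", 2017): {"gdp": 2.0, "gini": 0.34, "unemployment": 4.8},
    ("Country Y", 2016): {"gdp": 0.7, "gini": 0.42, "unemployment": 7.2},
    ("Country Y", 2017): {"gdp": 0.8, "gini": 0.41, "unemployment": 7.0},
}


def country_series(panel, country, variable):
    """Slice the panel into a time series of one variable for one country."""
    return {
        year: panel[(c, year)][variable]
        for (c, year) in sorted(panel)
        if c == country
    }


def year_cross_section(panel, year):
    """Slice the panel into a cross-section of all countries for one year."""
    return {c: values for (c, y), values in panel.items() if y == year}


print(country_series(panel, "Country X", "unemployment"))  # {2016: 5.0, 2017: 4.8}
print(sorted(year_cross_section(panel, 2017)))  # ['Country X', 'Country Y']
```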
TYPES OF DATA MEASUREMENT SCALES
Nominal Scale (Qualitative Data)

Nominal scale refers to variables whose values are simply names or labels (qualitative data); such variables are also known as categorical variables. For example, variables such as marital status (single, married, divorced) and industry type (manufacturing, healthcare, banking and finance) fall under the nominal scale.

Ordinal Scale

An ordinal-scale variable takes its values from an ordered set, recorded in order of magnitude, although the distances between successive values are not necessarily equal. For example, many surveys use a Likert scale (e.g., strongly disagree, disagree, neutral, agree, strongly agree).

Interval Scale

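Interval scale corresponds to a variable whose values are chosen from an interval set: differences between values are meaningful, but there is no true zero point, so ratios are not meaningful (20 °C is not "twice as hot" as 10 °C). Temperature measured in centigrade (°C) and intelligence quotient (IQ) scores are examples of interval-scale variables.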

Ratio Scale

Any variable for which ratios can be computed and are meaningful is measured on a ratio scale; such a variable has a true zero point. Examples include weight and income.
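A quick numerical check of why ratios are meaningful on a ratio scale but not on an interval scale: converting °C to °F changes the ratio of two temperatures but not the ratio of two temperature differences, whereas converting kilograms to pounds leaves the ratio of two weights unchanged. The choice of weight as the ratio-scale example and the approximate conversion factor `KG_TO_LB` are our own illustrative assumptions.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from °C to °F (both are interval scales)."""
    return c * 9 / 5 + 32


KG_TO_LB = 2.20462  # approximate kilograms-to-pounds factor (assumed)

# Interval scale: the ratio of two temperatures changes with the unit...
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, round(ratio_fahrenheit, 2))  # 2.0 1.36

# ...but the ratio of two differences does not.
diff_ratio_celsius = (30 - 20) / (20 - 10)
diff_ratio_fahrenheit = (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)) / (
    celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
)
print(diff_ratio_celsius, diff_ratio_fahrenheit)  # 1.0 1.0

# Ratio scale: weight has a true zero, so ratios survive a change of unit.
ratio_kg = 10 / 5
ratio_lb = (10 * KG_TO_LB) / (5 * KG_TO_LB)
print(ratio_kg, ratio_lb)  # 2.0 2.0
```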
POPULATION AND SAMPLE

• Population (also known as the universal set) is the set of all possible data for a given context, whereas a sample is a subset taken from the population.

• In many analytical problems, we make inferences about the population based on sample data. Sampling (the process of selecting observations from the population) poses many challenges.

• An unrepresentative sample may result in bias and incorrect inferences about the population.
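The effect of an unrepresentative sample is easy to see in a small simulation. The sketch below uses a hypothetical population made of the values 1 to 100 and compares the mean of a sample consisting of the first 10 records (a stand-in for "whatever was easiest to collect") with the mean of a simple random sample drawn with Python's `random.Random.sample`. The population and the selection rule are illustrative assumptions.

```python
import random
from statistics import mean

# Hypothetical population of 100 observations with values 1..100.
population = list(range(1, 101))
population_mean = mean(population)  # 50.5

# A biased sample: only the first 10 records.
biased_sample = population[:10]

# A simple random sample of the same size; the seed only makes the run reproducible.
rng = random.Random(42)
random_sample = rng.sample(population, 10)

print("population mean:", population_mean)
print("biased sample mean:", mean(biased_sample))  # 5.5, far from 50.5
print("random sample mean:", mean(random_sample))
```

Whatever the seed, the random sample's mean can never be further from 50.5 than the biased sample's, because the first 10 records are the most extreme possible sample of that size.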
MEASURES OF CENTRAL TENDENCY
Mean (or Average) Value

• Mean is the arithmetic average of the data and is one of the most frequently used measures of central tendency. Associated with the mean is a phenomenon often called the “wisdom of crowds”, according to which the collective estimate of a group (for example, the average of many independent guesses) is often better than the estimate of an individual.

• Making decisions based solely on the mean is not advisable, because the mean is sensitive to extreme values and says nothing about the variability of the data. For example, in capital asset procurement such as the procurement of fighter aircraft and weapons, defense services across the world use mean time between failures (MTBF) as one of the measures of system reliability (performance).

Median (or Mid) Value

• Median is the value that divides the data into two equal parts; that is, 50% of the observations lie below the median and 50% lie above it.

Mode

• Mode is the most frequently occurring value in the data set.
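Python's standard `statistics` module computes all three measures directly. The hypothetical data below also illustrate why relying solely on the mean is risky: a single extreme value shifts the mean sharply while the median barely moves.

```python
from statistics import mean, median, mode

# Hypothetical observations (e.g., hours between failures).
data = [2, 3, 3, 4, 5]
print(mean(data), median(data), mode(data))  # 3.4 3 3

# One extreme value pulls the mean far more than the median.
with_outlier = data + [100]
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```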

PERCENTILE, DECILE, AND QUARTILE

• A percentile Px is the value below which x% of the observations lie.

• Deciles are special percentiles that divide the data into 10 equal parts. The first decile (P10) marks off the first 10% of the data, the second decile (P20) the first 20%, and so on.

• Quartiles divide the data into 4 equal parts. The first quartile (Q1) marks off the first 25% of the data, Q2 the first 50% (Q2 is also the median), and the third quartile (Q3) the first 75%.
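A short sketch using `statistics.quantiles` from the Python standard library (Python 3.8+) to compute quartiles and deciles. The data are hypothetical, and the `"inclusive"` interpolation method is an assumption on our part; textbooks and tools differ in how they interpolate percentiles.

```python
from statistics import quantiles

# Hypothetical data: the integers 1..9.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# method="inclusive" interpolates linearly between order statistics,
# treating the minimum and maximum as the 0th and 100th percentiles.
quartiles = quantiles(data, n=4, method="inclusive")
deciles = quantiles(data, n=10, method="inclusive")

print(quartiles)  # [3.0, 5.0, 7.0] -> Q1, Q2 (median), Q3
print(deciles[0], deciles[8])  # first decile (P10) and ninth decile (P90)
```

With the default `method="exclusive"`, or with other software conventions, the values for a small data set can differ slightly.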
Problem

Time between failures (in hours) of a wire cutter used in a cookie manufacturing oven is given in Table 2.4. The function of the wire-cut is to cut the dough into cookies of the desired size.
(a) Calculate the mean, median, and mode of time between failures of wire-cuts.
(b) The company would like to know by what time 10% (the 10th percentile, P10) and 90% (the 90th percentile, P90) of the wire-cuts will fail.
(c) Calculate the values of P25 and P75.
Solution
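A minimal sketch of how parts (a) to (c) could be computed with Python's `statistics` module, assuming the Table 2.4 values are passed in as a list of hours. The helper name `summarize_failures` is hypothetical, the numbers in the usage line are placeholders and NOT the Table 2.4 data, and the inclusive (linear interpolation) percentile rule is an assumption that may differ from the method used in the original solution.

```python
from statistics import mean, median, multimode, quantiles


def summarize_failures(times):
    """Return the statistics asked for in parts (a)-(c) for a list of
    times between failures (in hours)."""
    # 99 cut points: cuts[k - 1] is the k-th percentile (linear interpolation).
    cuts = quantiles(times, n=100, method="inclusive")
    return {
        "mean": mean(times),
        "median": median(times),
        "mode": multimode(times),  # a list, since data may have several modes
        "P10": cuts[9],
        "P90": cuts[89],
        "P25": cuts[24],
        "P75": cuts[74],
    }


# Hypothetical placeholder values, NOT the Table 2.4 data.
example_times = [2, 5, 5, 8, 10, 12, 15, 20, 22]
print(summarize_failures(example_times))
```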
MEASURES OF VARIATION

1. Range - Range is the difference between the maximum and minimum values of the data. It captures the spread of the data. For the data in Table 2.4, the range = 102 − 2 = 100.

2. Inter-Quartile Distance (IQD) - Inter-quartile distance (IQD), also called inter-quartile range (IQR), is the distance between Quartile 1 (Q1) and Quartile 3 (Q3), that is, IQD = Q3 − Q1.

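3. Variance - Variance is a measure of the variability of the data around the mean value. Variance for a population, σ², is calculated using

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$

where N is the number of observations in the population and μ is the population mean.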

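4. Standard Deviation - Standard deviation is the square root of the variance (σ for a population). Unlike the variance, it is expressed in the same units as the data.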
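The four measures of variation can be computed with Python's `statistics` module, as sketched below on hypothetical data. `pvariance` and `pstdev` use the population formula (divide by N); `variance` and `stdev` use the sample formula (divide by n − 1), which is included only for comparison. The inclusive quartile method is an assumption.

```python
from statistics import pstdev, pvariance, quantiles, stdev, variance

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # 9 - 2 = 7
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # 5.5 - 4.0 = 1.5

# Population versions divide by N; sample versions divide by n - 1.
print(data_range, iqr)
print(pvariance(data), pstdev(data))  # variance = 4, standard deviation = 2
print(variance(data), stdev(data))  # about 4.571 and 2.138
```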
MEASURES OF SHAPE - SKEWNESS AND KURTOSIS

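• Skewness is a measure of symmetry or lack of symmetry. A data set is symmetrical when the proportion of data at equal distances (measured in terms of standard deviations) on either side of the mean (or median) is equal.

• Pearson’s moment coefficient of skewness measures skewness using the third central moment of the data, scaled by the cube of the standard deviation.

• Kurtosis measures the heaviness of the tails of a distribution using the fourth central moment; a normal distribution has a kurtosis of 3.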
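A hedged sketch of the moment-based coefficients, assuming the population (divide-by-n) form of the moments; some texts apply small-sample corrections instead, and the helper names here are hypothetical. Positive skewness indicates a longer right tail, and kurtosis is often reported as excess kurtosis, i.e. the value minus 3.

```python
from statistics import fmean


def central_moment(data, k):
    """k-th central moment: the average of (x - mean)**k."""
    m = fmean(data)
    return sum((x - m) ** k for x in data) / len(data)


def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def moment_kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2 (3 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # hypothetical data with a long right tail

print(moment_skewness(symmetric))  # 0.0
print(moment_skewness(right_skewed) > 0)  # True
print(moment_kurtosis(symmetric))  # 1.7, lighter tails than a normal distribution
```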
DATA VISUALIZATION

Histogram

A histogram is a visual representation of numerical data that can be used to assess its probability distribution (frequency distribution).

A histogram is very useful since it helps the data scientist identify the following:

1. The shape of the distribution, which helps in assessing the probability distribution of the data.

2. Measures of central tendency such as the median and mode.

3. Measures of variability such as spread.

4. Measures of shape such as skewness.

Cumulative histograms are called ogive curves.
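Plotting libraries draw histograms directly, but the underlying computation is a frequency table over equal-width bins. The stdlib-only sketch below builds that table and the cumulative counts that an ogive plots, printing text bars in place of a chart; the data, the bin settings, and the helper `frequency_table` are hypothetical.

```python
from itertools import accumulate

START = 0  # lower edge of the first bin (hypothetical choice)
BIN_WIDTH = 10  # width of every bin (hypothetical choice)
N_BINS = 5


def frequency_table(data, start, bin_width, n_bins):
    """Count observations falling in each bin [start + i*w, start + (i+1)*w)."""
    counts = [0] * n_bins
    for x in data:
        idx = int((x - start) // bin_width)
        if 0 <= idx < n_bins:  # values outside the binned range are ignored
            counts[idx] += 1
    return counts


# Hypothetical observations.
data = [2, 5, 7, 12, 15, 18, 21, 25, 33, 48]
counts = frequency_table(data, START, BIN_WIDTH, N_BINS)
cumulative = list(accumulate(counts))  # running totals, i.e. the points of an ogive

for i, (count, cum) in enumerate(zip(counts, cumulative)):
    lower = START + i * BIN_WIDTH
    print(f"[{lower:>2}, {lower + BIN_WIDTH:>2})  {'#' * count:<4} cumulative = {cum}")
```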

Bar Chart

A bar chart is a frequency chart for a qualitative (categorical) variable. Histograms cannot be used when the variable is qualitative.

Pie Chart

A pie chart is mainly used for categorical data; it is a circular chart that displays the proportion of each category in the data set.
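Both charts are built from the same category counts: a bar chart draws each count as a bar, while a pie chart turns each count into a share of the total and a slice angle. The sketch below computes both from a hypothetical list of industry types (echoing the nominal-scale example), printing text bars instead of drawing a chart.

```python
from collections import Counter

# Hypothetical categorical observations (industry type of each firm).
industries = ["Banking", "Healthcare", "Manufacturing", "Healthcare", "Banking", "Healthcare"]

counts = Counter(industries)  # bar heights for a bar chart
total = sum(counts.values())

for category, count in counts.most_common():
    share = count / total  # pie-chart proportion
    angle = count * 360 / total  # slice angle in degrees
    print(f"{category:<13} {'#' * count:<3} {share:.1%} {angle:.0f} deg")
```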
Scatter Plot

A scatter plot is a plot of two variables that helps data scientists understand whether there is any relationship between them. The relationship could be linear or non-linear.
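A single summary number can miss a non-linear pattern that a scatter plot would show immediately. The sketch below computes Pearson's correlation coefficient, introduced here only for comparison, on hypothetical data: it detects the straight-line relationship but reports zero for a perfect U-shaped one. The helper `pearson_r` is our own illustration.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]  # exact linear relationship
quadratic = [v ** 2 for v in x]  # exact non-linear (U-shaped) relationship

print(pearson_r(x, linear))  # close to 1.0
print(pearson_r(x, quadratic))  # 0.0: the relationship is missed entirely
```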

Box Plot (or Box and Whisker Plot)

Box plot (aka box-and-whisker plot) is a graphical representation of numerical data that can be used to understand the variability of the data and the existence of outliers. A box plot is drawn using the following descriptive statistics:

1. Lower quartile (1st Quartile), median and upper quartile (3rd Quartile).

2. Lowest and highest value.

3. Inter-quartile range (IQR).
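The statistics listed above can be gathered in one helper before plotting. This sketch uses the inclusive quartile method, and the 1.5 × IQR fence for flagging outliers is a widely used convention (Tukey's) that we assume here rather than a rule stated above. The function name `box_plot_stats` and the data are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(data):
    """Descriptive statistics used to draw a box plot, plus points flagged
    as outliers by the 1.5 x IQR rule (an assumed convention)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "iqr": iqr,
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Hypothetical data with one extreme value.
print(box_plot_stats([2, 4, 4, 4, 5, 5, 7, 9, 30]))
```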

Population and Sampling

The process of identifying a subset from a population of elements (aka observations or cases) is called the sampling process, or simply sampling. It involves the following steps:

1. Identify the target population that is important for the problem under study.

2. Decide the sampling frame.

3. Determine the sample size.

4. Choose the sampling method.
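The steps above can be sketched in Python as follows, using two common sampling methods as examples: simple random sampling and systematic sampling. The customer-ID sampling frame, the sample size of 50, and the helper names are hypothetical assumptions for illustration.

```python
import random

# Steps 1 and 2 (target population and sampling frame): a hypothetical list
# of customer IDs standing in for the population we can actually reach.
sampling_frame = [f"C{i:04d}" for i in range(1, 1001)]

# Step 3: sample size (chosen arbitrarily here).
sample_size = 50


def simple_random_sample(frame, n, seed=None):
    """Every subset of size n is equally likely to be chosen."""
    return random.Random(seed).sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then every k-th element, where k = len(frame) // n."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]


# Step 4: apply a sampling method.
srs = simple_random_sample(sampling_frame, sample_size, seed=1)
sys_sample = systematic_sample(sampling_frame, sample_size, seed=1)
print(len(srs), len(sys_sample))  # 50 50
```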
