
PDS_Unit4

The document provides an overview of statistics, including its definition, types (descriptive and inferential), and key concepts such as measures of central tendency and dispersion. It also discusses data visualization techniques, emphasizing the importance of effectively presenting data through various graphical methods like histograms, bar charts, and scatter plots. Additionally, it highlights the significance of data interpretation in research and decision-making processes.

Uploaded by

Ankitha T C


UNIT 4

Statistics: Introduction, Types of Statistics. Data Visualization and Interpretation: Histogram, Bar Charts, Scatter Plots, Good vs. Bad Visualization. Sampling distributions; Point estimation - estimators, minimum variance unbiased estimation, maximum likelihood estimation, method of moments, consistency; Interval estimation.

Introduction to Statistics
Any raw data, when collected and organized in numerical form or in tables, is known as statistics. Statistics is also the mathematical study of the likelihood of events occurring, based on known quantitative data or a collection of data.
Statistics attempts to infer the properties of a large collection of data from inspection of a sample of the collection, thereby allowing educated guesses to be made with a minimum of expense. There are generally three kinds of averages commonly used in statistics: (i) mean, (ii) median, and (iii) mode.
Statistics is the study of data collection, analysis, interpretation, presentation, and organization. Mathematical methods used in statistical analysis include mathematical analysis, linear algebra, stochastic analysis, measure-theoretic probability theory, and differential equations. Statistics is concerned with collecting, classifying, organizing, and displaying numerical data. This helps one grasp the different outcomes contained in the data and foresee the likelihood of various events. Statistics deals with information, observations, and data in numerical form. With the help of statistics, we can find different measures of central tendency and of the divergence (dispersion) of values from the center.

The ability to analyze and interpret statistical data is a vital skill for researchers and professionals from a wide variety of disciplines. You may need to make decisions on the basis of statistical data, interpret statistical data in research papers, or do your own research and interpret the data.

Types of Statistics

There are two kinds of statistics: descriptive statistics and inferential statistics. In descriptive statistics, the data or collection of data is described in a summarized way, whereas in inferential statistics, we use the data to draw conclusions that go beyond the data being described. Both are used on a large scale, and in practice an analysis often moves from descriptive statistics into inferential statistics.
Statistics is mainly divided into the following two categories:
Descriptive Statistics
Inferential Statistics

Descriptive Statistics

In descriptive statistics, the data is described in a summarized way. The summarization is done from a sample of the population using different measures such as the mean or the standard deviation. Descriptive statistics use charts, graphs, and summary measures to organize, represent, and explain a set of data.
It is used to quantitatively describe the attributes of the known data and provides summaries of either the sample or the population. The measures of descriptive statistics are given as follows:
Measures of central tendency: These measures describe the data with respect to a single central point. Mean, median, and mode are the three types of central tendency.
Measures of dispersion: These measures describe the variability of the data. In other words, they quantify the spread of a distribution about a central value. Range, variance, standard deviation, mean deviation, quartile deviation, and coefficients of dispersion fall under this category.

Inferential Statistics

In inferential statistics, we try to interpret the meaning of descriptive statistics. After the data has been collected, analyzed, and summarized, we use inferential statistics to describe the meaning of the collected data.

Inferential statistics use probability to assess whether trends observed in the research sample can be generalized to the larger population from which the sample was drawn.

Inferential statistics are intended to test hypotheses and investigate relationships between variables, and can be used to make predictions about a population.

Inferential statistics are used to draw conclusions and inferences, i.e., to make valid generalizations from samples.

The measures of inferential statistics are given below:

Hypothesis testing - It is used to test assumptions and make inferences about population parameters by using an estimate computed from a sample. There are many types of statistical tests used for this purpose, including the z test, t test, F test, and ANOVA.
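The hypothesis-testing idea above can be sketched with a one-sample z test. The function below is a minimal illustration using only Python's standard library (`statistics.NormalDist`, Python 3.8+); it assumes the population standard deviation is known, which is what separates a z test from a t test. The function name, the claimed population mean of 85, and sigma = 6 are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, n, mu0, sigma):
    """Two-sided z test of H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known;
    if it has to be estimated from a small sample, a t test is used instead.
    """
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Probability of seeing a |Z| at least this large if H0 were true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical question: is a class average of 88 over 50 students consistent
# with a claimed population mean of 85, assuming sigma = 6 is known?
z, p = one_sample_z_test(sample_mean=88, n=50, mu0=85, sigma=6)
reject_h0 = p < 0.05  # conventional 5% significance level
print(f"z = {z:.3f}, p = {p:.5f}, reject H0: {reject_h0}")
```

A small p-value means the observed average would be unlikely if the claimed mean were true, so the null hypothesis is rejected at the chosen significance level.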

Regression analysis - This type of analysis is used when the relationship between a dependent variable and one or more independent variables, i.e., how a change in one variable is associated with a change in another, needs to be evaluated and quantified. Simple linear, multiple linear, nominal, logistic, and ordinal regression are types of regression analysis.
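As a minimal sketch of simple linear regression, the function below fits y = intercept + slope * x by ordinary least squares, using the standard formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The hours-studied and marks data are hypothetical.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x around its mean
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # hypothetical: hours studied
marks = [52, 57, 66, 70, 75]   # hypothetical: marks obtained
slope, intercept = simple_linear_regression(hours, marks)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted marks for 6 hours: {intercept + slope * 6:.1f}")
```

The slope quantifies how much the marks change, on average, per extra hour studied; on its own it describes an association and does not prove that one variable causes the other.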

Example
In a class, the data is the set of marks obtained by 50 students. When we compute the average of the data, the result is the average of the 50 students' marks. If the average mark obtained by the 50 students is 88 out of 100, we draw a conclusion on the basis of this outcome.

Mean, Median and Mode in Statistics

Mean: The mean is the arithmetic average of a data set. It is found by adding the numbers in the set and dividing by the number of observations in the data set.

Median: The median is the middle number of the data set when the values are listed in ascending or descending order. If the number of observations is even, the median is the average of the two middle numbers.

Mode: The mode is the value that occurs most often in the data set.
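The three definitions above translate directly into code. This is a minimal sketch written by hand for clarity, then cross-checked against Python's `statistics` module (`mean`, `median`, `multimode`); the marks are hypothetical.

```python
import statistics
from collections import Counter


def mean(values):
    # Sum of the observations divided by their number
    return sum(values) / len(values)


def median(values):
    # Middle value of the sorted data; average of the two middle values when n is even
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    # Every value that shares the highest frequency (a data set can have several modes)
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


marks = [72, 85, 85, 90, 64, 85, 86]  # hypothetical marks
print(mean(marks), median(marks), modes(marks))
# The standard library provides the same measures:
print(statistics.mean(marks), statistics.median(marks), statistics.multimode(marks))
```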

Measures of Dispersion in Statistics

The measures of central tendency do not suffice to describe the complete information about a given data set. Therefore, the variability is described by a value called the measure of dispersion.

The different measures of dispersion include:

1. The range, calculated as the difference between the maximum value and the minimum value of the data points.
2. The quartile deviation, an absolute measure of dispersion. The three quartiles divide the sorted data points into four equal parts. First find the median of the data points. The median of the data points to the left of this median is the lower quartile (Q1), and the median of the data points to the right of this median is the upper quartile (Q3). Upper quartile - lower quartile is the interquartile range (IQR). Half of this is the quartile deviation.
3. The mean deviation, the average of the absolute differences between the items in a distribution and the mean or median of that series.
4. The standard deviation, which measures the amount of variation of a set of values. It is the square root of the variance, the average squared deviation from the mean.
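The four measures above can be computed in a few lines. This sketch follows the median-of-halves procedure described in item 2 (when n is odd, the overall median is left out of both halves); other tools, for example `statistics.quantiles`, may interpolate and give slightly different quartiles. It computes the mean deviation about the mean and the population variance (dividing by n). The data set is hypothetical.

```python
import math
import statistics


def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def dispersion(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # points left of the median
    upper = ordered[half + n % 2:]    # points right of the median (median excluded if n is odd)
    q1, q3 = median(lower), median(upper)
    mean = sum(ordered) / n
    variance = sum((x - mean) ** 2 for x in ordered) / n   # population variance
    return {
        "range": ordered[-1] - ordered[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2,
        "mean_deviation": sum(abs(x - mean) for x in ordered) / n,
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(dispersion(data))
# pstdev divides by n (population); stdev divides by n - 1 (sample)
print(statistics.pstdev(data), statistics.stdev(data))
```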

Types of Data in Statistics

There are two types of data in statistics, namely, qualitative data and quantitative data. In data analysis, it is necessary to apply the correct testing technique, which can only be done by sorting the data into its various types. Given below are the different types of data in statistics.
1. Qualitative Data or Categorical Data
Nominal data - This type of data can be divided into mutually exclusive groups that do not overlap. Labels and tags are used to categorize such data. Nominal data does not have any intrinsic ordering and does not possess any numeric property. Examples of nominal data are gender, eye color, etc.
Ordinal data - As with nominal data, arithmetic operations cannot be performed on ordinal data, as it does not possess any numerical property. However, this type of data can be intrinsically ordered. For example, rating a restaurant experience on a scale of 1 - 5.
2. Quantitative Data
Discrete data - This type of data takes only distinct, countable values (usually integers) and cannot be divided into smaller or finer parts. Mathematical operations can be performed on discrete data. For example, days of a month, the number of teachers in a school, etc.
Continuous data - This type of data can be divided into finer levels and can take any numeric value within a range. There are two types of continuous data, namely, interval and ratio data. Interval data can be negative and does not have a meaningful zero (for example, temperature in degrees Celsius). On the other hand, ratio data cannot be negative and has a meaningful zero (for example, height or weight). Calculations for continuous data are performed using descriptive statistics.

Types of Variables in Statistics

Just as there are two types of data, there are two types of variables in statistics. These variables are used to represent the corresponding data. In both types of statistics, it is necessary to choose the right kind of variable in order to apply the appropriate statistical test. Given below are the different types of variables in statistics:

Qualitative Variables
Binary or dichotomous variables - This type of variable represents nominal data with only two possible outcomes and no intrinsic ordering. For example, passing or failing an examination.
Nominal variables - Nominal variables represent data that does not have any rank and cannot be ordered intrinsically. In other words, they are used to represent nominal data. For example, breeds of dogs.
Ordinal variables - This type of variable represents ordinal data, wherein the groups can be ranked in a specific order. An example is the finishing rank of people in a race.

Quantitative Variables
Discrete variables - Discrete variables represent counts of distinct items or values. Such variables are also known as integer variables. The number of flowers in a garden can be represented using a discrete variable.
Continuous variables - These variables are used to denote continuous data (interval or ratio data). They can take any value within a range, i.e., infinitely many values. For example, the volume of a sphere.

Data Visualization and Interpretation

Data visualization is the representation of data through use of common
graphics, such as charts, plots, infographics, and even animations. These
visual displays of information communicate complex data relationships
and data-driven insights in a way that is easy to understand.
Data visualization can be used for a variety of purposes, and it's important to note that it is not reserved only for data teams. Management also uses it to convey organizational structure and hierarchy, while data analysts and data scientists use it to discover and explain patterns and trends.
Data visualization is commonly used to spur idea generation across teams. Visualizations are frequently used during brainstorming or Design Thinking sessions at the start of a project, supporting the collection of different perspectives and highlighting the common concerns of the group. While these visualizations are usually unpolished and unrefined, they help set the foundation of the project, ensuring that the team is aligned on the problem it is looking to address for key stakeholders.
Data visualization is a critical step in the data science process, helping teams and individuals convey data more effectively to colleagues and decision makers. Teams that manage reporting systems typically use defined template views to monitor performance. However, data visualization isn't limited to performance dashboards. For example, while text mining, an analyst may use a word cloud to capture key concepts, trends, and hidden relationships within unstructured data. Alternatively, they may use a graph structure to illustrate relationships between entities in a knowledge graph. There are a number of ways to represent different types of data, and it's important to remember that data visualization is a skill set that should extend beyond your core analytics team.
Data interpretation is the process of reviewing data through well-defined methods, which help assign meaning to the data and arrive at a relevant conclusion. Analysis is the process of ordering, categorizing, and summarizing data to answer research questions. It should be done quickly and effectively, and the results should stand out clearly. Choosing the right type of data plot is an important part of achieving this. As data keeps growing, so does this need, and hence data plots have become very important in today's world. However, there are many types of plots used in data visualization, and it is often tricky to choose which type is best for your business or data. Each type has strengths and weaknesses that make it better than others in some situations.

These data plot types for visualization are sometimes called graphs or charts.

Benefits of good data visualization

Our eyes are drawn to colours and patterns. We can quickly tell red from blue, and a square from a circle. Our culture is visual, including everything from art and advertisements to TV and movies.

Data visualization is another form of visual art that grabs our interest and keeps our eyes on the message. When we see a chart, we quickly see trends and outliers. If we can see something, we internalize it quickly. It's storytelling with a purpose. If you've ever stared at a massive spreadsheet of data and couldn't see a trend, you know how much more effective a visualization can be. The uses of data visualization are as follows:

It is a powerful way to explore data with presentable results.
Its primary use is in the pre-processing portion of the data mining process.
It supports the data cleaning process by finding incorrect and missing values.
It helps with variable derivation and selection, i.e., determining which variables to include in the analysis and which to discard.
It also plays a role in combining categories as part of the data reduction process.

Data Visualization Techniques

Box plots
Histograms
Heat maps
Charts
Tree maps
Word Cloud/Network diagram

Box Plots

[Figure: a box plot]

A boxplot is a standardized way of displaying the distribution of data based on a five-number summary ("minimum", first quartile (Q1), median, third quartile (Q3), and "maximum"). It can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly your data is grouped, and whether and how your data is skewed.
A box plot is a graph that gives you a good indication of how the values in the data are spread out. Although box plots may seem primitive in comparison to a histogram or density plot, they have the advantage of taking up less space, which is useful when comparing distributions between many groups or datasets. For some distributions/datasets, you will find that you need more information than the measures of central tendency (median, mean, and mode); you also need information on the variability or dispersion of the data.
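The numbers behind a box plot can be sketched with the standard library. This example computes the five-number summary with `statistics.quantiles(..., method="inclusive")` (linear interpolation, one of several common quartile conventions, so other tools may give slightly different quartiles) and flags outliers with the common 1.5 x IQR rule (Tukey's fences), which is also Matplotlib's default whisker length. The function names and data are hypothetical.

```python
import statistics


def five_number_summary(values):
    # Quartiles by linear interpolation between sorted observations
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def tukey_outliers(values, k=1.5):
    # Points beyond Q1 - k*IQR or Q3 + k*IQR are usually drawn as separate dots
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [12, 15, 14, 10, 18, 16, 13, 45]  # hypothetical data with one extreme value
print(five_number_summary(data))
print(tukey_outliers(data))
```

In a box plot of this data, the box would span Q1 to Q3, the whiskers would stop at the most extreme points inside the fences, and 45 would appear as an individual outlier point.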

List of Methods to Visualize Data

Column Chart: It is also called a vertical bar chart, where each category is represented by a rectangle. The height of the rectangle is proportional to the value that is plotted.

Bar Graph: It has rectangular bars whose lengths are proportional to the values they represent.

Stacked Bar Graph: It is a bar-style graph in which various components are stacked together, so that apart from the bars themselves, the components can also be compared to each other.

Stacked Column Chart: It is similar to a stacked bar graph; however, the columns are vertical, so the data is stacked vertically.

Area Chart: It combines the line chart and bar chart to show how the numeric values of one or more groups change over the progression of a second variable, usually time.

Dual Axis Chart: It combines a column chart and a line chart, each with its own y-axis, to compare two variables.

Line Graph: The data points are connected by straight lines, creating a representation of the changing trend.

Mekko Chart: It can be called a two-dimensional stacked chart with varying column widths.

Pie Chart: It is a chart where the various components of a data set are presented as slices of a pie, each representing its proportion of the entire data set.

Waterfall Chart: With the help of this chart, the cumulative effect of sequentially introduced positive or negative values can be understood.

Bubble Chart: It is a multi-variable graph that is a hybrid of a scatter plot and a proportional area chart.

Scatter Plot Chart: It is also called a scatter chart or scatter graph. Dots are used to denote values for two different numeric variables.

Bullet Graph: It is a variation of a bar graph. A bullet graph is used as a replacement for dashboard gauges and meters.

Funnel Chart: The chart shows the flow of users through a business or sales process.

Heat Map: It is a technique of data visualization that shows the magnitude of a phenomenon as color in two dimensions.


Histograms

A histogram is a graphical display of data using bars of different heights. In a histogram, each bar groups numbers into ranges. Taller bars show that more data falls in that range. A histogram displays the shape and spread of continuous sample data.
It is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. It is an accurate representation of the distribution of numerical data, and it relates only one variable. It is built from bins (or buckets): the entire range of values is divided into a series of intervals, and then we count how many values fall into each interval.
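The binning step just described can be sketched in plain Python. This minimal version splits a range into equal-width intervals and counts the values in each; it follows the same edge convention as `numpy.histogram` (half-open bins, last bin closed on the right). The function name and the ages are hypothetical.

```python
def histogram_counts(values, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width intervals on [lo, hi].

    Intervals are half-open [left, right), except the last one, which also includes hi.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # values outside the plotted range are not counted
        index = min(int((v - lo) / width), bins - 1)  # v == hi goes into the last bin
        counts[index] += 1
    return counts


ages = [22, 25, 31, 38, 41, 45, 47, 52, 58, 63]  # hypothetical data
counts = histogram_counts(ages, bins=5, lo=20, hi=70)
for i, c in enumerate(counts):
    left = 20 + i * 10
    print(f"{left}-{left + 10}: {'#' * c}")  # a text-mode histogram
```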

Histograms are based on area, not height of bars

In a histogram, the height of a bar does not necessarily indicate how many observations fall within each bin. It is the product of the height and the width of the bin (the area of the bar) that indicates the frequency of occurrences within that bin. One reason the height of the bars is often incorrectly taken to indicate the frequency, rather than the area, is that many histograms have bins of equal width, and under these circumstances the height of the bar does reflect the frequency.
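To see why area, not height, carries the frequency, the sketch below computes density-histogram heights (count divided by n times the bin width) for deliberately unequal bins. With these heights, each bar's area equals the fraction of observations in its bin; `numpy.histogram(values, bins=edges, density=True)` applies the same normalization. The function name and data are hypothetical.

```python
def density_heights(values, edges):
    """Bar heights for a density histogram: count / (n * bin width)."""
    n = len(values)
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        # Bins are [left, right), except the last, which also includes its right edge
        count = sum(1 for v in values if left <= v < right or (is_last and v == right))
        heights.append(count / (n * (right - left)))
    return heights


ages = [22, 25, 31, 38, 41, 45, 47, 52, 58, 63]  # hypothetical data
edges = [20, 30, 40, 70]                          # deliberately unequal bins
heights = density_heights(ages, edges)
areas = [h * (right - left) for h, left, right in zip(heights, edges, edges[1:])]
print(heights)  # the wide 40-70 bin holds 6 of the 10 values but is no taller
print(areas)    # the areas recover each bin's share of the observations
```

All three bars end up the same height even though the last bin contains three times as many values; only the area reveals the difference.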

Heat Maps

A heat map is a data visualization tool that uses colour the way a bar graph uses height and width.
If you're looking at a web page and want to know which areas get the most attention, a heat map shows you in a visual way that's easy to assimilate and make decisions from. It is a graphical representation of data where the individual values contained in a matrix are represented as colours. Heat maps are especially useful for two purposes: visualizing correlation tables and visualizing missing values in the data. In both cases, the information is conveyed in a two-dimensional table.
Note that heat maps are useful when examining a large number of values, but they are not a replacement for more precise graphical displays, such as bar charts, because colour differences cannot be perceived accurately.

Charts

Line Chart

The simplest technique, a line plot, is used to plot the relationship or dependence of one variable on another. To plot the relationship between the two variables, we can simply call the plot function.
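A minimal Matplotlib sketch of a line chart, assuming Matplotlib is installed. It uses the off-screen "Agg" backend so it runs without a display (use `plt.show()` interactively, or `fig.savefig(...)` to write an image). The years and sales figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021, 2022]  # hypothetical data
sales = [120, 135, 128, 150, 171]

fig, ax = plt.subplots()
line, = ax.plot(years, sales, marker="o")  # y (sales) plotted against x (year)
ax.set_xlabel("Year")
ax.set_ylabel("Units sold")
ax.set_title("Units sold per year")
```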

Bar Charts

Bar charts are used for comparing the quantities of different categories or groups. The values of each category are represented with the help of bars, which can be vertical or horizontal, with the length or height of each bar representing the value.
A vertical bar graph plots the categories on the x-axis and the values on the y-axis, using bars of varying heights to show the value that belongs to each category.
We can also stack bars on top of each other. Let's plot the data for apples and oranges.
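The apples-and-oranges data is not reproduced here, so the sketch below stacks bars using hypothetical yields, assuming Matplotlib is installed. The key step is the `bottom=` argument of `Axes.bar`, which starts each orange bar where the apple bar ends.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = ["2019", "2020", "2021"]  # hypothetical categories
apples = [30, 35, 42]             # hypothetical yields (tons)
oranges = [25, 28, 20]

fig, ax = plt.subplots()
apple_bars = ax.bar(years, apples, label="Apples")
# Stack oranges on top: each bar starts at the height of the apple bar below it
orange_bars = ax.bar(years, oranges, bottom=apples, label="Oranges")
ax.set_ylabel("Yield (tons)")
ax.set_title("Apples and oranges (stacked)")
ax.legend()
```

The total height of each stack shows the combined yield, while the segments still let you compare the two fruits.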
For histograms, we can change the number and size of bins using NumPy too.

We can create bins of unequal size too.
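A minimal NumPy sketch of both options, assuming NumPy is installed: passing an integer to `numpy.histogram` gives that many equal-width bins, while passing an array of edges gives custom (possibly unequal) bins. The data is a synthetic normal sample. With unequal bins, `density=True` is the safer choice, because bar area rather than height then represents frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)  # synthetic sample

# Equal-width bins: give the number of bins (and optionally the range)
counts, edges = np.histogram(data, bins=10, range=(0, 100))

# Unequal bins: give the bin edges explicitly
custom_edges = np.array([0, 30, 40, 45, 50, 55, 60, 70, 100])
custom_counts, _ = np.histogram(data, bins=custom_edges)

# Density heights make the areas (not the heights) comparable across unequal bins
density, _ = np.histogram(data, bins=custom_edges, density=True)
print(counts, edges, custom_counts, density, sep="\n")
```

The same `bins` argument (an integer or an array of edges) can be passed to Matplotlib's `plt.hist` to draw the histogram.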

Similar to line charts, we can draw multiple histograms in a single chart. We can reduce each histogram's opacity so that one histogram's bars don't hide the others'. Let's draw separate histograms for each species of flower.
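The Iris data itself is not included here, so this sketch overlays histograms for three hypothetical groups drawn from normal distributions, assuming NumPy and Matplotlib are installed. Shared bin edges keep the bars aligned, and `alpha=0.4` makes each histogram partly transparent so overlapping bars remain visible.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical petal lengths (cm) for three made-up groups, not the real Iris data
groups = {
    "species_a": rng.normal(1.5, 0.2, 50),
    "species_b": rng.normal(4.3, 0.5, 50),
    "species_c": rng.normal(5.5, 0.6, 50),
}

fig, ax = plt.subplots()
bins = np.linspace(0, 8, 33)  # 33 shared edges -> 32 bins, identical for every group
for name, values in groups.items():
    ax.hist(values, bins=bins, alpha=0.4, label=name)  # alpha < 1 keeps overlaps visible
ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Count")
ax.legend()
```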

Pie Chart

It is a circular statistical graph that is divided into slices to illustrate numerical proportion. The arc length of each slice is proportional to the quantity it represents. As a rule, pie charts are used to compare the parts of a whole and are most effective when there are only a few components and when text and percentages are included to describe the content. However, they can be difficult to interpret because the human eye has a hard time estimating areas and comparing visual angles.
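The proportionality described above is simple arithmetic: each slice gets 360 degrees times its share of the total. This standard-library sketch computes the angle and percentage for each slice of a hypothetical monthly budget; when plotting, Matplotlib's `plt.pie(values, labels=names, autopct="%1.1f%%")` performs the same computation and prints the percentages on the slices.

```python
def pie_slices(parts):
    # Each slice's angle (and hence arc length) is proportional to its share of the total
    total = sum(parts.values())
    return {name: (360 * value / total, 100 * value / total) for name, value in parts.items()}


budget = {"Rent": 450, "Food": 300, "Transport": 150, "Other": 300}  # hypothetical amounts
for name, (angle, percent) in pie_slices(budget).items():
    print(f"{name:10s} {angle:6.1f} degrees  {percent:5.1f}%")
```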

Scatter Charts

Another common visualization technique is the scatter plot, a two-dimensional plot representing the joint variation of two data items. Each marker (a symbol such as a dot, square, or plus sign) represents an observation, and the marker's position indicates the values for that observation. When you assign more than two measures, a scatter plot matrix is produced: a series of scatter plots displaying every possible pairing of the measures assigned to the visualization. Scatter plots are used for examining the relationship, or correlation, between X and Y variables.

Scatter plots are used when we have to plot two or more variables as points at different coordinates. The data is scattered all over the graph and is not confined to a range. A third, categorical variable can be shown by giving each category a different color. Let's use the 'Iris' dataset to plot a scatter plot.

First, let’s see how many different species of flowers we have.

Figure 26: Unique flower species

Let's try plotting the data with the help of a line chart.
This is not very informative. We cannot figure out the relationship between different data points.

Plotting the same data as a scatter plot is much better. But we still cannot differentiate the data points belonging to different categories. We can color the dots using the flower species as a hue.
Figure 29: Scatter plot with multiple colors

Since Seaborn uses Matplotlib's plotting functions internally, we can use functions like plt.figure and plt.title to modify the figure.

Figure 30: Changing dimensions of scatter plot
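The original plotting code is not reproduced here, so the sketch below shows the same ideas with Matplotlib alone, assuming it is installed: one `plt.scatter` call per species gives each species its own colour (the role Seaborn's `hue` plays, as in `sns.scatterplot(data=df, x=..., y=..., hue="species")`), `plt.figure(figsize=...)` changes the dimensions, and `plt.title` adds a title. The rows are hypothetical values in the style of Iris, chosen so the example runs without downloading any dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical (sepal_length, petal_length, species) rows in the style of Iris
rows = [
    (5.1, 1.4, "setosa"), (4.9, 1.5, "setosa"), (5.0, 1.6, "setosa"),
    (6.4, 4.5, "versicolor"), (5.9, 4.2, "versicolor"), (6.1, 4.7, "versicolor"),
    (6.3, 6.0, "virginica"), (7.1, 5.9, "virginica"), (6.5, 5.5, "virginica"),
]

plt.figure(figsize=(8, 5))  # width and height of the figure in inches
for species in sorted({s for _, _, s in rows}):
    xs = [x for x, _, s in rows if s == species]
    ys = [y for _, y, s in rows if s == species]
    plt.scatter(xs, ys, label=species)  # one colour per species
plt.xlabel("Sepal length (cm)")
plt.ylabel("Petal length (cm)")
plt.title("Sepal length vs. petal length")
plt.legend()
```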

Heat Maps

Heatmaps are used to see changes in behavior or gradual changes in data. They use different colors to represent different values; how these colors vary in hue, intensity, etc., tells us how the phenomenon varies. Let's use heatmaps to visualize monthly passenger footfall at an airport over 12 years from the flights dataset in Seaborn.
Figure 31: Flights dataset
The above dataset, flights_df, shows the monthly footfall at an airport for each year from 1949 to 1960. The values represent the number of passengers (in thousands) that passed through the airport. Let's use a heatmap to visualize the above data.

Figure 32: Plotting heatmap

The brighter the color, the higher the footfall at the airport. By looking at the graph, we can infer that:
1. The footfall in any given year is highest around July and August.
2. The footfall grows every year: any given month has a higher footfall than the same month in previous years.

Let's display the actual values in our heatmap and change the color map to blue.

Figure 33: Plotting heatmap with values
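The flights data is not reproduced here, so this sketch draws an annotated heatmap for a small hypothetical months-by-years table using Matplotlib alone (assuming NumPy and Matplotlib are installed). `imshow` with `cmap="Blues"` colours the cells (darker blue means more passengers), and one `ax.text` call per cell writes the value on top. With Seaborn and the real data, a call such as `sns.heatmap(flights_df, annot=True, fmt="d", cmap="Blues")` produces this kind of chart.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical passenger counts (thousands): rows = months, columns = years
months = ["Jun", "Jul", "Aug"]
years = [1958, 1959, 1960]
values = np.array([[400, 450, 500],
                   [460, 520, 590],
                   [470, 530, 580]])

fig, ax = plt.subplots()
image = ax.imshow(values, cmap="Blues")  # colour encodes the value in each cell
ax.set_xticks(range(len(years)))
ax.set_xticklabels([str(y) for y in years])
ax.set_yticks(range(len(months)))
ax.set_yticklabels(months)
threshold = values.mean()
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        # White text on dark cells and black text on light cells keeps values readable
        color = "white" if values[i, j] > threshold else "black"
        ax.text(j, i, str(values[i, j]), ha="center", va="center", color=color)
colorbar = fig.colorbar(image, ax=ax)
colorbar.set_label("Passengers (thousands)")
```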

Bubble Charts

It is a variation of a scatter chart in which the data points are replaced with bubbles, and an additional dimension of data is represented by the size of the bubbles.
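A minimal bubble-chart sketch with Matplotlib, assuming it is installed: it is an ordinary `scatter` call whose `s` argument carries the third variable. Matplotlib interprets `s` as the marker area in points squared, so scaling `s` linearly with the value makes bubble area, not radius, proportional to it. The country data and the scale factor of 10 are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical countries: (GDP per capita, life expectancy, population in millions)
data = {
    "A": (12_000, 71, 50),
    "B": (35_000, 79, 10),
    "C": (4_000, 64, 120),
    "D": (48_000, 82, 5),
}
gdp = [v[0] for v in data.values()]
life = [v[1] for v in data.values()]
population = [v[2] for v in data.values()]

fig, ax = plt.subplots()
# `s` is the marker AREA (points^2), so it is scaled linearly with population
bubbles = ax.scatter(gdp, life, s=[p * 10 for p in population], alpha=0.5)
for name, x, y in zip(data, gdp, life):
    ax.annotate(name, (x, y), ha="center", va="center")  # label each bubble
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Life expectancy (years)")
```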

Timeline Charts

Timeline charts illustrate events in chronological order, for example the progress of a project, an advertising campaign, or an acquisition process, in whatever unit of time the data was recorded (for example week, month, quarter, or year). They show the chronological sequence of past or future events on a timescale.

Tree Maps

A treemap is a visualization that displays hierarchically organized data as a set of nested rectangles, with each parent rectangle tiled by the rectangles of its children. The sizes and colours of rectangles are proportional to the values of the data points they represent. A leaf node rectangle has an area proportional to the specified dimension of the data. Depending on the choice, the leaf node is coloured, sized, or both according to chosen attributes. Treemaps make efficient use of space and can therefore display thousands of items on the screen simultaneously.
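How nested rectangles with proportional areas can be produced is sketched below with the simple slice-and-dice layout (a named substitute technique; many treemap tools instead use "squarified" layouts, which give rectangles closer to squares). At each level, a parent rectangle is cut into strips whose widths (or heights) are proportional to its children's totals, alternating direction with depth, so every leaf's area is proportional to its value. The function names and the disk-usage hierarchy are hypothetical.

```python
def node_size(node):
    """Total value of a node: its own value for a leaf, the sum of its children otherwise."""
    _, content = node
    if isinstance(content, list):
        return sum(node_size(child) for child in content)
    return content


def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Lay out a tree as nested rectangles using the slice-and-dice algorithm.

    A node is (name, value) for a leaf or (name, [children]) for a parent.
    Returns a list of (name, x, y, width, height), parents before children.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, x, y, width, height))
    if isinstance(content, list):
        total = node_size(node)
        offset = 0.0
        for child in content:
            share = node_size(child) / total
            if depth % 2 == 0:
                # Even depth: cut the parent into vertical strips (split along x)
                slice_and_dice(child, x + offset * width, y, width * share, height, depth + 1, out)
            else:
                # Odd depth: cut the parent into horizontal strips (split along y)
                slice_and_dice(child, x, y + offset * height, width, height * share, depth + 1, out)
            offset += share
    return out


# Hypothetical hierarchy: disk usage (GB) by folder
tree = ("root", [("docs", [("reports", 30), ("slides", 10)]), ("media", 60)])
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(f"{name:8s} x={x:5.1f} y={y:5.1f} w={w:5.1f} h={h:5.1f} area={w * h:7.1f}")
```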

Educ 201
No ratings yet
Educ 201
2 pages
Statistics
No ratings yet
Statistics
21 pages
Statistics Notes
No ratings yet
Statistics Notes
16 pages
Chapter1 Statistics
No ratings yet
Chapter1 Statistics
17 pages
Statistics
No ratings yet
Statistics
13 pages
Lesson 5 (Descriptive Statistics Part 1)_Oct 2024
No ratings yet
Lesson 5 (Descriptive Statistics Part 1)_Oct 2024
72 pages
Descriptive Statistics
No ratings yet
Descriptive Statistics
26 pages
Lecture 1
No ratings yet
Lecture 1
32 pages
Presentation 4
No ratings yet
Presentation 4
29 pages
Statistics
No ratings yet
Statistics
88 pages
Statistics
No ratings yet
Statistics
11 pages
Article Review 1 Eng
No ratings yet
Article Review 1 Eng
30 pages
Statistics SLM
No ratings yet
Statistics SLM
7 pages
Statistics
No ratings yet
Statistics
68 pages
Chapter 01
No ratings yet
Chapter 01
56 pages
Advanced Statistics1
No ratings yet
Advanced Statistics1
19 pages
Basic Statistics (3685) PPT - Lecture On 20-01-2019
100% (1)
Basic Statistics (3685) PPT - Lecture On 20-01-2019
64 pages
Statistics Notes Self Made
100% (1)
Statistics Notes Self Made
41 pages
Statistical Methods
No ratings yet
Statistical Methods
43 pages
Assignment No 3
No ratings yet
Assignment No 3
16 pages
Lesson 02 Probability and Statistics
No ratings yet
Lesson 02 Probability and Statistics
127 pages
Statistics - Imp Points
No ratings yet
Statistics - Imp Points
6 pages
Educational Statistics Notes
No ratings yet
Educational Statistics Notes
32 pages
Introduction and Descriptive Statistics
No ratings yet
Introduction and Descriptive Statistics
50 pages
Statistics For Data Science
100% (1)
Statistics For Data Science
27 pages
Statistics Theory
No ratings yet
Statistics Theory
3 pages
Reviewer Part 1
No ratings yet
Reviewer Part 1
9 pages
2 - Introduction To Statistics
No ratings yet
2 - Introduction To Statistics
97 pages
Data Management
No ratings yet
Data Management
48 pages
2466939-EDA_and_STATISTICS_NOTES
No ratings yet
2466939-EDA_and_STATISTICS_NOTES
15 pages
Statistics
No ratings yet
Statistics
30 pages
Advance Statistics for Data Science and Data Analysis (2)
No ratings yet
Advance Statistics for Data Science and Data Analysis (2)
47 pages
Introduction To Statistics Lecture 7
No ratings yet
Introduction To Statistics Lecture 7
32 pages
Summarize Topic in Statistical
No ratings yet
Summarize Topic in Statistical
5 pages
Ch 2 Lecture Notes
No ratings yet
Ch 2 Lecture Notes
12 pages
Unit 1 - Business Statistics & Analytics
No ratings yet
Unit 1 - Business Statistics & Analytics
25 pages
Statistics_ Def-wps Office
No ratings yet
Statistics_ Def-wps Office
14 pages
Statistical Analysis_ Descriptive Stat (2)
No ratings yet
Statistical Analysis_ Descriptive Stat (2)
6 pages
MMW (Data Management) - Part 1
No ratings yet
MMW (Data Management) - Part 1
26 pages
Pyscho
No ratings yet
Pyscho
2 pages
chapter2-statistical analysis
No ratings yet
chapter2-statistical analysis
86 pages
DIscussion Forum Answers
No ratings yet
DIscussion Forum Answers
3 pages
Business statistics and analysis All topics in one pdf Aktu notes
No ratings yet
Business statistics and analysis All topics in one pdf Aktu notes
42 pages
Jerome Statistics
No ratings yet
Jerome Statistics
12 pages
SSM & Da All Unit Notes
No ratings yet
SSM & Da All Unit Notes
152 pages
Descriptive_Statistics
No ratings yet
Descriptive_Statistics
73 pages
Statistics, Statistical Modelling & Data Analytics
No ratings yet
Statistics, Statistical Modelling & Data Analytics
68 pages
Applied Statistical Methods (ASM) : "The True Logic of This World Is in The Calculus of Probabilities"
No ratings yet
Applied Statistical Methods (ASM) : "The True Logic of This World Is in The Calculus of Probabilities"
90 pages
Statistics_Compendium_DMS IIT DELHI_2025
No ratings yet
Statistics_Compendium_DMS IIT DELHI_2025
18 pages
Stats 1 Module Updated
No ratings yet
Stats 1 Module Updated
53 pages
Presentation ON Introduction To Statistics: Course No: URP 5151 Couse Title: Statistics For Planners
No ratings yet
Presentation ON Introduction To Statistics: Course No: URP 5151 Couse Title: Statistics For Planners
37 pages
Lecture Sheet For SPSS
100% (1)
Lecture Sheet For SPSS
29 pages
Unit II: Basic Data Analytic Methods
No ratings yet
Unit II: Basic Data Analytic Methods
38 pages
Psychology Project
No ratings yet
Psychology Project
14 pages
Data Management
No ratings yet
Data Management
36 pages
Basics of Statistics
No ratings yet
Basics of Statistics
40 pages
From Weakest To Strongest in Terms of Statistical Inference)
No ratings yet
From Weakest To Strongest in Terms of Statistical Inference)
1 page
Statistics For Data Science
No ratings yet
Statistics is divided into two types: descriptive statistics and inferential statistics.

In descriptive statistics, the data or collection of data is described in a summarized way. The summarization is done from a sample of the population using measures such as the mean or the standard deviation.

In inferential statistics, we try to interpret the meaning of the descriptive statistics. Inferential statistics uses the principles of probability to assess whether trends observed in the research sample can be generalized to the larger population from which the sample originally comes, so that we can make predictions about the population.
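A minimal sketch of the distinction, using Python's standard `statistics` module on a small hypothetical sample: the descriptive step summarizes the sample itself, while the inferential step uses the sample to estimate the population mean with an approximate 95% confidence interval. The sample values and the normal-approximation multiplier 1.96 are assumptions for illustration; for a sample this small, a t-based interval would normally be preferred.

```python
import math
import statistics

# Hypothetical sample of exam scores drawn from a larger population
sample = [72, 85, 90, 68, 77, 81, 95, 70]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: normal approximation with z = 1.96 for roughly 95% confidence.
z = 1.96
standard_error = sd / math.sqrt(len(sample))
ci_low = mean - z * standard_error
ci_high = mean + z * standard_error

print(f"sample mean = {mean:.2f}, sample sd = {sd:.2f}")
print(f"approximate 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```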

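The measures of inferential statistics are given below:

Hypothesis testing - It is used to test assumptions about a population and to make decisions based on sample data.

Regression Analysis - This type of analysis is used to estimate the effect of one or more independent variables on a dependent variable.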
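As a hedged illustration of regression analysis, the sketch below fits a simple linear regression y = a + b*x by ordinary least squares, using the closed-form formulas for the slope and intercept. The data points are hypothetical, and the helper name `least_squares_fit` is ours; real analyses would typically use a library such as NumPy or statsmodels.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied (x) versus exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]
a, b = least_squares_fit(hours, scores)
print(f"score = {a:.2f} + {b:.2f} * hours")
```

Here the slope estimates how much the score changes, on average, for each extra hour of study in this made-up data.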

Mean: The mean is the arithmetic average of a data set. It is found by adding the numbers in the set and dividing the total by the number of values in the set.
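The mean, together with the median and the mode, can be computed with Python's standard `statistics` module. The data set below is hypothetical.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9]  # hypothetical data set

# Mean: sum of the values divided by how many values there are
mean = sum(data) / len(data)

# Median: middle value of the sorted data (average of the two middle values if n is even)
median = statistics.median(data)

# Mode: most frequently occurring value
mode = statistics.mode(data)

print(mean, median, mode)
```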

The measures of central tendency do not suffice to describe the complete information about a given data set. Therefore, the variability of the data is described by measures of dispersion.

1. The range in statistics is calculated as the difference between the maximum value and the minimum value of the data points.
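A short sketch computing the range described above, together with the variance and standard deviation as two further common measures of dispersion. The data and the choice of population (rather than sample) formulas are assumptions for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data points

# Range: maximum value minus minimum value
data_range = max(data) - min(data)

# Population variance: mean of the squared deviations from the mean
mu = sum(data) / len(data)
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

# Cross-check against the standard library's population formulas
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))

print(data_range, variance, std_dev)
```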

Types of Data in Statistics

Discrete data can take only certain separate values, and mathematical operations can be performed on it. For example, the number of students in a class.

Types of Variables in Statistics

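Dichotomous variables - These variables have only two levels and do not possess any intrinsic ordering. For example, passing or failing in an exam.

Nominal variables - These variables have more than two levels and no intrinsic order. In other words, they are used to represent nominal data. For example, breeds of dogs.

Ordinal variables - These variables have levels that can be ordered. An example is the finishing rank of people in a race.

Discrete variables - Discrete variables represent counts of distinct items. For example, the number of different types of flowers in a garden can be represented using a discrete variable.

Continuous variables - Continuous variables can take any value within a range, including fractional values. For example, the volume of a sphere.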

Data Visualization and Interpretation

Benefits of good data visualization

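It is a powerful way to explore data with presentable results.

It supports the data cleaning process by finding incorrect and missing values.

It helps with variable derivation and selection, that is, determining which variables to include in and discard from the analysis.

It also plays a role in combining categories as part of the data preparation process.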

Word Cloud/Network diagram

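The image above is a box plot. A box plot is a standardized way of displaying the distribution of data based on a five-number summary: the “minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”. Although box plots may seem primitive in comparison to a histogram or density plot, they have the advantage of taking up less space, which is useful when comparing distributions between many groups or data sets.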
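The five-number summary behind a box plot can be sketched with the standard library. Several conventions exist for computing quartiles; this sketch assumes the "inclusive" method of `statistics.quantiles` (Python 3.8+), so other tools may report slightly different Q1 and Q3 values. The helper name `five_number_summary` and the data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


data = [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(five_number_summary(data))
```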

List of Methods to Visualize Data

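Column Chart: It is also called a vertical bar chart, where each category is represented by a rectangle. The height of the rectangle is proportional to the value that is plotted.

Stacked Bar Graph: It is a bar-style graph that has various components stacked together, so that apart from the bar as a whole, the components can also be compared to each other.

Mekko Chart: It can be called a two-dimensional stacked chart with varying column widths.

Pie Chart: It is a chart where the various components of a data set are presented as slices of a pie, showing each component's proportion of the entire data set.

Scatter Plot Chart: It is also called a scatter chart or scatter graph.

Bullet Graph: It is a variation of a bar graph. A bullet graph is used in place of dashboard gauges and meters, showing a primary measure against a target.

Heat Map: It is a technique of data visualization that shows the magnitude of a phenomenon as color in two dimensions.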
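A minimal Matplotlib sketch of two of the chart types listed above, a column chart and a stacked bar graph. The category names and values are hypothetical, and the non-interactive `Agg` backend is selected only so the script runs without a display; the figure can then be saved with `fig.savefig(...)`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

categories = ["North", "South", "East"]  # hypothetical categories
product_a = [120, 90, 150]               # hypothetical values
product_b = [60, 80, 40]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Column chart: one vertical bar per category, height proportional to the value
column_bars = ax1.bar(categories, product_a, color="steelblue")
ax1.set_title("Column chart")
ax1.set_ylabel("Sales")

# Stacked bar graph: product B is drawn on top of product A via `bottom`
bars_a = ax2.bar(categories, product_a, label="Product A")
bars_b = ax2.bar(categories, product_b, bottom=product_a, label="Product B")
ax2.set_title("Stacked bar graph")
ax2.legend()

fig.tight_layout()
```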

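A histogram is a graphical display of data using bars of different heights. It is similar to a bar chart, but a histogram groups numbers into ranges (bins).

Histograms are based on area, not the height of bars: when bins have unequal widths, it is the area of each bar, not its height, that represents the frequency of the bin.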
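To make "based on area" concrete, the sketch below counts hypothetical values into bins of unequal width and computes the frequency density (frequency divided by bin width) to use as each bar's height, so that bar area equals frequency. Bins are assumed to be half-open, [lower, upper), except that the last bin also includes its upper edge; the helper name `histogram_with_density` is ours.

```python
def histogram_with_density(values, edges):
    """Count values per bin and return a list of (frequency, density) pairs.

    edges: sorted bin edges, e.g. [0, 10, 20, 40] defines three bins.
    Bins are [lower, upper) except the last, which is [lower, upper].
    Values outside all bins are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            lower, upper = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if lower <= v < upper or (is_last and v == upper):
                counts[i] += 1
                break
    result = []
    for i, count in enumerate(counts):
        width = edges[i + 1] - edges[i]
        # Height = frequency density, so height * width (the bar's area) equals the count
        result.append((count, count / width))
    return result


ages = [3, 7, 12, 15, 18, 22, 25, 31, 38, 40]  # hypothetical data
edges = [0, 10, 20, 40]                          # unequal bin widths: 10, 10, 20
for (count, density), low, high in zip(histogram_with_density(ages, edges), edges, edges[1:]):
    print(f"{low}-{high}: frequency={count}, density={density:.2f}")
```

The widest bin holds the most values, but its bar is not the tallest, because its height is spread over twice the width.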

The simplest technique, a line plot, is used to plot the relationship or dependence of one variable on another. To plot the relationship between two variables, we can simply call the plot function.
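A minimal Matplotlib line plot, assuming two hypothetical lists where `y` depends on `x`; `Axes.plot` draws straight line segments between consecutive points.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]   # hypothetical independent variable
y = [v ** 2 for v in x]  # dependent variable: y = x squared

fig, ax = plt.subplots()
(line,) = ax.plot(x, y, marker="o")  # plot() returns a list of Line2D objects
ax.set_xlabel("x")
ax.set_ylabel("y = x^2")
ax.set_title("Line plot")
```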

When plotting a histogram, we can create bins of unequal size too.
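In Matplotlib, unequal bins can be requested by passing a sequence of bin edges as the `bins` argument of `hist`. With `density=True`, bar heights are scaled so that the total area of the bars equals 1, which keeps wide bins from looking artificially tall. The data and edges below are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

ages = [3, 7, 12, 15, 18, 22, 25, 31, 38, 40]  # hypothetical data
edges = [0, 10, 20, 40]                          # unequal bins: widths 10, 10, 20

fig, ax = plt.subplots()
# counts holds the bar heights (densities here); returned_edges echoes the bin edges used
counts, returned_edges, patches = ax.hist(ages, bins=edges, density=True, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Density")
```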

A pie chart is a circular statistical graph which is divided into slices to illustrate numerical proportion.
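A minimal Matplotlib pie chart sketch with hypothetical values. Each slice spans its share of the total multiplied by 360 degrees, and `autopct` labels the slices with their percentages.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

labels = ["A", "B", "C"]  # hypothetical categories
shares = [50, 30, 20]     # hypothetical values (they need not sum to 100)

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(shares, labels=labels, autopct="%1.1f%%")
ax.set_title("Pie chart")

# Each wedge spans (value / total) * 360 degrees
for wedge, value in zip(wedges, shares):
    print(wedge.theta2 - wedge.theta1, 360 * value / sum(shares))
```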

Another common visualization technique is the scatter plot, a two-dimensional plot that represents the joint variation of two data items.

First, let’s see how many different species of flowers we have.

Figure 26: Unique flower species

Since Seaborn uses Matplotlib's plotting functions internally, we can use Matplotlib functions to change properties of the plot, such as its dimensions.

Figure 30: Changing dimensions of scatter plot
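A minimal sketch of a scatter plot whose dimensions are set through Matplotlib. It uses plain Matplotlib and a few hypothetical petal measurements rather than the full Iris dataset, so it runs without downloading data; with Seaborn, the same `ax` could be passed to `seaborn.scatterplot(..., ax=ax)`.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical petal measurements (cm) for a few flowers
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

# figsize sets the figure's width and height in inches
fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(petal_length, petal_width, color="purple")
ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Petal width (cm)")
ax.set_title("Scatter plot")
```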

Heatmaps are used to see changes in behavior or gradual changes in data. A heatmap uses different colors to represent different values, so variations in color show where the values are high or low.

Figure 32: Plotting heatmap

Figure 33: Plotting heatmap with values
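A Matplotlib-only sketch of a heatmap annotated with its values, assuming a small hypothetical 3 x 4 matrix: `imshow` maps each value to a color, and `Axes.text` writes the value in each cell. Seaborn's `heatmap(data, annot=True)` produces a similar result in one call.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical 3 x 4 matrix of values
data = [
    [12, 15, 18, 21],
    [14, 17, 20, 24],
    [11, 13, 16, 19],
]

fig, ax = plt.subplots()
image = ax.imshow(data, cmap="viridis")  # each value is mapped to a color
fig.colorbar(image, ax=ax, label="Value")

# Annotate every cell with its numeric value (x = column index, y = row index)
for row, values in enumerate(data):
    for col, value in enumerate(values):
        ax.text(col, row, str(value), ha="center", va="center", color="white")

ax.set_title("Heatmap with values")
```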

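Bubble chart: It is a variation of a scatter chart in which the data points are replaced with bubbles, and an additional dimension of the data is represented by the size of the bubbles.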
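A bubble chart can be sketched with Matplotlib's `scatter` by passing a third variable through the `s` (marker area) parameter. The values and the scaling factor of 10 are hypothetical choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]                # hypothetical x values
y = [10, 14, 9, 17]             # hypothetical y values
population = [30, 80, 45, 120]  # third variable, shown as bubble size

fig, ax = plt.subplots()
# s is the marker area in points^2, so bubble area grows with population
bubbles = ax.scatter(x, y, s=[p * 10 for p in population], alpha=0.5)
ax.set_title("Bubble chart")
```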

Timeline charts illustrate events in chronological order, for example the milestones of a project.

A treemap is a visualization that displays hierarchically organized data as a set of nested rectangles, where the area of each rectangle is proportional to its value.
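The core of a treemap is the layout step that turns values into rectangles. A minimal sketch of the simplest approach, a one-level "slice" layout in which each item gets a vertical strip whose width is proportional to its value; real treemap tools usually use more elaborate algorithms, such as squarified layouts, and recurse into each level of the hierarchy. The helper name `slice_layout`, the item names, and the values are hypothetical.

```python
def slice_layout(items, x, y, width, height):
    """Split the rectangle (x, y, width, height) into vertical strips.

    items: list of (name, value) pairs with positive values.
    Returns a list of (name, x, y, strip_width, height); each strip's
    area is proportional to its value.
    """
    total = sum(value for _, value in items)
    rects = []
    offset = x
    for name, value in items:
        strip_width = width * value / total
        rects.append((name, offset, y, strip_width, height))
        offset += strip_width  # the next strip starts where this one ends
    return rects


sales = [("Books", 50), ("Music", 30), ("Games", 20)]  # hypothetical values
for rect in slice_layout(sales, 0, 0, 100, 60):
    print(rect)
```

Applying the same function again inside each strip, with the direction alternated, gives the nested rectangles of a multi-level treemap.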
