Data Analysis

The document provides an overview of various graphical representations of data, including line graphs, bar graphs, histograms, pie charts, and scatter diagrams, highlighting their uses and advantages. It also discusses measures of central tendency (mean, median, mode) and variability (range, variance, standard deviation), along with inferential and descriptive statistics. Additionally, it covers regression analysis and correlation, explaining their significance in understanding relationships between variables.

Uploaded by

engyinmoe634

Data Analysis

Graphical Presentation
Line graph
A line graph is a type of graph in which the data are plotted as
points, and the points are then joined to one another by straight
lines.
A line graph is normally used to represent data that change
over time.
Bar graph
A bar graph, or bar chart, represents data using
rectangular columns or bars. The height or length
of each bar is proportional to the value it represents.
Histogram

A histogram is a graphical representation of data points organized into user-specified ranges.

Similar in appearance to a bar graph, the histogram condenses a data series into an easily interpreted
visual by taking many data points and grouping them into logical ranges or bins.
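The grouping step behind a histogram can be sketched in a few lines of Python. This is only an illustration: the `scores` list, the bin width of 10, and the function name `bin_counts` are assumptions made for the example, not part of the original slides.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    # Integer division maps every value to the lower edge of its bin.
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical data, chosen only to illustrate the grouping.
scores = [3, 7, 12, 15, 18, 21, 24, 29, 35]
hist = bin_counts(scores, 10)

for start, count in hist.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Each printed row plays the role of one bar of the histogram; a different bin width gives a coarser or finer picture of the same data.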
Pie chart
A pie chart is also called a circle graph. It is a circular chart in which numerical information
is represented as slices, in fractional or percentage form, with the whole circle representing 100%.
Stem and leaf plot
The stem and leaf plot is a way to represent quantitative data according to
frequency ranges or a frequency distribution.
In the stem and leaf plot, each data value is split into a stem and a leaf. For example,
32 is split into a stem of 3 and a leaf of 2.
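The stem and leaf split can be sketched in Python as follows. The data values and the function name `stem_and_leaf` are hypothetical, and the sketch assumes non-negative whole numbers whose last digit is the leaf.

```python
def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # 32 -> stem 3, leaf 2
        plot.setdefault(stem, []).append(leaf)
    return plot

# Hypothetical data set, used only for illustration.
data = [32, 35, 41, 47, 47, 50]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```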
Frequency table: Frequency means the number of occurrences of an event. A
frequency distribution table is a table or chart that shows the frequency of
events. The frequency is denoted by ‘f’.
Pictograph
A pictograph, or pictogram, is the earliest way to represent data in pictorial form, using
symbols or images. Each image represents a particular number of things.

According to the pictograph, the number of apples sold on Monday is 6 x 2 = 12.
Scatter diagrams
A scatter diagram, or scatter plot, is a graphical representation that uses two
variables. The plot shows the relationship between the two variables. Below is the data
table for a scattergram of ice cream sales against temperature.

Ice cream sales vs Temperature

Temperature °C (x)    Ice cream sales (y)
14.2                  $215
16.4                  $325
11.9                  $185
15.2                  $332
18.5                  $406
22.1                  $522
19.4                  $412
25.1                  $614
Advantages and Disadvantages of Graphical representation of data
• It improves analysis and learning, because a graphical representation makes
the data easy to understand.
• It can be used in almost all fields, from mathematics to physics to psychology and so
on.
• It is easy to understand because of its visual impact.
• It shows a large data set as a whole, at a glance.
The main disadvantage of graphical representation of data is that it takes a lot of effort
and resources to find the most appropriate data and then represent it graphically.
E.g. An insurance company determines vehicle insurance premiums based on known risk
factors. If a person is considered a higher risk, their premiums will be higher. One
potential factor is the color of the car. The insurance company believes that people with
cars of certain colors are more likely to get in accidents. To research this, they examine police
reports for recent total-loss collisions. The data is summarized in this table.

Car Color    Frequency of Total-Loss Collisions
Blue         25
Green        52
Red          41
White        36
Black        39
Grey         23
Total        216

Car Color    Frequency of Total-Loss Collisions    Percentage of Total-Loss Collisions
Blue         25                                    11
Green        52                                    24
Red          41                                    19
White        36                                    17
Black        39                                    18
Grey         23                                    11
Total        216                                   100
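The percentage column is each frequency divided by the total, times 100. A minimal Python sketch of that arithmetic, using the collision counts from the table, is shown here. The table reports whole-number percentages; the sketch keeps one decimal place so the effect of rounding is visible. The variable names are chosen for the example only.

```python
# Total-loss collision counts by car color, taken from the table above.
collisions = {"Blue": 25, "Green": 52, "Red": 41,
              "White": 36, "Black": 39, "Grey": 23}

total = sum(collisions.values())

# Share of each color as a percentage of all total-loss collisions.
percentages = {color: 100 * count / total
               for color, count in collisions.items()}

for color, pct in percentages.items():
    print(f"{color}: {collisions[color]} collisions, {pct:.1f}%")
```

These shares are also what a pie chart of the data would use for its slice sizes.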
Pie Chart
Bar chart
2 Types of Statistics
• Descriptive statistics are used to define statistical properties of the data, such as the mean,
variance, and skewness.
  - Statistics used only to describe the sample or summarize information about
  the sample (methods of organizing, summarizing, and presenting data in an
  informative way)
• Inferential statistics are used to determine whether there are statistically significant relationships
between sets of data.
  - Statistics used to make inferences or generalizations about the broader population
  (methods used to find out something about a population based on a sample)
Differences between Descriptive statistics and inferential statistics
Exercises
1. The line graph is normally used to represent data that change over time.
2. The height or length of the bar is proportional to the value.
3. Similar in appearance to a bar graph, the histogram condenses a data series into an easily
interpreted visual by taking many data points and grouping them into logical ranges or bins.
4. The stem and leaf plot is a way to represent quantitative data according to frequency ranges or
a frequency distribution.
5. A pie chart is a circular chart in which numerical information is represented as slices, in fractional or
percentage form, with the whole circle representing 100%.
6. In the stem and leaf plot, each data value is split into a stem and a leaf; for example, 32 is split into a stem of 3
and a leaf of 2.
7. A scatter diagram, or scatter plot, is a graphical representation that uses two variables.
8. Graphical representation can be used in almost all fields, from mathematics to physics to psychology and so on.
9. Descriptive statistics are used to define statistical properties of the data, such as the mean, variance,
and skewness.
10. Inferential statistics are used to determine whether there are statistically significant relationships between sets of
data.
Data description

Measures of Central Tendency

Mean – the sum of the values, divided by the total number of values.
Median – the midpoint of the data array.
Mode – the value that occurs most often in a data set.
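The three measures can be computed with Python's standard `statistics` module. The data set below is hypothetical and was chosen so that one value repeats, which gives the mode something to find.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 3, 3, 5, 7]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # value that occurs most often

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```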
Properties and Uses of Central Tendency

The Mean
• The mean is found by using all the values of the data.
• The mean varies less than the median or mode when samples are taken from the
same population and all three measures are computed for these samples.
• The mean is used in computing other statistics, such as the variance.
• The mean for the data set is unique and not necessarily one of the data values.
• The mean cannot be computed for the data in a grouped frequency distribution that has
an open-ended class.
• The mean is affected by extremely high or low values, called outliers, and may not be
the appropriate average to use in these situations.
The Median
• The median is used to find the center or middle value of a data set.
• The median is used when it is necessary to find out whether the data values fall
into the upper half or lower half of the distribution.
• The median is used for an open-ended distribution.
• The median is affected less than the mean by extremely high or extremely low
values.
The Mode
• The mode is used when the most typical case is desired.
• The mode is the easiest average to compute.
• The mode can be used when the data are nominal or categorical, such as
religious preference or gender.
Exercises
1. Which measure of central tendency includes the magnitude of scores?
(a) Mean (b) Mode (c) Median (d) Range
2. To calculate the median, all the items of a series have to be
arranged in a/an ___(c)___.
(a) Descending order (b) Ascending order (c) Ascending or
descending order (d) None of the above
3. Mode refers to the value within a series that occurs ___(a)___
number of times.
(a) Maximum (b) Minimum (c) Zero (d) Infinite
4. ___(c)___ is not a measure of central tendency.
(a) Mode (b) Mean (c) Range (d) Median
5. The sum of deviations from the ___(c)___ is always zero.
(a) Median (b) Mode (c) Mean (d) None of the above
6. The number of observations smaller than _(a)_______ is the
Measure of Variability
Range – the highest value minus the lowest value.
Variance – the average of the squares of the distance each value is from the
mean.

Standard Deviation – the square root of the variance.
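A short Python sketch of these measures, using a hypothetical data set, is given below. It assumes the population form of the variance (divide by the number of values), which matches the definition "the average of the squares"; for a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(data)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```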

Uses of Variance and Standard Deviation
• As previously stated, variances and standard deviations can be used to determine the
spread of the data. If the variance or standard deviation is large, the data are more
dispersed. This information is useful in comparing two (or more) data sets to determine
which is more (most) variable.
• The variance and standard deviation are used to determine the consistency of
a variable. For example, in the manufacture of fittings, such as nuts and bolts, the
variation in the diameter must be small, or else the parts will not fit together.
• The variance and standard deviation are used to determine the number of data values
that fall within a specified interval in a distribution.
• Finally, the variance and standard deviation are used quite often in inferential statistics.

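Coefficient of Variation
• The standard deviation divided by the mean, commonly expressed as a percentage.
Coefficient of variation (CV): $CV = \frac{s}{\bar{x}} \times 100\%$
where $s$ is the standard deviation and $\bar{x}$ is the mean.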
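As a rough illustration, the coefficient of variation can be computed as below. The function name, the hypothetical data, and the choice of the population standard deviation are assumptions of this sketch; the result is expressed as a percentage.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage."""
    # Assumption: population standard deviation; swap in
    # statistics.stdev for sample data.
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Hypothetical data set: mean 5, standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.1f}%")
```

Because the CV is a ratio, it can be used to compare the variability of data sets measured in different units.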
Exploratory Data Analysis
• Exploratory Data Analysis (EDA) is an analysis approach that identifies general patterns in the
data. These patterns include outliers and features of the data that might be unexpected. EDA is
an important first step in any data analysis.
• In exploratory data analysis (EDA), data can be organized using a stem and leaf plot.
• The measure of central tendency used in EDA is the median.
• The measure of variation used in EDA is the interquartile range.
Interquartile range: $IQR = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are the first and third quartiles.
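The median and interquartile range can be sketched with Python's `statistics.quantiles`, here applied to the ice cream sales figures from the scatter diagram table. Textbooks and software use slightly different quartile conventions; the `method="inclusive"` choice below is an assumption of this sketch, and other conventions can give slightly different quartiles.

```python
import statistics

# Ice cream sales (in dollars) from the scatter diagram table.
sales = [215, 325, 185, 332, 406, 522, 412, 614]

# Three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(sales, n=4, method="inclusive")

iqr = q3 - q1  # interquartile range: Q3 - Q1

print("Q1:", q1)
print("median (Q2):", q2)
print("Q3:", q3)
print("IQR:", iqr)
```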
Challenge yourself with these true/false questions.

1. The purpose of inferential statistics is to simplify and organize the data from a study. (True/False)
2. Frequency distributions are a subset of inferential statistics. (True/False)
3. Summary statistics are a subset of descriptive statistics. (True/False)
4. The range is a frequently used measure of central tendency. (True/False)
5. The mode is a frequently used measure of central tendency. (True/False)
6. The mean is the score at the 50th percentile. (True/False)
7. A sample is a subset of the people or objects in a population. (True/False)
8. If all of the scores in a distribution are increased by exactly five
points, the range will increase by five points. (True/False)
9. The variance, unlike the range, uses all the scores in its
computation. (True/False)
10. In descriptive statistics, final results are shown in the form of charts,
tables and graphs. (True/False)
2. Analytic statistics
- looking at association among two or more variables

Cross-tabulation, Regression, Correlation, ANOVA

Crosstabulation
A crosstab is a table that summarizes the relationship between two categorical variables.
Cross-tabulation is also known simply as “crosstab”.

          Cake    Ice cream    Donut    Total
Female    4       3            6        13
Male      5       7            9        21
Total     9       10           15       34
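A crosstab is built by counting how often each pair of categories occurs. The Python sketch below rebuilds one record per respondent from the cell counts in the table, then recovers the row and column totals; the variable names are chosen for this illustration only.

```python
from collections import Counter

# Cell counts taken from the table above: (gender, dessert) -> count.
cell_counts = {
    ("Female", "Cake"): 4, ("Female", "Ice cream"): 3, ("Female", "Donut"): 6,
    ("Male", "Cake"): 5, ("Male", "Ice cream"): 7, ("Male", "Donut"): 9,
}

# Expand the counts into one (gender, dessert) record per respondent.
records = [pair for pair, k in cell_counts.items() for _ in range(k)]

# Cross-tabulate: count each pair, each row category, each column category.
table = Counter(records)
row_totals = Counter(gender for gender, _ in records)
col_totals = Counter(dessert for _, dessert in records)

print("Row totals:", dict(row_totals))
print("Column totals:", dict(col_totals))
print("Grand total:", len(records))
```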
What Is a Regression?
Regression is a statistical method used in finance, investing, and other
disciplines that attempts to determine the relationship between one
dependent variable (usually denoted by Y) and a series of other variables
(known as independent variables Xi).
Also called simple regression or ordinary least squares (OLS), linear regression is the
most common form of this technique. Linear regression establishes the linear relationship
between two variables based on a line of best fit. Linear regression is thus graphically
depicted using a straight line with the slope defining how the change in one variable
impacts a change in the other. The y-intercept of a linear regression relationship represents
the value of one variable when the value of the other is zero. Non-linear regression models
also exist, but are far more complex.
The two basic types of regression are simple linear regression and
multiple linear regression, although there are non-linear regression methods for more
complicated data and analysis. Simple linear regression uses one independent variable to
explain or predict the outcome of the dependent variable Y, while multiple linear
regression uses two or more independent variables to predict the outcome (while holding
all others constant).
Simple linear regression: $Y = a + bX + e$

Multiple linear regression: $Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_t X_t + e$

Where:
Y = the dependent variable that you are trying to predict or explain
X = the explanatory (independent) variable(s) that you are using to predict or associate with Y
a = the y-intercept
b = the slope (beta coefficient) of the explanatory variable(s)
e = the regression residual or error term
The linear regression line is $Y = a + bX + e$.

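Formulas for the Expected Regression Line $\hat{y} = a + bx$

By the Least Squares Method,

$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

$a = \frac{\sum y - b\sum x}{n}$

Where a is the intercept, b is the slope of the line, and n is the number of data pairs.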

E.g. A school administrator wondered whether class size and grade achievement (in percent) were
related. A random sample of classes revealed the following data. Find $\hat{y}$ when x = 12.

No. of students    15    10    8     20    18    6
Avg. grade (%)     85    90    82    80    84    92

No. of students (x)    Average grade (%) (y)    xy      x²
15                     85                       1275    225
10                     90                       900     100
8                      82                       656     64
20                     80                       1600    400
18                     84                       1512    324
6                      92                       552     36
Total 77               513                      6495    1149

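$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{6(6495) - (77)(513)}{6(1149) - (77)^2} = \frac{-531}{965} \approx -0.55$

$a = \frac{\sum y - b\sum x}{n} = \frac{513 - (-0.55)(77)}{6} \approx 92.56$

If x = 12, $\hat{y} = a + bx = 92.56 - 0.55(12) \approx 85.96$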
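The class-size example can be checked with a short Python sketch of the least squares formulas for the slope and intercept. The function name `least_squares` is chosen for this illustration; the data are the six classes from the example.

```python
# Class size (x) and average grade in percent (y) from the example.
x = [15, 10, 8, 20, 18, 6]
y = [85, 90, 82, 80, 84, 92]

def least_squares(xs, ys):
    """Return intercept a and slope b of the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi ** 2 for xi in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = least_squares(x, y)
y_hat = a + b * 12  # predicted average grade for a class of 12

print(f"b = {b:.2f}, a = {a:.2f}, predicted grade at x = 12: {y_hat:.2f}")
```

The negative slope says that, in this sample, larger classes tend to have slightly lower average grades.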
Correlation
Correlation is used to describe the linear relationship between two continuous
variables (e.g., height and weight). It measures the strength and direction
of the linear relationship between two or more variables.
Correlation coefficient
The degree of association is measured by a correlation coefficient,
denoted by r. It is sometimes called Pearson’s correlation
coefficient after its originator and is a measure of linear association.
If a curved line is needed to express the relationship, other and
more complicated measures of the correlation must be used.
The correlation coefficient is measured on a scale that varies from
+1 through 0 to -1. Complete correlation between two variables is
expressed by either +1 or -1. When one variable increases as the
other increases, the correlation is positive; when one decreases as
the other increases, it is negative. Complete absence of correlation is
represented by 0.
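Correlation coefficient (r)
The correlation coefficient measures the strength of the linear relationship between two variables (-1 ≤ r ≤ 1).

Formula for the Linear Correlation Coefficient r:

$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$

Where n is the number of data points.

x = independent variable
y = dependent variable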
E.g. Construct a scatter plot for the data obtained in a study on the number of absences and the final
grades of seven randomly selected students from a statistics class. The data are shown here.

Student    Number of absences (x)    Final grade (%) (y)
A          6                         82
B          2                         86
C          15                        43
D          9                         74
E          12                        58
F          5                         90
G          8                         78

Compute the value of the linear correlation coefficient for the data obtained in the study of the number
of absences and the final grade of the seven students in the statistics class.

Student    Number of absences (x)    Final grade (%) (y)    xy      x²     y²
A          6                         82                     492     36     6724
B          2                         86                     172     4      7396
C          15                        43                     645     225    1849
D          9                         74                     666     81     5476
E          12                        58                     696     144    3364
F          5                         90                     450     25     8100
G          8                         78                     624     64     6084
Total      57                        511                    3745    579    38993

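$r = \frac{7(3745) - (57)(511)}{\sqrt{[7(579) - (57)^2][7(38993) - (511)^2]}} = \frac{-2912}{\sqrt{(804)(11830)}} \approx -0.94422$
Correlation coefficient (r) = -0.94422
Therefore, the number of absences and the final grade
are strongly and negatively correlated.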
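The same result can be reproduced with a small Python sketch of the Pearson correlation coefficient computed from the column sums. The function name `pearson_r` is chosen for this illustration; the data are the seven students from the example.

```python
import math

# Number of absences (x) and final grade in percent (y), students A to G.
x = [6, 2, 15, 9, 12, 5, 8]
y = [82, 86, 43, 74, 58, 90, 78]

def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi ** 2 for xi in xs)
    sum_y2 = sum(yi ** 2 for yi in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(f"r = {r:.5f}")
```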
ANOVA
Analysis of Variance (ANOVA) is a statistical method
used to compare the means (or averages) of different
groups by analyzing their variances. It is used in a range
of scenarios to determine whether there is any difference
between the means of different groups.
ANOVA is a statistical test
that was developed by Ronald Fisher in 1918 and has been
in use ever since. Put simply, ANOVA tells you whether there are
any statistically significant differences between the means of
three or more independent groups. One-way ANOVA is
the most basic form.
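As a rough sketch of what one-way ANOVA computes, the Python code below calculates the standard F ratio: the variation between group means divided by the variation within groups. The slides do not give this formula or any data, so the three groups and the function name `one_way_anova_f` are hypothetical and used only for illustration.

```python
def one_way_anova_f(groups):
    """F ratio: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three hypothetical independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_anova_f(groups)
print(f"F = {f_ratio:.2f}")
```

A large F ratio suggests that at least one group mean differs from the others; judging significance requires comparing F with the F distribution for the two degrees of freedom, which this sketch does not do.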
