Module I. Basic Calculations. Average, Standard Deviation by Excel (5)
Descriptive Statistics
Summarizing and Describing a Single Set
of Observations
R. Lopez
MAC 2205
Statistics Applied Education
MRU
Module 1. Objectives. Review
• Objectives
• 1.- Basic concepts.
• 1.1.-Population, Variable types, sample, distribution.
• 1.2.-Normal distribution, Binomial, Poisson and Chi Square.
• 2.- Describing Data from Research in Education.
• 2.1.- Different kinds of data:
• 2.1.1.-Categorical Data
• 2.1.2.- Interval Data
• 3.- Statistics for:
• 3.1.-Central tendency: Mean, Mode and Median.
• 3.2- Dispersion: Standard Deviation, Variance, Standard Error, Confidence interval, Quartiles.
• 3.3.-Form of the distribution: Skewness and Kurtosis
• 4.- Graphs to represent Distributions:
• 4.1.-Histograms and Box Plots
• 4.2- Relative Position
• 5.- Stratification: Data Comparison
Descriptive Statistics and Inferential Statistics
• Descriptive Statistics definition:
• The sole purpose of descriptive statistics is to describe a set of data,
either by calculating summary statistics or by representing the data graphically.
• Statistics for Central Tendency: Mean, Median, Mode.
• Statistics for Spread or Variability: Variance, Standard Deviation,
standard error, confidence interval, range, quartiles, percentiles.
• Statistics for the form of the distribution: Skewness and Kurtosis.
• Graphics: Histograms, Boxplots, etc.
Inferential statistics
• With inferential statistics, we are trying to reach conclusions that
extend beyond the immediate data alone. For instance, we use
inferential statistics to try to infer from the sample data what the
population might think.
• We use inferential statistics to make inferences from our data to
more general conditions; we use descriptive statistics simply to
describe what's going on in our data.
• Inferential statistics are useful in experimental and quasi-experimental
research design or in program outcome evaluation. The simplest
inferential test is used when you want to compare the average
performance of two groups on a single measure to see if there is a
significant difference.
• An example: We might want to know whether eighth-grade boys and
girls differ in math test scores or whether a program group differs on
the outcome measure from a control group. We use the t-test to
determine significant differences between the averages of two
groups.
• Most of the major inferential statistics come from a general family of
statistical models known as the General Linear Model.
• This includes the t-test, Analysis of Variance (ANOVA), Analysis of
Covariance (ANCOVA), regression analysis, and many of the
multivariate methods like factor analysis, multidimensional scaling,
cluster analysis, discriminant function analysis, and so on.
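The two-group comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming equal variances in both groups (the classic pooled Student t statistic). The function name `pooled_t` is hypothetical, and the data are the blood pressure readings for females and males that appear later in this module.

```python
from math import sqrt
from statistics import mean, variance

# Blood pressure readings from the worked example later in this module.
females = [81, 82, 84, 88, 84, 90, 88, 86, 87]
males = [87, 90, 93, 91, 90, 89, 92, 90, 88]

def pooled_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # Standard error of the difference between the two means.
    se_diff = sqrt(sp2 * (1 / na + 1 / nb))
    t_value = (mean(a) - mean(b)) / se_diff
    return t_value, na + nb - 2   # t statistic and degrees of freedom

t, df = pooled_t(females, males)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The absolute value of t is then compared with a critical value from a t table (about 2.12 for 16 degrees of freedom at the two-sided 5% level) to decide whether the two averages differ significantly.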
Introduction: The first step in selecting the proper
statistical test is to identify the variable
type of the data we have:
Categorical Data
• Categorical Data (Nominal and Ordinal):
• Nominal data consist of names or identifiers.
• For example, Gender is a nominal variable that can be identified by
labels: male or female,
• or by a number associated with each label: 0 (male), 1 (female).
• Ordinal data are used to compare subjects that are organized
according to an order or ranking scheme.
• For example, if we want to categorize patients by their degree of mental
illness, we cannot attach a numeric value to each of them, only an order:
"less than", "equal to" or "greater than".
Categorical Data
• Nominal and ordinal data are often summarized with bar charts.
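As a small illustration of how nominal data are summarized before drawing a bar chart, the sketch below counts how often each category occurs. The eight coded observations are hypothetical (they are not from the course data); only the 0 = male, 1 = female coding follows the slide.

```python
from collections import Counter

# Hypothetical nominal observations, coded as on the slide: 0 = male, 1 = female.
gender_codes = [0, 1, 1, 0, 1, 1, 0, 1]
labels = {0: "male", 1: "female"}

# Count the frequency of each category; these counts are the bar heights.
counts = Counter(labels[code] for code in gender_codes)

# Print a simple text bar chart, one row per category.
for category, count in counts.items():
    print(f"{category:<7}{'#' * count} ({count})")
```

The codes 0 and 1 are only identifiers, so the counts are meaningful but an average of the codes would not be.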
Interval
• Interval data are represented with
numbers and, unlike ordinal data, the
differences between values have a meaning.
• Ratio data are also represented with
numbers and, as with interval data, the
differences between values have a
meaning and can be measured,
subject to the accuracy of the
instrument or the technique we use
to measure them.
• The difference from interval data is
that ratio data have a clearly
interpretable zero.
Sources for data
• When searching for information on a topic, it is important to
understand the value of primary, secondary, and tertiary sources.
• Primary sources allow researchers to get as close as possible to
original ideas, events, and empirical research. Such
sources may include creative works, first-hand or contemporary
accounts of events, and the publication of the results of empirical
observations or research.
Introduction. Important definitions
• Descriptive Statistics (Summarization):
• A data set is a collection of facts and values.
• The first purpose is to represent the data set graphically, usually by
means of histograms, bar charts, pie charts, distributions, etc.
• The second purpose is to calculate the statistical parameters
that summarize the data:
• Central Tendency
• Dispersion
• Shape of the distribution
Statistics for interval and ratio data.
Central Tendency.
Central Tendency:
Mean: For a data set the mean refers to the central
value : specifically, the sum of the values divided by
the number of values.
• $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_n}{n}$
• If we have the following individual set of single outcomes:
• Calculate the average or mean value (using Excel):
Subject 1 2 3 4 5 6 7 8 9
values 103 108 95 110 109 103 92 98 105
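As a cross-check on the Excel result, the same average can be computed with a few lines of Python. This is only a sketch of the formula above (sum of the values divided by the number of values), using the nine values from the table.

```python
# Values for subjects 1 to 9, copied from the table above.
values = [103, 108, 95, 110, 109, 103, 92, 98, 105]

total = sum(values)      # sum of the values
n = len(values)          # number of values
average = total / n      # the mean; Excel's AVERAGE function computes the same quantity

print(f"sum = {total}, n = {n}, mean = {average:.2f}")
```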
The 95% confidence interval:
$\bar{x} \pm 2 \cdot SE_{\bar{x}}$, where $SE_{\bar{x}} = S / \sqrt{n}$
• For the following data, calculate the mean, the variance, the
standard deviation, the standard error and the 95% interval, assuming a Normal
distribution, of the lower blood pressure value for females and
for males.
Females:
Subject 1 2 3 4 5 6 7 8 9
x 81 82 84 88 84 90 88 86 87
Total sum = 770 Average = 85.56
Males:
Subject 1 2 3 4 5 6 7 8 9
x 87 90 93 91 90 89 92 90 88
Total sum = 810 Average = 90
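The variance and the Standard
Deviation
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, $S = \sqrt{S^2}$
For Females
$S^2 = 9.02778$, $S = 3.004626$
For Males
$S^2 = [(87-90)^2 + (90-90)^2 + (93-90)^2 + (91-90)^2 + (90-90)^2 + (89-90)^2 + (92-90)^2 + (90-90)^2 + (88-90)^2]/(9-1) = 28/8 = 3.5$
$S = \sqrt{3.5} = 1.870829$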
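The full exercise (mean, variance, standard deviation, standard error and the 95% interval taken as mean ± 2·SE, as defined earlier) can be checked with the short Python sketch below. The helper name `summarize` is hypothetical. The data are the female and male values from the slides.

```python
from math import sqrt
from statistics import mean, variance, stdev

females = [81, 82, 84, 88, 84, 90, 88, 86, 87]
males = [87, 90, 93, 91, 90, 89, 92, 90, 88]

def summarize(x):
    """Return the mean, sample variance, SD, standard error and mean +/- 2*SE interval."""
    n = len(x)
    m = mean(x)
    s2 = variance(x)        # sample variance: divides by n - 1
    s = stdev(x)            # sample standard deviation: square root of s2
    se = s / sqrt(n)        # standard error of the mean
    return {"mean": m, "variance": s2, "sd": s, "se": se,
            "interval": (m - 2 * se, m + 2 * se)}

for label, data in (("Females", females), ("Males", males)):
    r = summarize(data)
    low, high = r["interval"]
    print(f"{label}: mean={r['mean']:.2f} S2={r['variance']:.5f} "
          f"S={r['sd']:.6f} SE={r['se']:.4f} interval={low:.2f} to {high:.2f}")
```

In Excel, the functions VAR.S and STDEV.S use the same n − 1 divisor, so the results should agree with the values shown on the slide.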
Calculations by Excel
females males
81 87
82 90
84 93
88 91
84 90
90 89
88 92
86 90
87 88
Average 85.55556 90
• To obtain the average: click on Home, click on AutoSum, then select Average.
Then in Excel:
• Highlight the whole column.
• Then click on Insert (at the top).
• Several options should appear, including a group with
different charts. You can click on Recommended Charts to see whether
one of them is convenient (in this case none is).
• Look for the icon that looks like 3 rectangles, named
Histogram. Copy the chart and paste it into the PowerPoint presentation or any
Word report that you like.
histogram
• The histogram reports the same information as your data, but divided
into 3 classes.
• These classes divide the so-called
range = highest value − lowest value = 90 − 75 = 15.
• The number of classes depends on how many values you
have. In this case we have n = 14 values; taking approximately the
square root of n gives 3 classes here. Dividing the range by 3, we
obtain the so-called interval (width) of each class: 15 / 3 = 5.
• The value in the middle of each rectangle or class is called the mark of that class.
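The class arithmetic above can be reproduced with a short Python sketch. It uses only the numbers given on the slide (lowest value 75, highest value 90, 3 classes). The function name `histogram_classes` is hypothetical, and Excel's own binning may place the boundaries slightly differently.

```python
from math import sqrt

def histogram_classes(lowest, highest, n_classes):
    """Split [lowest, highest] into equal classes; return the width and (low, high, mark) triples."""
    data_range = highest - lowest
    width = data_range / n_classes            # interval (width) of each class
    classes = []
    for k in range(n_classes):
        low = lowest + k * width
        high = low + width
        classes.append((low, high, (low + high) / 2))   # class mark = midpoint
    return width, classes

n = 14
print(f"square root of n = {sqrt(n):.2f}")    # rule of thumb for the number of classes

width, classes = histogram_classes(75, 90, 3)
print(f"class width = {width}")
for low, high, mark in classes:
    print(f"class {low} to {high}, mark {mark}")
```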
Box and whisker graph
• Go back to Excel, but now select the Box and Whisker chart:
Analyzing the Box and Whisker graph
• Let's explain the information that it provides:
• Determine the average and the standard deviation as we did before:
• Average = 81.6. Notice that this is the value of the X mark in the Box and
Whisker graph, which here coincides with the line in the
middle of the box (the median).
• St. Dev. = 4.627319733, rounded to 4.6.
• With the standard deviation we can calculate the interval average ± 2 S, which
covers about 95% of the values under a Normal distribution and is close to the
values at the ends of the whiskers:
• 95% interval: 81.6 ± 2 × 4.6 = 72.4 to 90.8.
• The upper line of the box is the upper quartile: the value that has 25% of
the values above it and 75% of the values below it. The line at the bottom of
the box is the lower quartile: the value that has 75% of the values above it
and 25% of the values below it.
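The quantities read from a box and whisker graph can also be computed directly. The sketch below is only an illustration: the 14-value data set behind the graph is not reproduced in this text, so the female blood pressure values from the earlier exercise are used as a stand-in. Tools differ slightly in how they interpolate quartiles, so Excel may report slightly different values.

```python
from statistics import mean, median, quantiles, stdev

# Stand-in data (an assumption): the female values from the earlier exercise.
data = [81, 82, 84, 88, 84, 90, 88, 86, 87]

# Lower quartile, median and upper quartile (Python's default "exclusive" method).
q1, q2, q3 = quantiles(data, n=4)
m = mean(data)
s = stdev(data)
print(f"lower quartile = {q1}, median = {q2}, upper quartile = {q3}")
print(f"mean = {m:.2f}, mean +/- 2 S = {m - 2 * s:.2f} to {m + 2 * s:.2f}")

# The same interval for the slide's own summary values (average 81.6, S = 4.6).
slide_low, slide_high = 81.6 - 2 * 4.6, 81.6 + 2 * 4.6
print(f"slide interval = {slide_low:.1f} to {slide_high:.1f}")
```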
As a summary, during the next presentation we'll
study different types of statistical tests.
• To compare two averages:
• Student's t test or the Z score (if the data follow the Normal
distribution).
• To compare variances or more than two averages:
• The ANOVA test or Fisher test, or the Kruskal-Wallis test if the data do not follow
the Normal distribution (non-parametric test).
• To compare medians or ranks (for non-parametric data):
• Mann-Whitney test (if the data do not follow the Normal distribution).
• To compare proportions:
• Binomial distribution.
• To compare frequencies:
• Chi-square test.
The scatter plot. Go to Insert, then click
on Recommended Charts and select the scatter
plot. All the values will appear as dots
connected by a broken line.
[Figure: scatter plot titled "Chart Title"; vertical axis from 65 to 95, horizontal axis from 0 to 16]
Conclusion
• We have only presented the typical statistical tests that we are going to use
for the different types of cases.
• We won't require students to calculate their results using SPSS
or manually, because we'll use Excel for those calculations.