Module 6 Lesson 1
Introduction to Statistics
Definition of terms:
Statistics is a branch of mathematics that deals with the collection, organization, presentation,
analysis, and interpretation of data. Its fundamental purpose is to describe and draw inferences
about the numerical properties of a population.
Descriptive Statistics is a statistical procedure concerned with
THIS MODULE IS FOR THE EXCLUSIVE USE OF THE UNIVERSITY OF LA SALETTE, INC. ANY FORM OF REPRODUCTION, DISTRIBUTION,
UPLOADING, OR POSTING ONLINE IN ANY FORM OR BY ANY MEANS WITHOUT THE WRITTEN PERMISSION OF THE UNIVERSITY IS
STRICTLY PROHIBITED.
Inferential statistics can answer questions such as:
1. Is there a significant difference in the academic performance of students enrolled in an online
and modular class?
2. Is there a significant difference between the proportions of students who are interested in taking
statistics online and those who are not?
Population – refers to a large collection of objects, places or things.
Parameter – is any numerical value which describes a population.
Example: There are 7,592 students enrolled in a certain Marian Institution.
N = 7,592 is a parameter (N).
Scales of Measurement
1. Nominal level of measurement classifies data into mutually exclusive categories in which no order or
ranking can be imposed on the data. Nominal numbers are just labels. e.g. SSS number
2. Ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. e.g. size of t-shirt.
3. Interval level of measurement ranks data, and precise differences between units of measure do exist;
however, there is no meaningful zero. e.g. temperature in degrees Celsius.
4. Ratio level of measurement possesses all the characteristics of interval measurement, and there exists
a true zero. In addition, true ratios exist when the same variable is measured on two different members
of the population. e.g. height
Sampling Techniques
In doing research, if the population is too big to survey in full, a scientifically determined
number of samples is acceptable. One way of getting the number of samples is the RAOSOFT survey tool.
You can use the Raosoft calculator online to compute an adequate sample size:
http://www.raosoft.com/samplesize.html
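The sample size computation can also be sketched in code. The function below is a minimal sketch that uses the usual finite-population formula for estimating a proportion. It assumes a 95% confidence level (z = 1.96) and a 50% response distribution, neither of which is stated in this module, and the name sample_size is only illustrative. Under these assumptions it reproduces the n = 357 used in the stratified sampling example for N = 5 000, but it is not guaranteed to match the online calculator in every case.

```python
import math


def sample_size(N, margin=0.05, p=0.5, z=1.96):
    """Sample size for estimating a proportion in a finite population.

    N      : population size
    margin : margin of error e, as a decimal (0.05 means 5%)
    p      : assumed response distribution (0.5 is the most conservative)
    z      : z-score of the confidence level (1.96 for 95%, an assumption)
    """
    x = z ** 2 * p * (1 - p)
    n = N * x / ((N - 1) * margin ** 2 + x)
    return math.ceil(n)  # round up so the margin of error is not exceeded


print(sample_size(5000))  # 357
```

A smaller margin of error gives a larger required sample, which is why the choice of e should be made before collecting data.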
In the sample size computation, “e” is the margin of error. It is a value which quantifies possible
sampling errors. Usually the margin of error is 0.01 (1%), 0.05 (5%) or 0.10 (10%). Sampling error
means that the results in the sample differ from those of the target population because of the
“luck of the draw”.
Since you already know what to use to compute the appropriate sample size, the next step is how to
select the samples from the population. This is referred to as sampling.
Sampling is the process of selecting samples from a given population. There are two types of
sampling techniques:
(1) Probability Sampling: samples are chosen in such a way that each member of the population has
an equal chance of being selected in the sample.
(2) Non-Probability Sampling: each member of the population does not have a known chance of being
included in the sample. Hence, personal judgment plays an important role in the selection.
We will only consider and discuss the probability sampling techniques. These are:
Simple random sampling. This is a procedure where a sample is selected in such a way that
every element is as likely to be selected as any other element in the population.
Example: Lottery. This needs a complete list of the population. You write the names or
codes of each member and place them in a container, then randomly draw the
desired number of samples. This is easy if the population is small.
Systematic random sampling. This method is a sampling procedure with a random start.
Samples are chosen using the rules set by the researchers. This involves choosing every kth member
of the population, with k = N/n, but there should be a random start.
Example: Choose a sample of size 10 from N = 500.
1. Choose a random start, say 10.
2. Determine the interval k by k = 500/10 = 50, so every 50th member will be chosen starting from 10.
So the respondents will be member numbers 10, 60, 110, 160, 210, 260, 310, 360, 410, 460.
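Simple random sampling and systematic random sampling can be sketched in a few lines of code. This is a minimal sketch, not a prescribed procedure: the function names are illustrative, members are assumed to be numbered 1 to N, and k is taken as the whole-number part of N/n when N is not evenly divisible by n.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Lottery method: draw n distinct members, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(N, n, start=None, seed=None):
    """Pick every k-th member number, with k = N // n, from a random start."""
    k = N // n
    if start is None:
        # Assumption: the random start is drawn from the first k members.
        start = random.Random(seed).randint(1, k)
    return [start + j * k for j in range(n)]


members = list(range(1, 501))            # member numbers 1 to 500
print(simple_random_sample(members, 10)) # ten member numbers drawn at random
print(systematic_sample(500, 10, start=10))
# [10, 60, 110, 160, 210, 260, 310, 360, 410, 460]
```

The second call reproduces the worked example, since the random start is fixed at 10 and k = 50.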
Stratified random sampling. This is used when the population can be naturally
classified into groups or strata.
Example: A survey will be conducted to find out whether families living in a certain municipality
are in favor of charter change. To ensure that all income groups are represented,
respondents will be divided into high-income (Class A), middle-income (Class B) and
low-income (Class C) groups. Below is the distribution of income groups.
Strata      Number of Families
Class A     1 000
Class B     2 500
Class C     1 500
N           5 000
1. Use the Raosoft calculator to find the sample size (n). With a 5% margin of error and a 50%
response distribution, n = 357.
2. Using proportional allocation, how many from each group should be taken as samples?
Strata      Number of Families    Percent                   Number of Samples (n)
Class A     1 000                 1000/5000 = 0.2 = 20%     (0.2)(357) = 71.4 ≈ 71
Class B     2 500                 2500/5000 = 0.5 = 50%     (0.5)(357) = 178.5 ≈ 179
Class C     1 500                 1500/5000 = 0.3 = 30%     (0.3)(357) = 107.1 ≈ 107
N           5 000                                           n = 357
So, 71 families should be taken as respondents from Class A, 179 from Class B and 107
from Class C, for a total of 357.
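The proportional allocation above can be checked with a short sketch. The function name proportional_allocation is illustrative, and rounding halves upward (178.5 becomes 179) is an assumption taken from the worked table. In general, rounded allocations may not add up exactly to n and may need a manual adjustment.

```python
def proportional_allocation(strata, n):
    """Allocate a sample of size n to strata in proportion to their sizes.

    strata : dict mapping stratum name to its population count
    n      : total sample size
    Uses integer arithmetic to round halves upward, as in the worked table.
    """
    N = sum(strata.values())
    return {name: (2 * size * n + N) // (2 * N) for name, size in strata.items()}


strata = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
allocation = proportional_allocation(strata, 357)
print(allocation)                 # {'Class A': 71, 'Class B': 179, 'Class C': 107}
print(sum(allocation.values()))   # 357
```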
Collecting and Organizing Data in a Table
The study of statistics begins with the collection of data or measurements. Data collected
should be organized systematically for easier and faster interpretation. They may be presented in
any of the following forms: textual, tabular, or graphical.
The textual form can be used if there are only a few data to be presented.
The tabular and graphical forms are used when more detailed information about the data is
to be presented.
A table is used when you want to present data in a systematic and organized manner so that
reading and interpretation will be simpler and easier.
When a table is used, you must consider the following parts:
1. Table number
2. Table title
3. Column header
4. Row classifier
5. Body of the table
6. Source note
Example:
Table 3
Distribution of Students in Hogwarts School According to Year Level
Year Level      Number of Students
Freshman        350
Sophomore       300
Junior          250
Senior          200
Total           1 100
Source: Hogwarts Registrar
Example 1:
Table 1
Mahusay National High School Enrolment, SY 2005-2006
Year Level Male Female
First 216 267
Second 197 216
Third 187 227
Fourth 176 215
Total 776 925
You will observe that the table above shows clearly the enrolment data in Mahusay National High
School for the school year 2005-2006.
Another type of tabular presentation is the frequency table also known as a frequency
distribution. It is an arrangement of the data that shows the frequency of occurrence of different
values of the variables.
A frequency table is constructed by listing the measurements from highest to lowest, then
making tally marks to record how often each number occurs. After tallying, count the marks and
record them in the proper column.
Example: Prepare a frequency table for the following set of scores.
17 20 15 18 19 16 11 10 15 16
12 12 13 14 11 10 14 13 12 11
13 15 14 10 15 16 17 17 18 20
20 18 19 19 18 17 16 15 12 12
13 14 15 19 20
Solution: To prepare a frequency table for the given set of scores, the scores are listed from
highest to lowest, then tally marks are made and counted. The counted tally marks are then
recorded under the column Frequency. Notice that every 5th tally crosses the first
four tallies. This is done to make counting of marks easier, especially if the number of
cases is large.
Score Tallies Frequency
20 //// 4
19 //// 4
18 //// 4
17 //// 4
16 //// 4
15 ////// 6
14 //// 4
13 //// 4
12 ///// 5
11 /// 3
10 /// 3
Total 45
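The tallying above can also be done by a short program. The sketch below uses Python's collections.Counter on the 45 scores from the example. The printed layout is only illustrative, and the tally marks are shown as plain slashes without the crossing fifth mark.

```python
from collections import Counter

scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]

# Count how often each score occurs.
frequency = Counter(scores)

# List the scores from highest to lowest with tallies and frequencies.
print("Score  Tallies  Frequency")
for score in sorted(frequency, reverse=True):
    count = frequency[score]
    print(f"{score:<6} {'/' * count:<8} {count}")
print(f"Total           {sum(frequency.values())}")
```

The total printed at the end should equal the number of scores, which is a quick check that no score was missed.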
Frequency Distribution Tables
If the number of measures in consideration is rather big, the presentation of data is further
simplified by grouping the measures into class intervals. The resulting table is called a
frequency distribution.
A frequency distribution is a distribution of the total number of measures or frequencies
over arbitrarily defined categories or classes. The number of measures falling under a class is
called the class frequency.
Example 1.
The frequency distribution below shows the scores obtained by 300 students in an English test of
50 items.
Score      Number of Students
45-49      15
40-44      32
35-39      42
30-34      108
25-29      67
20-24      21
15-19      10
10-14      5
Total      300
In the example above, 45-49 and the other entries which follow down to 10-14
are called class intervals. The end numbers are called class limits. For instance, in the class
interval 45-49, 45 is called the lower limit while 49 is called the upper limit.
Each class interval also has a lower boundary and an upper boundary, each lying halfway between
the limits of adjacent classes. For the class interval 45-49, the lower boundary is 44.5 while the
upper boundary is 49.5. Hence, for the class interval 45-49, 44.5 – 49.5 are called the class
boundaries.
The size of the class interval, also called the class size, is the difference between the upper
boundary and the lower boundary. Hence, the class size in the given example is 49.5 - 44.5 = 5.
A class interval also has a midpoint or class mark. It is obtained by taking half the sum of the
lower and upper class limits. For instance, the midpoint of the class interval 45-49 is
(45 + 49)/2 or 47.
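The class boundaries, class size and class mark of any interval can be computed in the same way. The sketch below assumes whole-number scores, so each boundary lies 0.5 beyond the class limit. The function name class_summary and the unit parameter are illustrative.

```python
def class_summary(lower, upper, unit=1):
    """Return (lower boundary, upper boundary, class size, midpoint).

    lower, upper : the class limits, e.g. 45 and 49
    unit         : smallest step between recorded values (1 for whole scores)
    """
    half = unit / 2
    lower_boundary = lower - half
    upper_boundary = upper + half
    size = upper_boundary - lower_boundary
    midpoint = (lower + upper) / 2
    return lower_boundary, upper_boundary, size, midpoint


print(class_summary(45, 49))  # (44.5, 49.5, 5.0, 47.0)
```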
Range (R) is the difference between the highest score (H) and the lowest score (L) in the given
data set: R = H - L.
The following are the suggested steps on how to make class intervals:
1. Determine the desired number of classes (n), that is, the number of rows.
2. Solve for the class width (i):
i = Range / n
3. Start the lowest class interval with the lowest value or score in the given data set. The lower
limit of each succeeding class is the previous lower limit plus i. Continue until the highest value
in the distribution is reached.
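The steps above can be sketched as a short program. This is only a sketch under stated assumptions: the scores are whole numbers, the class width i is rounded up to the next whole number (the module does not say how to round), and the function names are illustrative. Because of the rounding, the number of classes produced can differ slightly from the desired number. The data are the 45 scores from the frequency table example.

```python
import math

scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]


def class_intervals(data, n_classes):
    """Build class intervals (lower limit, upper limit) for whole-number data."""
    low, high = min(data), max(data)
    # i = Range / n, rounded up (assumption) and never smaller than 1.
    width = max(1, math.ceil((high - low) / n_classes))
    intervals = []
    lower = low
    while lower <= high:                       # continue until the highest value is covered
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


def grouped_frequencies(data, intervals):
    """Count how many values fall inside each class interval."""
    return {(lo, hi): sum(lo <= x <= hi for x in data) for lo, hi in intervals}


intervals = class_intervals(scores, 4)
table = grouped_frequencies(scores, intervals)
for (lo, hi), count in sorted(table.items(), reverse=True):
    print(f"{lo}-{hi}: {count}")
```

With four desired classes, the range is 20 - 10 = 10 and the width rounds up to 3, giving the intervals 10-12, 13-15, 16-18 and 19-21.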