REVIEW OF BASIC STATISTICS

Uploaded by

Alvin Abinas
REVIEW OF BASIC STATISTICS

Prof. Roselle V. Collado


Review Modules:

- Module 1. Review of Basic Concepts
- Module 2. Numerical Descriptive Measures
- Module 3. Probability
- Module 4. Sampling Methods and Statistical Inference
Module 1. Review of Basic Concepts

- Statistics
Singular sense: the science that deals with the collection, organization, presentation, analysis and interpretation of data.
Plural sense: a set of numerical information or processed data. Some examples are population statistics, statistics on births, and statistics on enrollment.
Two phases of statistics:

- Descriptive statistics
- Inferential statistics

The two main areas of inferential statistics are:

1. Estimation
2. Hypothesis testing
Basic Concepts:

- Universe – “Who do you want to study?”
- Variable – “What do you want to know about them?”
- Population
- Sample
- Distribution
Example:

A team of three geologists was investigating the composition of river pebbles. They collected basalt pebbles from a selected stream and took note of their color (light, dark, very dark), number of holes, and diameters.
In the given problem, identify the following:
1. universe of interest
2. variable(s) of interest
3. level(s) of measurement of the variable(s)
Example:

1. Universe – set of all pebbles in the selected stream
2. Variables and their types:
color – qualitative
number of holes – quantitative
diameters – quantitative
Levels of Measurement

1. Nominal - values are simply labels, categories or names with no implied ordering
Examples: TIN, Gender, Civil Status

2. Ordinal - values are labels with an implied ordering; the distance between two labels is unknown
Examples: sizes of shirts, job hierarchy (Manager, Asst. Manager, Secretary)

3. Interval - values can be ordered; the distance between two values is known; values can be added or subtracted but not multiplied or divided; the zero point is arbitrary.
Examples: Intelligence Quotient (IQ), Temperature (°F, °C)

4. Ratio - has all the properties of the interval level; in addition, values can be multiplied or divided; the zero point is fixed.
Examples: age, area, mass, length measurements
Example:

Referring to the variables in the previous example, their levels of measurement are:
color – ordinal
number of holes – ratio
diameters – ratio
Methods of Data collection

1. Objective method – utilizes measuring devices and human senses to gather information
2. Subjective method – utilizes interviewing techniques to gather information
3. Use of existing records – utilizes data archives maintained by other institutions to gather information
Types of Data:

1. Primary Data – first-hand information collected directly from the source
2. Secondary Data – information collected from existing records
Methods of Data Presentation

1. Textual Method – utilizes the narrative form to present salient information.
2. Tabular Method – utilizes properly labeled rows and columns to present more details. Includes quantitative and qualitative frequency distribution tables (FDTs).
3. Graphical Method – utilizes charts and pictographs to present trends and comparisons at a glance. Includes the stemplot and the box-and-whiskers plot.
Module 2. Numerical Descriptive Measures

- Measures of Location
- Measures of Dispersion
- Measure of Skewness
- Measure of Kurtosis
Measures of Location

1. Minimum
2. Maximum
3. Measures of Central Tendency
- Mean, Median, Mode
4. Quantiles
- Percentiles, Deciles, Quartiles
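Measures of Central Tendency (MCTs)

- Arithmetic Mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

- Median

$$Md = \begin{cases} X_{\frac{N+1}{2}}, & \text{if } N \text{ is odd} \\[4pt] \dfrac{X_{\frac{N}{2}} + X_{\frac{N}{2}+1}}{2}, & \text{if } N \text{ is even} \end{cases}$$

where $X_k$ denotes the $k$th value when the data are arranged in increasing order.

- Mode
Mo is the most frequently occurring value in the data.
- It may or may not exist for a given data set.
- If it exists, it is not necessarily unique.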
Example:

The following summarizes the recorded diameters of 30 grains of moulding sand.

Diameter   8  10  12  16  20  21
Frequency  5   8   8   4   3   2
MCTs:

Mean = 382/30 = 12.73

Median = 0.5(X15 + X16) = 0.5(12 + 12) = 12

Mode = 10 and 12 (bimodal)
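The three measures above can be checked with a short Python sketch. It assumes only the frequency table from the example and uses the standard-library `statistics` module; the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median, multimode

# Frequency table from the example: diameter -> frequency
freq = {8: 5, 10: 8, 12: 8, 16: 4, 20: 3, 21: 2}

# Expand the table into the 30 individual observations
data = list(Counter(freq).elements())

mu = mean(data)          # 382 / 30
md = median(data)        # average of the 15th and 16th ordered values
modes = multimode(data)  # every value that attains the highest frequency

print(round(mu, 2), md, modes)
```

`multimode` is used rather than `mode` because this data set has two values tied for the highest frequency.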


Quantiles

- Percentiles (P1, P2, P3, …, P99)
- Deciles (D1, D2, D3, D4, D5, D6, D7, D8, D9)
- Quartiles (Q1, Q2, Q3)
Percentiles (Pj)

$$\text{location of } P_j = \frac{j}{100} \times N$$

$$P_j = \begin{cases} \dfrac{X_{\frac{j}{100}N} + X_{\frac{j}{100}N+1}}{2}, & \text{if the location of } P_j \text{ is a whole number} \\[4pt] \text{the data value in the next higher whole-number position}, & \text{if the location of } P_j \text{ is not a whole number} \end{cases}$$
For Deciles and Quartiles

Convert deciles and quartiles to their percentile equivalents and use the previous procedure to find the quantiles sought.
Example:

From the previous data, compute P42, D8 and Q2.

P42: location = 0.42(30) = 12.6, so P42 = X13 = 10
D8 = P80: location = 24, so D8 = (X24 + X25)/2 = (16 + 16)/2 = 16
Q2 = P50: location = 15, so Q2 = (X15 + X16)/2 = (12 + 12)/2 = 12
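The percentile rule can be sketched in Python as follows. The function name `percentile` is hypothetical, and the sketch implements the location rule of this module (not the interpolation used by most statistical software), for j from 1 to 99.

```python
from fractions import Fraction

def percentile(sorted_data, j):
    """Return P_j using the rule in this module (positions are 1-based)."""
    n = len(sorted_data)
    loc = Fraction(j, 100) * n            # exact location, avoids float round-off
    if loc.denominator == 1:              # whole number: average X_loc and X_(loc+1)
        k = int(loc)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    k = int(loc) + 1                      # otherwise: next higher whole-number position
    return sorted_data[k - 1]

freq = {8: 5, 10: 8, 12: 8, 16: 4, 20: 3, 21: 2}
data = sorted(x for x, f in freq.items() for _ in range(f))

p42 = percentile(data, 42)   # location 12.6 -> X13
d8 = percentile(data, 80)    # D8 = P80, location 24
q2 = percentile(data, 50)    # Q2 = P50, location 15
print(p42, d8, q2)
```

Using `Fraction` for the location keeps the whole-number test exact, which a floating-point product such as `0.8 * 30` would not guarantee.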
Measures of Dispersion

1. Range (R)
2. Variance (σ²)
3. Standard Deviation (σ)
4. Coefficient of Variation (CV)

Example (diameter data, treated as a population):

Range = 21 – 8 = 13
Variance = 17.1289
Std. Dev’n. = 4.1387
CV = (4.1387/12.7333) × 100% = 32.5029%
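A minimal Python sketch of these four measures, assuming the diameter data are treated as a population (so the variance divides by N, which is what reproduces the figures above):

```python
from statistics import mean, pstdev, pvariance

freq = {8: 5, 10: 8, 12: 8, 16: 4, 20: 3, 21: 2}
data = [x for x, f in freq.items() for _ in range(f)]

data_range = max(data) - min(data)
variance = pvariance(data)        # population variance: divides by N
std_dev = pstdev(data)            # population standard deviation
cv = std_dev / mean(data) * 100   # coefficient of variation, in percent

print(data_range, round(variance, 4), round(std_dev, 4), round(cv, 4))
```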
Measure of Skewness

$$SK = \frac{3(\mu - Md)}{\sigma}$$

Possible values of SK:
- SK = 0 → symmetric distribution
- SK < 0 → negatively skewed (skewed to the left)
- SK > 0 → positively skewed (skewed to the right)
What is SYMMETRY?

A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the “mirror image” of the distribution to the right of the mean.

[Figure: symmetric distributions, each with µ = Md = Mo]
A symmetric distribution has SK = 0 since its mean is equal to its median (and, when it is unimodal, to its mode).
MEASURE OF SKEWNESS

[Figure: positively skewed distribution (SK > 0, Mo < Md < µ) and negatively skewed distribution (SK < 0, µ < Md < Mo)]
Measure of Kurtosis

$$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} - 3$$

Possible values of K:
- K = 0 → mesokurtic (bell-shaped) distribution
- K < 0 → platykurtic distribution (flatter than the normal curve, with lighter tails)
- K > 0 → leptokurtic distribution (more peaked than the normal curve, with heavier tails)
MEASURE OF KURTOSIS

[Figure: mesokurtic (K = 0), platykurtic (K < 0) and leptokurtic (K > 0) curves]
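As a rough check, both shape measures can be evaluated directly from their formulas on the diameter data of the earlier example. This is a sketch that assumes the population standard deviation and the unrounded mean, so the results may differ slightly from hand computations that use rounded inputs.

```python
from statistics import mean, median, pstdev

freq = {8: 5, 10: 8, 12: 8, 16: 4, 20: 3, 21: 2}
data = sorted(x for x, f in freq.items() for _ in range(f))

n = len(data)
mu = mean(data)
md = median(data)
sigma = pstdev(data)

# SK = 3(mu - Md) / sigma
sk = 3 * (mu - md) / sigma
# K = sum (X_i - mu)^4 / (N sigma^4) - 3
k = sum((x - mu) ** 4 for x in data) / (n * sigma ** 4) - 3

print(round(sk, 4), round(k, 4))
```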
Example:

- SK = 3(12.73 − 12)/4.1387 = 0.5291 (0.5316 with the unrounded mean 12.7333)

- K = 2.3994 − 3 = −0.6006 (platykurtic)
Module 3. Probability

- Random Experiment
- Sample Space
- Event
- Probability
Example:

- Random Experiment: Selecting a lot to be tested for durability from 10 lots produced in a day.
- Sample Space: S = {Lot 1, Lot 2, Lot 3, Lot 4, Lot 5, Lot 6, Lot 7, Lot 8, Lot 9, Lot 10}
- Event: E = event that a lot number greater than 5 is tested
E = {Lot 6, Lot 7, Lot 8, Lot 9, Lot 10}
Approaches in Assigning Probability to Events

- A Priori Approach (Theoretical Approach)
- A Posteriori Approach (Empirical Approach)
- Subjective Approach
Illustration: A Priori Approach

Consider the random experiment of tossing two coins.
Model: S = {HH, HT, TH, TT}
Assumption: the two coins are fair → equally likely sample space
→ P({HH}) = P({HT}) = P({TH}) = P({TT}) = ¼
Thus, if E = the event of observing at least 1 head,
E = {HH, HT, TH}
P(E) = P({HH, HT, TH}) = P({HH}) + P({HT}) + P({TH}) = ¼ + ¼ + ¼ = 3(¼) = ¾
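The a priori computation amounts to counting outcomes in an equally likely sample space. A small Python sketch, assuming fair coins as in the illustration:

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space for tossing two fair coins
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

# Event E: at least one head
event = [outcome for outcome in sample_space if "H" in outcome]

# A priori probability = favourable outcomes / total outcomes
p_event = Fraction(len(event), len(sample_space))
print(sample_space, event, p_event)
```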
Illustration: A Posteriori Approach

Suppose we tossed two coins 100 times and observed at least 1 head in 78 of those tosses. The a posteriori probability that at least 1 head shows up is 78/100 or 0.78.
Methods of Counting

Addition Principle
Suppose an operation can be done in n1 ways, a second operation in n2 ways, and so on up to a kth operation that can be done in nk ways. If these operations cannot be done at the same time, then there is a total of n1 + n2 + … + nk ways to perform one of these operations.
Example:

A marble is drawn from a bag containing 2 yellow and 3 green marbles. If the marble drawn is yellow, a die is rolled. Otherwise, a coin is tossed. How many possible outcomes are there?
Solution:
Operation 1: Marble drawn is yellow, then a die is rolled: n1 = 6
Operation 2: Marble drawn is green, then a coin is tossed: n2 = 2
N(S) = 6 + 2 = 8
Fundamental Principle of Counting

Suppose an operation can be performed in n1 ways, and for each of these a second operation can be performed in n2 ways, and for each of these a third operation can be performed in n3 ways, and so on until a kth operation that can be performed in nk ways. Then the total number of ways these k operations can be performed is n1 × n2 × n3 × … × nk.
Example:

A travel agency offers special vacation packages to any of the following destinations: Cebu, Davao, Guimaras, Boracay, or Palawan. The length of stay is either 3 or 5 days, and the type of accommodation can be either economy or business class. In how many ways can one select a vacation package?
Solution:
Operation 1: Choose destination, n1 = 5
Operation 2: Choose length of stay, n2 = 2
Operation 3: Choose type of accommodation, n3 = 2
N(S) = 5 × 2 × 2 = 20
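The fundamental principle can be illustrated by listing every package explicitly. This sketch uses `itertools.product`; the list names are illustrative.

```python
from itertools import product

destinations = ["Cebu", "Davao", "Guimaras", "Boracay", "Palawan"]
stays = [3, 5]                        # length of stay in days
classes = ["economy", "business"]     # type of accommodation

# Every (destination, stay, class) triple is one vacation package
packages = list(product(destinations, stays, classes))

print(len(packages))
```

The length of the Cartesian product equals the product of the three list lengths, which is exactly n1 × n2 × n3.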
Permutation

$${}_nP_r = \frac{n!}{(n-r)!}, \qquad {}_nP_n = n!$$

Examples:

a) In how many ways can eight students sit on a bench which can accommodate only five people?
Solution: N(S) = 8P5 = 6720

b) A manager needs to assign eight of his subordinates to eight different jobs. Assuming that only one person is assigned to each job and that anyone can do any job, in how many ways can the eight subordinates be assigned to the jobs?
Solution: N(S) = 8P8 = 40320
Permutations in a Circle

The total number of permutations of n distinct objects arranged in a circle is (n – 1)!.

Example: In how many ways can five congressmen be seated at a round table?
Solution: N(S) = (5 – 1)! = 4! = 24
Permutation of n Objects that are not Distinct

$$\frac{n!}{n_1!\, n_2!\, n_3! \cdots n_k!}, \qquad \text{where } n = \sum_{i=1}^{k} n_i = n_1 + n_2 + \cdots + n_k$$

Example: How many words can be formed by permuting the letters of the word “STATISTICS”?
Solution: The word has 3 S's, 3 T's, 1 A, 2 I's and 1 C, so

$$N(S) = \frac{10!}{3!\,3!\,1!\,2!\,1!} = 50400$$
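The permutation counts in this part of the module can be reproduced with Python's `math` module. The helper `multiset_permutations` is a hypothetical name for the formula for objects that are not all distinct.

```python
from collections import Counter
from math import factorial, perm

def multiset_permutations(word):
    """Number of distinct arrangements of the letters of `word`."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)   # divide out repeats of each letter
    return total

bench = perm(8, 5)                # 8 students, 5 seats: 8P5
jobs = perm(8, 8)                 # 8 subordinates, 8 jobs: 8P8 = 8!
round_table = factorial(5 - 1)    # 5 people at a round table: (n - 1)!
words = multiset_permutations("STATISTICS")

print(bench, jobs, round_table, words)
```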
Combination

$${}_nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\, r!}$$

Example: In how many ways can a person choose 14 of 23 churches to visit this summer?
Solution: N(S) = 23C14 = 817190
Number of Ways of Partitioning n Objects

$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}$$

Example: In how many ways can seven scientists be assigned to one triple and two double hotel rooms?
Solution: Here n = 7 with n1 = 3, n2 = 2 and n3 = 2, so

$$N(S) = \frac{7!}{3!\,2!\,2!} = 210$$
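A short Python sketch of the combination and partition formulas. The helper name `partitions` is hypothetical, and the room example assumes the three rooms are distinguishable with group sizes 3, 2 and 2.

```python
from math import comb, factorial

def partitions(n, sizes):
    """n! / (n1! n2! ... nr!) for group sizes that add up to n."""
    if sum(sizes) != n:
        raise ValueError("group sizes must add up to n")
    ways = factorial(n)
    for size in sizes:
        ways //= factorial(size)
    return ways

churches = comb(23, 14)            # 23C14
rooms = partitions(7, [3, 2, 2])   # one triple room and two double rooms

print(churches, rooms)
```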
Solutions to sample problems:

1. There are four teams competing for two awards. One is for the best design and the other for the best assembly. If any team can win one or both of these, in how many ways can the awards be given?
Solution:
Operation 1: Awarding the best design, n1 = 4
Operation 2: Awarding the best assembly, n2 = 4

N(S) = 4 × 4 = 16
2. In how many ways can eight problems be assigned to four groups if each group is assigned one problem and no two groups are assigned the same problem?

N(S) = 8P4 = 1680

3. A team is being created to work on an experimental trial in outer space. The team needs 2 engineers, 1 microbiologist, 1 agriculturist, 1 statistician and 1 astronaut. If there are 7 engineers, 3 microbiologists, 2 agriculturists, 3 statisticians and 5 astronauts in the pool of choices, in how many ways can the team be composed?
Operation 1: Choose 2 engineers, n1 = 7C2 = 21
Operation 2: Choose microbiologist, n2 = 3
Operation 3: Choose agriculturist, n3 = 2
Operation 4: Choose statistician, n4 = 3
Operation 5: Choose astronaut, n5 = 5

N(S) = 21 × 3 × 2 × 3 × 5 = 1890
Event Relations and Probability

A. Complement of an Event (E^C)
P(E^C) = 1 – P(E)
B. Conditional Probability
P(E1 | E2) = P(E1 ∩ E2) / P(E2), provided P(E2) > 0

C. Independent Events
P(E1 | E2) = P(E1) or P(E2 | E1) = P(E2)
D. Mutually Exclusive Events
P(E1 ∩ E2) = 0
Theorems on Probability

- Addition Theorem
P(E1 ∪ E2) = P(E1) + P(E2) – P(E1 ∩ E2)
If E1 and E2 are mutually exclusive, then
P(E1 ∪ E2) = P(E1) + P(E2)

- Multiplication Theorem
P(E1 ∩ E2) = P(E1 | E2) × P(E2)
If E1 and E2 are independent, then
P(E1 ∩ E2) = P(E1) × P(E2)
Examples:

1. Suppose 5 dead batteries are mixed up with 10 good ones.
i) What is the probability that if one battery is selected, it is good?
ii) If eight batteries are drawn from the fifteen, what is the probability that six are dead?
iii) If two batteries are drawn from the fifteen, what is the probability that
a) both are good?
b) one is good and one is dead?
c) at least one is good?
Solutions:

i) P[1G] = 10/15 = 0.6667
ii) P[6D] = 0, since there are only 5 dead batteries
iii)
a) P[2G] = 10C2 / 15C2 = 45/105 = 0.4286
b) P[1G and 1D] = 10(5) / 15C2 = 50/105 = 0.4762
c) P[at least 1G] = 1 – P[0G] = 1 – (5C2 / 15C2) = 1 – 10/105 = 0.9048
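These battery probabilities all follow one counting pattern, sketched below in Python. The helper `p_draw` is a hypothetical name; it assumes the batteries are drawn without replacement and that every selection is equally likely.

```python
from fractions import Fraction
from math import comb

GOOD, DEAD = 10, 5
TOTAL = GOOD + DEAD

def p_draw(good, dead):
    """P(exactly `good` good and `dead` dead batteries) in a draw of good + dead."""
    return Fraction(comb(GOOD, good) * comb(DEAD, dead), comb(TOTAL, good + dead))

p_one_good = p_draw(1, 0)            # i)
p_six_dead = p_draw(2, 6)            # ii) comb(5, 6) is 0, so the probability is 0
p_both_good = p_draw(2, 0)           # iii a)
p_one_each = p_draw(1, 1)            # iii b)
p_at_least_one = 1 - p_draw(0, 2)    # iii c) complement of "both dead"

print(p_one_good, p_six_dead, p_both_good, p_one_each, p_at_least_one)
```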
2. A successful attack by an interceptor requires (a) the reliable operation of a computing system, (b) the transmission of correct directions, and (c) the proper functioning of the striking mechanism. It is known that these three occur independently of each other. When P[(a)] is 0.8 and (b) is assured, the overall probability of success is 0.6. If the computing system is improved to 90% reliability, while P[(b)] is reduced to 0.8 and P[(c)] remains unchanged, what is the new overall probability of success?
Solution:

Given: P[a] = 0.8, P[b] = 1.0, P[abc] = 0.6
By independence, P[abc] = P[a] × P[b] × P[c], thus P[c] = 0.6 / (0.8 × 1.0) = 0.75
If P[a] = 0.9, P[b] = 0.8, P[c] = 0.75,
then P[abc] = 0.9 × 0.8 × 0.75 = 0.54
3. It is found that in manufacturing a certain article, defects of one type occur with probability 0.1 and defects of another type with probability 0.05, and that the two defects occur independently of one another. Calculate the probability that
i) an article has neither kind of defect
ii) an article is defective
iii) a defective article has only one type of defect
Solutions:

Given: P[D1] = 0.1, P[D2] = 0.05; D1 and D2 are independent
i) P[D1’ and D2’] = (1 – 0.1) × (1 – 0.05) = 0.9 × 0.95 = 0.855
ii) P[D1 or D2] = 0.1 + 0.05 – 0.1(0.05) = 0.145
iii) P[exactly one defect] = P[(D1 and D2’) or (D1’ and D2)] = 0.1(0.95) + 0.9(0.05) = 0.14
Since the article is known to be defective, P[exactly one defect | defective] = 0.14 / 0.145 = 0.9655
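The defect calculations can be sketched in Python under the stated independence assumption. The last line conditions on the article being defective, which is how part iii) reads when "a defective article" is taken as given information.

```python
p_d1, p_d2 = 0.1, 0.05   # probabilities of the two defect types (independent)

p_neither = (1 - p_d1) * (1 - p_d2)
p_defective = p_d1 + p_d2 - p_d1 * p_d2                # addition theorem
p_exactly_one = p_d1 * (1 - p_d2) + (1 - p_d1) * p_d2
p_one_given_defective = p_exactly_one / p_defective    # conditional probability

print(round(p_neither, 4), round(p_defective, 4),
      round(p_exactly_one, 4), round(p_one_given_defective, 4))
```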
Example:

4. Suppose that in a particular assembly line of two sections, only 75% of all items produced were found to be satisfactory. It is claimed that the first section has a 95% satisfactory rating in its production. What is the probability that a satisfactory item from section 1 will turn out to be a satisfactory item at the end of the assembly line?
Solution:

Given: P[1st and 2nd] = 0.75
P[1st] = 0.95

P[2nd | 1st] = P[1st and 2nd] / P[1st] = 0.75/0.95 = 0.7895


Module 4. Sampling Methods and Statistical Inference

[Figure: sampling draws a sample (n units/observations) from the universe/population (N units/observations); inferences and generalizations go from the sample back to the population]
Two Broad Classifications of Sampling

1. Non-probability sampling
– No objective procedure in sample selection.
– Probabilities of selection of units are not known.
– Sampling errors cannot be computed.
– No valid inferences about the universe/population can be made.
Some Non-probability Sampling Methods

- Convenience sampling
- Accidental sampling
- Quota sampling
- Purposive sampling
2. Probability sampling
– Uses an objective procedure in sample selection (randomization procedure).
– Probabilities of selection of units are known.
– Sampling errors can be computed.
– Valid inferences about the population can be made.
Some Probability Sampling Methods

- Simple random sampling (SRS)
- Stratified random sampling (StRS)
- Systematic random sampling (SyRS)
- Cluster sampling
- Simple two-stage sampling (Multi-stage)
Simple Random Sampling

If sampling is done from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the procedure is called simple random sampling (SRS), and the sample obtained is called a simple random sample.

Two Types:
- SRS without replacement (SRSWOR)
– Number of possible samples: NCn
– Probability of selecting a given sample: 1/NCn
- SRS with replacement (SRSWR)
– Number of possible samples: N^n
– Probability of selecting a given sample: 1/N^n
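The two counts and the two kinds of draw can be sketched in Python. The population of N = 10 labelled units, the sample size n = 3 and the seed are hypothetical values chosen only for illustration.

```python
import random
from math import comb

N, n = 10, 3
population = list(range(1, N + 1))   # hypothetical unit labels 1..N

n_samples_wor = comb(N, n)   # SRSWOR: unordered samples, no repeats
n_samples_wr = N ** n        # SRSWR: ordered draws, repeats allowed

rng = random.Random(2024)                    # fixed seed so the draw is repeatable
sample_wor = rng.sample(population, n)       # without replacement
sample_wr = rng.choices(population, k=n)     # with replacement

print(n_samples_wor, n_samples_wr, sample_wor, sample_wr)
```

`random.sample` never repeats a unit, while `random.choices` may, which mirrors the SRSWOR/SRSWR distinction.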
Randomization Procedures

- Lottery method
- Use of the table of random numbers
- Random number generator (a calculator function or computer program that generates pseudo-random numbers)
Stratified Sampling

A sampling procedure that subdivides the universe into mutually exclusive subgroups, or strata, and draws an independent sample from each stratum.
Reasons for Stratification:
- To increase precision of estimates.
- To draw inferences for subclasses in the population.
Advantages of Stratification:
- Increases precision of estimates.
- Gives a better cross-section of the population.
- Simplifies the administration of the survey, especially if the nature of the population itself dictates some inherent stratification.
- Allows for flexibility in the choice of the sampling procedure for each stratum.
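Illustration: L = number of strata

$$N = \sum_{i=1}^{L} N_i = N_1 + N_2 + \cdots + N_L$$

$$n = \sum_{i=1}^{L} n_i = n_1 + n_2 + \cdots + n_L$$

[Figure: a population divided into five strata of sizes N1, ..., N5, with samples of sizes n1, ..., n5 drawn from the respective strata]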
Systematic Sampling:

Adopts a skipping pattern in drawing a sample from the population. The sampling interval is determined as k = N/n. A random start is taken from the first k units, and every kth unit thereafter is selected.
Advantages:
- Very simple to implement.
- Allows sample selection as the frame is being constructed.
- Probability structure is the same as in SRS.
- Sample is more evenly spread across the universe. (Advantageous when there is a linear trend in the universe since it gives a better cross-section.)
- It is as precise as SRS if the listing of the universe is random.
Disadvantages:
- It may be risky to use if there is periodicity in the sampling frame, since this yields biased estimates.
- The precision of the estimates cannot be assessed based on one systematic sample only.
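A minimal Python sketch of the systematic selection rule. The function name, the frame of 100 labelled units and the seed are hypothetical, and the sketch assumes n divides N so that k = N/n is a whole number.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every k-th unit after a random start, with k = N // n."""
    k = len(frame) // n              # sampling interval (assumes n divides N)
    start = rng.randrange(k)         # random start among the first k units
    return frame[start::k][:n]

frame = list(range(1, 101))          # hypothetical frame of N = 100 units
sample = systematic_sample(frame, 10, random.Random(7))
print(sample)
```

Only the start is random; once it is fixed the rest of the sample is determined, which is why a single systematic sample gives no direct measure of precision.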
Cluster Sampling

If sampling is done such that the population is first grouped into non-overlapping subgroups (clusters), a simple random sample of clusters is then selected, and all the units in the chosen clusters are included in the sample, the procedure is called cluster sampling and the sample thus obtained is called a cluster sample.
Advantages:

- More economical than selecting individual units directly.
- Simplifies the field operations considerably, especially for large surveys.
- Results in more timely data.
- Frame requirement is very much simpler.
- More efficient than SRS if the clusters are internally heterogeneous.
Disadvantages:

- Likely to be less efficient and precise than SRS.
- Efficiency decreases as the cluster size increases.
Multi-stage Sampling:

- An extension of cluster sampling (which is single-stage sampling)
- Characterized by sampling being done in stages
- Usually applied in large-scale surveys.
Multi-Stage Sampling: Terminology

- Primary sampling unit (psu) – largest sampling unit, 1st stage sampling unit
- Secondary sampling unit (ssu) – 2nd largest sampling unit, 2nd stage sampling unit
- Tertiary sampling unit (tsu) – 3rd largest sampling unit, 3rd stage sampling unit
- Ultimate sampling unit (usu) – smallest unit, on which measurements will be made
Illustration: Simple Two-stage Sampling

Assume each psu has the same number of ssu.
Let N = number of psu
M = number of ssu per psu
N × M = total number of ssu
Sampling procedure:
1. Draw a random sample of n psu from the N psu using SRSWOR.
2. From each sample psu drawn, draw a random sample of m ssu from its M ssu using SRSWOR.
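The two-step procedure can be sketched in Python as below. The function name, the labels, and the sizes N = 6, M = 8, n = 3, m = 2 are hypothetical choices for illustration.

```python
import random

def two_stage_sample(psus, n, m, rng):
    """SRSWOR of n psu, then SRSWOR of m ssu within each selected psu."""
    chosen = rng.sample(sorted(psus), n)                        # stage 1
    return {psu: rng.sample(psus[psu], m) for psu in chosen}    # stage 2

# Hypothetical universe: N = 6 psu, each with M = 8 ssu
psus = {f"psu{i}": [f"psu{i}-ssu{j}" for j in range(1, 9)] for i in range(1, 7)}
sample = two_stage_sample(psus, n=3, m=2, rng=random.Random(11))
print(sample)
```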
Examples:

1. In preparation for the La Niña phenomenon this month, the Department of Public Works and Highways joined forces with the Department of Health and the Department of Social Welfare and Development to develop a strategic plan which will provide basic services to Metro Manila residents. They decided to get some feedback on these basic needs as foreseen by some residents of Malabon, Metro Manila. They went to the nearest barangay of Malabon and asked the first 30 residents that they encountered. The sampling procedure done is ____________.

2. (Refer to 1) Another option that they are considering is getting one resident at random from each barangay of Malabon. This will lead to having a _____________ sample.
Answers:

1. non-probability sampling
2. stratified
Estimation:

Parameters – descriptive measures of the population

Statistics – descriptive measures of the sample

Note: We use statistics to estimate parameters.

Types of Estimators:

1. Point Estimator – yields a single value for an unknown parameter
2. Interval Estimator – yields a range of values for an unknown parameter
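SRS Estimation:

- Parameter: Population Mean µ

Point Estimator: Sample Mean

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Interval Estimator:

Known σ: $$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Unknown σ: $$\bar{X} \pm t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}}$$

- Parameter: Population Variance σ²

Point Estimator:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

- Parameter: Population Proportion P

Point Estimator: Sample Proportion

$$p = \frac{a}{n}$$

where a is the number of sample units that possess the characteristic of interest.

Interval Estimator: For sufficiently large n,

$$p \pm z_{\alpha/2}\sqrt{\frac{pq}{n}}, \qquad q = 1 - p$$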
StRS Point Estimation:

- Population Mean µ

$$\sum_{i=1}^{L} w_i \bar{X}_i, \qquad \text{where } w_i = \frac{N_i}{N}$$

- Population Variance σ²

$$\sum_{i=1}^{L} w_i s_i^2$$

- Population Proportion P

$$\sum_{i=1}^{L} w_i p_i$$
Example:

Due to the burning of cotton plant wastes (hulls, leaves, etc.), sulfate content in the air over a city is highest during the month of November. If the data given below are the mean values of sulfate content during the month of November (analyses of air performed daily) over the past 8 years, what value for the sulfate content in Lubbock air can the City Department of Health predict at a 95% confidence level during next November? Compute an estimate of the average sulfate content in the air this coming November.
Sulfate content, µg/m³ of air
10.83 8.90 14.71 12.35 11.8 9.68 9.33 10.9
Solution:

Sample mean = 11.0625

Sample std. dev’n = 1.8964

A 95% confidence interval for the mean is
[11.0625 ± t0.025(7) × 1.8964/√8]
= [11.0625 ± 2.365(0.6705)] or [9.4768, 12.6482]
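The interval can be recomputed with a short Python sketch. The critical value 2.365 is taken from a t-table, as in the solution, rather than computed; everything else comes from the eight data values.

```python
from math import sqrt
from statistics import mean, stdev

sulfate = [10.83, 8.90, 14.71, 12.35, 11.8, 9.68, 9.33, 10.9]

n = len(sulfate)
x_bar = mean(sulfate)
s = stdev(sulfate)        # sample standard deviation (divides by n - 1)
t_crit = 2.365            # t_{0.025}(7), taken from a t-table as in the solution

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 4), round(s, 4), round(lower, 4), round(upper, 4))
```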
Test of Hypothesis:

Statistical Hypothesis – a statement about the value of a population parameter or the distribution of a data set
Null Hypothesis (Ho) – statement of equality or status quo; states no difference or no association
Alternative Hypothesis (Ha) – the researcher’s hypothesis; the statement to be accepted in the event that Ho is rejected
Test Statistic – a transformation of the estimator for the parameter to be tested
Critical region – the set of values of the test statistic that leads to the rejection of Ho at a given level of significance
Level of Significance (α) – risk of rejecting Ho when in fact it is true
Statistical Tests:

Z-test and t-test
- test procedures used to compare the mean of a population to a hypothesized value
- test procedures used to compare the means of two populations (independent or related – self-paired/matched pairs)
Binomial Test
- a statistical procedure used to test whether a hypothesized value for the population proportion is acceptable.
Regression Analysis
- a statistical technique used for determining the functional form of the relationship between two or more variables.
- the ultimate objective is usually to predict the value of the dependent variable given the values of the independent or concomitant variables.
Correlation Analysis
- a statistical technique used to determine the strength or degree of linear relationship between two variables.
Analysis of Variance
- a statistical technique used to compare the means of two or more populations by partitioning the total variance of the variable of interest into several sources.
Chi-square Test of Goodness of Fit
- a statistical procedure used to test whether the observed distribution is in agreement with the expected or hypothesized distribution.
Chi-square Test of Independence
- a statistical procedure used to test whether two variables (measured in at least the nominal scale) are independent of each other.
Identify the most appropriate statistical procedure to apply for each of the following research objectives:

1. It is of interest to compare three feed rations for chicken. Five chickens were used for each feed ration, and the dressed weight (in kg) after 30 days was obtained.
2. Bennett and Franklin wanted to determine the effect of annealing temperature on the density of a borosilicate glass with high silica content.
3. An electrical company claims that the lives of the light bulbs it manufactures are normally distributed with a mean of 5,000 hours and a standard deviation of 200 hours. To test this claim, a random sample of 100 bulbs produced by the company was tested, and the mean life was found to be 4,500 hours.
4. A consumer wants to test the claim of a manufacturer that at least 99% of the equipment she supplies to a factory conforms to specifications.
5. A filling machine for a production operation must be adjusted if more than 8% of the items being produced are underfilled. A random sample of 80 items from the day’s production contained 9 underfilled items.
Answers:

1. Analysis of Variance
2. Regression Analysis
3. Z-test on one population mean (σ = 200 is given and n = 100 is large)
4. Binomial test
5. Binomial test
THANK YOU!
