ITGY403 Lesson 2

COURSE CODE: ITGY403

COURSE TITLE: DATA ANALYSIS


Instructor: IZANG, A.A (PhD, MSc, MNCS)
Lesson Two:
Foundation of Statistical Analysis, Statistical Test and Procedures
What is Statistical Analysis
• Statistical analysis means investigating trends, patterns, and
relationships using quantitative data. It is an important research tool
used by scientists, governments, businesses, and other organizations.

• Statistical analysis is the collection and interpretation of data in order to
uncover patterns and trends. It is a component of data analytics.
• Statistical analysis can be used in situations like:
• gathering research interpretations,
• statistical modeling, or
• designing surveys and studies.
• It can also be useful for business intelligence organizations that have to
work with large data volumes.
• The goal of statistical analysis is to identify trends.

• A retail business, for example, might use statistical analysis to find
patterns in unstructured and semi-structured customer data that can be
used to create a more positive customer experience and increase sales.
Steps of Statistical Analysis
• Statistical analysis can be broken down into five discrete steps, as
follows:
• Describe the nature of the data to be analyzed.
• Explore the relation of the data to the underlying population.
• Create a model to summarize an understanding of how the data relates to the
underlying population.
• Prove (or disprove) the validity of the model.
• Employ predictive analytics to run scenarios that will help guide future actions.
Statistical Analysis Software
• Software for statistical analysis will typically allow users to do more
complex analyses by including additional tools for organization and
interpretation of data sets, as well as for the presentation of that data.
• Some examples of statistical analysis software are:
• IBM SPSS Statistics,
• RMP,
• EViews, and
• Stata.
• For example, IBM SPSS Statistics covers much of the analytical process,
from data preparation and data management to analysis and reporting.
What is Statistics?
Statistics, as defined by Gupta (2008), is the science of the collection,
organization, presentation, analysis and interpretation of numerical data.
Two Aspects of Statistics
• Descriptive Statistics: This is concerned only with the collection,
organization, presentation and analysis of data. Its statistical measures
include the measures of central tendency (location); measures of
dispersion (variability); skewness and kurtosis, etc.
• Inferential Statistics: This builds on descriptive statistics by going a
step further to make interpretations. Its focus is estimating or inferring the
properties of a population from the known properties of a sample of the
population.
• Hypotheses are developed and tested, and from the tests inferences are drawn and
generalizations made. If you analyze data and make decisions or estimates based on
information obtained from the data, you are using inferential statistics.
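As an illustration of descriptive statistics, the following is a minimal Python sketch that computes measures of location and dispersion. It assumes the students' ages listed in the mean computation part of this lesson as data, and it uses only Python's standard `statistics` module. Skewness and kurtosis are left out because that module does not provide them.

```python
import statistics

# Ages of students, reused from the mean computation in this lesson.
ages = [50, 52, 42, 35, 65, 54, 32, 40, 42, 41, 46,
        49, 47, 61, 45, 35, 18, 40, 48, 32, 38]

# Measures of central tendency (location).
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# Measures of dispersion (variability), using the sample (n - 1) formulas.
sample_variance = statistics.variance(ages)
sample_std_dev = statistics.stdev(ages)
age_range = max(ages) - min(ages)

print(f"mean = {mean_age:.2f}, median = {median_age}, range = {age_range}")
print(f"variance = {sample_variance:.2f}, standard deviation = {sample_std_dev:.2f}")
```

These figures only describe the 21 ages themselves. Saying something about all students from these 21 would be a step into inferential statistics.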
What are Variables?
According to James (2009) and Adebile (2010), variables are properties or characteristics of an
event, object or person that can take on different values or amounts. The types of variables that are
of major concern in this course are:
• Independent Variables: These are variables that are treated or manipulated by the researcher,
and whose effects are measured and compared.
• These variables are also called cause, factor, regressor, explanatory, predictor or
input variables.
• For instance, in a study of advertisement and sales, advertisement is the independent variable.

• Dependent Variables: These are variables that are observed as the outcome of an experiment. They
measure the effects of the independent variables on the test unit. They change according to the
changes or manipulations made to the independent variables.
• Extraneous Variables: These are variables that might influence the independent
variables or the outcome measure, and whose effects may be confounded with the effect of the
independent variables. They are variables whose effects could not be controlled in an
experimental study, and which might have crept in to contaminate or influence the
expected outcome.

• Continuous Variables: These are variables that are capable of taking an ordered set of
values within a certain range. Measured data can be whole numbers or fractions.
Between any two values there is an infinite number of other values.

• Categorical Variables: These are variables for which the measured objects are
assigned to a subclass or subset. The subclasses are distinct and non-overlapping. All
objects put into the same category are considered to have the same characteristic(s).

Note: The type of design used for the study is a pointer to the statistical techniques that
could be used. This also depends on the type of hypothesis and the type of data (in
terms of scale of measurement: nominal, ordinal, interval and ratio).
POPULATION, SAMPLE & HYPOTHESIS
• POPULATION - the totality of a group (e.g. the total number of people
who inhabit an area, region, or country; or all the plants or
animals of a particular species present in a place).

• SAMPLE - a subset of the population. A sample is used to generalize or infer.

• Generalize/infer: take part of a population and investigate it.
Results from studying the sample can then be taken to apply to the whole
population. (Discuss a real-life example of hypothesis testing, e.g. a housewife tasting
a little of the prepared soup to see if it is good to eat.)

❖ Here the Population is the pot of soup;
❖ and we can say that the Sample is the tasted part of the soup. Why?

Why Do We Take a Sample?

1. Cost
It is expensive to study every object within a population. Studying a sample requires
fewer resources.
2. Time
Studying a whole population is time consuming compared to studying part of a
population.
3. Accuracy
Results obtained from a sample can be more accurate than results obtained from the whole
population. Why?
* A sample is small enough to be studied with focus and thoroughness.
** Resources may not be enough to supervise the study of a whole population.

4. Destructive Items
a. For example, suppose a company manufactures a drug and just starts selling it without
testing.
What will happen? Mass burial. The drug has to be tested on a sample before
selling to avoid destruction.
b. A bulb manufacturer wanting to study the life span of its bulbs will have
to take a sample of the bulbs, dismantle them (if necessary) and study them,
instead of dismantling all the bulbs.

Hypothesis testing cannot be carried out without data.

METHODS OF DATA COLLECTION


Two Types of Data Collection: Primary Data & Secondary Data.
1. Primary
a. Data collected by the researcher himself.
b. Data collected for a specific purpose, and used for that purpose.
c. When the data is used for another purpose, it becomes secondary data.

2. Secondary
a. Data obtained from elsewhere: bulletins, subscription agencies,
journals.
b. Information extracted from other sources.

Collection of Primary Data: obtained by…

• Questionnaire Administration
• No research is better than its questionnaire.
• A faulty questionnaire means faulty research.
• A questionnaire is a form that contains a list of questions intended to obtain information.

A questionnaire is administered by…
• Mail
• Personal interview
• Telephone

Primary data is also obtained by…
• Observation
• Experiment
• Focus group discussion
SAMPLING TECHNIQUES
How do you select a sample?
❖ Probability Sampling – every element in the population has a known, non-zero
chance of being selected.

• Simple random sampling
+ It is generally regarded as the best method because it avoids selection bias.
+ Every element in the population has an equal chance of being selected.
This can be done through:
Lottery – all elements are placed in a “basket”, giving all objects an
equal chance of being selected (for information).
Random number table – elements are assigned numbers, and the numbers to
include are then selected at random from the table.
• Stratified sampling.
o Used where there is a heterogeneous population.
o Separate the population into groups (strata), then use “simple random sampling” or “systematic
sampling” to select the sample from each stratum.
• Cluster sampling.
o This is selection from randomly chosen groups of neighboring individuals.
o Cluster sampling is used when the population has not been listed.
o Having chosen a group, you have to interview everybody in the group; you can’t select part
of the group.
• Multi-stage sampling.
o Here, a few areas are selected which we believe are representative of the population as a
whole. We then take a random sample within each of these areas. It is used where the population
is spread out, particularly over a geographical area. This works like “cascading”. For example,
pick states in Nigeria; then pick local governments from each state; you can then pick
other groups from each local government as the sample.
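The probability sampling techniques above can be sketched in Python. This is only an illustration under stated assumptions: the population of 100 numbered students, the strata ("year 1" to "year 3") and the clusters ("hall A" to "hall D") are hypothetical, and the function names are made up for this sketch. The standard `random` module plays the role of the lottery basket.

```python
import random

def simple_random_sample(population, k, rng):
    """Every element has an equal chance of selection (the 'lottery' method)."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Split into strata first, then draw a simple random sample from each one."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole groups at random and keep everybody in the chosen groups."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 100 students numbered 1..100.
rng = random.Random(2024)  # fixed seed so the sketch is repeatable
students = list(range(1, 101))
strata = {"year 1": students[:40], "year 2": students[40:70], "year 3": students[70:]}
clusters = {"hall A": students[:25], "hall B": students[25:50],
            "hall C": students[50:75], "hall D": students[75:]}

print(simple_random_sample(students, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```

Multi-stage sampling would combine these ideas: first pick areas, then call `simple_random_sample` inside each chosen area instead of keeping everybody.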
❖ Non-probability Sampling – selection is not random, so not all elements have an equal (or known) chance.

• Quota sampling.
o Very useful in market research.
o You need to have a criterion before interviewing.
o For example, looking around to interview only people who are wearing suits.
o Here, the enumerator is given instructions (for example, an instruction for an enumerator to
interview 30 people who are corporately dressed during a graduation ceremony).
o In this case, if one respondent refuses to respond, the enumerator can still look around for
another person to fulfil the instruction.
• Unlike probability sampling, there is always room to fill blank spaces when a respondent
refuses to respond. For probability sampling, if a respondent refuses to respond, the
enumerator cannot select another person/object as a replacement.
o This is done by specifying how many people or items within a certain group you want to be
sampled (setting a quota) and then collecting data from anyone or anything fitting into the required
category until the quota is filled.
• Judgmental sampling (Purposive sampling)
o An expert uses personal judgment to select what a truly representative sample will be.
o It involves human judgment.
o There could be bias in this method.
• Convenience sampling
o Select a sample at your own convenience.
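The quota sampling procedure described above (set a criterion, then keep collecting until the quota is filled, replacing anyone who refuses) can be sketched as follows. The guest records and the field names `corporate_dress` and `willing` are hypothetical and exist only for this illustration.

```python
def quota_sample(people_in_order, fits_criterion, quota):
    """Collect willing people who fit the criterion until the quota is filled."""
    selected = []
    for person in people_in_order:
        if len(selected) == quota:
            break
        if fits_criterion(person) and person["willing"]:
            selected.append(person["name"])
    return selected

# Hypothetical guests at a graduation ceremony, in the order the enumerator meets them.
guests = [
    {"name": "A", "corporate_dress": True, "willing": True},
    {"name": "B", "corporate_dress": False, "willing": True},
    {"name": "C", "corporate_dress": True, "willing": False},  # refuses: simply move on
    {"name": "D", "corporate_dress": True, "willing": True},
    {"name": "E", "corporate_dress": True, "willing": True},
]

picked = quota_sample(guests, lambda g: g["corporate_dress"], quota=3)
print(picked)  # ['A', 'D', 'E']
```

Guest C refuses and is replaced by the next person who fits the criterion, which is exactly what a probability sample does not allow.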

Is there any difference between purposive and convenience sampling?


HYPOTHESIS TESTING
The null hypothesis (Ho) states that there is no difference. The test decides whether the sample gives enough evidence to reject it in favour of the alternative hypothesis (H1).
One-tailed Test –
This is where H1 is well specified (it states a direction).
a. Ho: Maclean = Close up
b. H1: Maclean > Close up
Or
a. Ho: Maclean = Close up
b. H1: Maclean < Close up
Alpha should not be too large. A higher level of significance means a higher chance of rejecting a true Ho, and this affects the number of casualties.
IF CALCULATED IS LESS THAN TABULATED – IT FALLS IN THE ACCEPTANCE
REGION: ACCEPT Ho.
IF CALCULATED IS GREATER THAN TABULATED – IT FALLS IN THE REJECTION
REGION: REJECT Ho.

• Two-tailed Test –
1. It has two tails.
2. It is the one where H1 (the alternative hypothesis) is not well specified (no direction is stated).
a. Ho: Maclean = Close up.
b. H1: Maclean ≠ Close up.

N.B. Please note that in the social sciences “alpha” is usually 5%; stricter levels such as 0.1% are used in some other areas.
ACCEPTANCE AND CRITICAL REGION
CRITICAL REGION: This is the area in which Z can fall and be significant. The complement of the critical
region is the acceptance region: if Z does not fall inside the critical region, it falls in the acceptance region.
[Figure: curve showing the Critical Region and the Acceptance Region]

General Procedure for Testing a Statistical Hypothesis:

1. Formulate the Null Hypothesis and the Alternative Hypothesis, i.e. Ho and H1. After
collecting your samples, you have to set your Ho and H1.
2. Determine the appropriate test statistic and compute its value from the sample data
(numerical calculation).
Note: If n (sample size) is large (n ≥ 30), then use the Z table (normal distribution).
HYPOTHESIS ERRORS
TYPE I & TYPE II ERRORS

DECISION      Ho True             Ho False
Reject Ho     Type I error        Correct decision
Accept Ho     Correct decision    Type II error


TYPE I ERROR: - It is unlikely that a test statistic would fall
in the critical region when Ho is true, but it can happen. In this case we reject Ho and make an
error in doing so. This is called a TYPE I ERROR (i.e. rejecting Ho when it is
true).
Example – rejecting the truth that “normally men commit more crime
than women”. The truth is that men commit more crime than women.

TYPE II ERROR: - This occurs when one fails to reject Ho when it is
false. A Type II error will occur if a test statistic does not fall in the critical
region when Ho is in fact false.

Critical Region: - This is a subset of the sample space which leads to
the rejection of the null hypothesis under consideration.

Significance Level: - It is the probability of taking a wrong decision by rejecting Ho when it is true
(i.e. the probability of making a Type I error).
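The link between the significance level and the Type I error can be checked with a small simulation. The sketch below is an illustration, not part of the lesson's method: it assumes a one-tailed (upper) Z test, borrows the figures µ = 64, standard deviation 3.3 and n = 40 from Example 1 of this lesson, and draws every sample from a population where Ho really is true. The function name and the number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def simulated_type_i_rate(mu0=64.0, sigma=3.3, n=40, alpha=0.05,
                          trials=10_000, seed=0):
    """Estimate how often a true Ho is rejected in a one-tailed (upper) Z test."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Ho is true here: every sample really comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        if z >= z_critical:
            rejections += 1  # a Type I error: rejecting Ho although it is true
    return rejections / trials

print(simulated_type_i_rate())
```

The printed proportion should come out close to 0.05, because α is the probability of landing in the critical region when Ho is true.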
GENERAL PROCEDURE FOR TEST OF HYPOTHESIS
1. Formulate the null and alternative hypotheses.
a. This is the first thing to be done after the data has been collected. Ho
and H1 must be specified/set. H1 is picked from the objectives of the study.
H1 is the research hypothesis.
2. Determine the appropriate test statistic and compute its value.
a. Determine whether it is Z or t.
b. Z means it is normal.
c. When n is large, i.e. n ≥ 30, we use Z.
d. When n is small, that is, n < 30, we use the t distribution.
3. Choose the level of significance, i.e. α.
a. Generally, 5% is used for management sciences.
4. Determine the critical region.
5. Make a statistical decision.
6. Interpret results.
a. Interpret statistical statements.
b. Explain in simple terms why Ho or H1 has to be accepted or rejected.
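When n is large (n ≥ 30), the test statistic is:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Where:
$\bar{X}$ = sample mean, $\mu$ = population mean stated in Ho, $\sigma$ = standard deviation, n = sample size.

N.B. When n is small, that is, n < 30, we use the t distribution, which is given as:
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
…with (n – 1) degrees of freedom.

Where:
$s$ = sample standard deviation; $\bar{X}$, $\mu$ and n are as defined for Z.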
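The choice between Z and t in step 2 can be sketched in Python. This assumes the usual one-sample forms Z = (sample mean − µ) / (σ / √n) and t = (sample mean − µ) / (s / √n) with n − 1 degrees of freedom. The function names and the numbers in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def choose_statistic(n):
    """Rule from the procedure: Z when n >= 30, t when n < 30."""
    return "Z" if n >= 30 else "t"

def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """Return t and its degrees of freedom (n - 1)."""
    t = (sample_mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical figures, not taken from the lesson.
print(choose_statistic(100), z_statistic(52, 50, 10, 100))  # Z 2.0
print(choose_statistic(25), t_statistic(52, 50, 10, 25))    # t (1.0, 24)
```

The two formulas look the same; what changes is the table used for the tabulated value (normal table for Z, t table with n − 1 degrees of freedom for t).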

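***Determine the critical region…

ONE TAIL TEST

The table measures 0.5 (half of the area under the curve).
Take α = 5%:
1 – 0.05 = 0.95, i.e. 0.5 – 0.05 = 0.45 in the table, which gives Z = 1.645.

This means that any calculated value that is greater than or equal to (≥) 1.645
falls in the rejection region and Ho must be rejected (in other words, only ACCEPT Ho
for values that are less than 1.645).

When …: reject values that are ≥ 1.29.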

TWO TAILED TEST

[Figure: curve with α/2 in each tail; at α = 5% the critical values are –1.96 and +1.96]

This means that any calculated value that is ≤ –1.96 or ≥ +1.96 falls in the rejection region (in other
words, only ACCEPT Ho for values that are between –1.96 and +1.96).
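The tabulated figures 1.645 (one tail) and ±1.96 (two tails) at α = 5% can be checked without a printed table. The sketch below uses `NormalDist().inv_cdf` from Python's standard `statistics` module; the helper name `critical_values` is made up for this illustration.

```python
from statistics import NormalDist

def critical_values(alpha, tails):
    """Return the Z critical value(s) for a one-tailed (upper) or two-tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == 1:
        return (std_normal.inv_cdf(1 - alpha),)
    if tails == 2:
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between the two tails
        return (-z, z)
    raise ValueError("tails must be 1 or 2")

one_tail = critical_values(0.05, tails=1)
two_tail = critical_values(0.05, tails=2)
print(round(one_tail[0], 3))                        # 1.645
print(round(two_tail[0], 2), round(two_tail[1], 2))  # -1.96 1.96
```

For a one-tailed test in the lower tail (H1 with "<"), the critical value is the negative of the upper-tail figure.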

MEAN COMPUTATION
The “mean” is an average, denoted by $\bar{X}$.

Given X1, X2, X3,… Xn, the mean is
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Where:
X1, X2, X3,… Xn could be taken as the “ages” of students in a class, and n is the number of students.
MEAN:
AGES of Students:
50, 52, 42, 35, 65, 54, 32, 40, 42, 41, 46, 49, 47, 61, 45, 35, 18, 40, 48, 32, 38

Total of students’ ages = 912
Number of students = 21
Sample Mean:
$$\bar{X} = \frac{912}{21} \approx 43.43$$
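The mean of the students' ages can be checked with a few lines of Python, adding the ages and dividing by the number of students. The variable names are chosen for this sketch only.

```python
# Ages of students, as listed in the lesson.
ages = [50, 52, 42, 35, 65, 54, 32, 40, 42, 41, 46,
        49, 47, 61, 45, 35, 18, 40, 48, 32, 38]

total_of_ages = sum(ages)        # X1 + X2 + ... + Xn
number_of_students = len(ages)   # n
sample_mean = total_of_ages / number_of_students

print(total_of_ages)             # 912
print(number_of_students)        # 21
print(f"{sample_mean:.2f}")      # 43.43
```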

Example 1:
A sport biologist claims that female distance runners tend to be taller on
average than women in general, who have an average height of 64”. To study
this claim, she obtained a random sample of 40 female distance runners and
recorded their heights; their mean is 65.6” and the standard
deviation is 3.3”. Using this result, test the claim
at the 5% level of significance.
Solve by following the procedure for hypothesis testing.
1. Formulate the null and alternative hypothesis.

Ho: µ = 64

H1: µ > 64
2. Determine the appropriate test statistic and compute its value.
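a. Determine whether it is Z or t.
b. Z means it is normal.
c. When n is large, i.e. n ≥ 30, we use Z.
d. When n is small, that is, n < 30, we use the t distribution.
N.B. Here we use Z since the sample size (n = 40) is more than 30.

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{65.6 - 64}{3.3 / \sqrt{40}}$$
Where:
$\bar{X}$ = 65.6, µ = 64, σ = 3.3 and n = 40.
Continue solving till the final figure.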
3. Choose the level of significance, i.e. α.
a. Generally, 5% is used for management sciences.
4. Determine the critical region.
a. It is a ONE TAIL TEST because H1 is well specified.
b. On the TABLE, Zα at one tail is 1.645. To get this figure,
look up α = 5% in the table. This is the
TABULATED figure (the figure should be 1.645).
Zα = 1.645.
5. Make a statistical decision.
a. Here you will compare CALCULATED with TABULATED, and decide
whether you should accept or reject the hypothesis.
6. Interpret results.
a. Interpret statistical statements.
b. Explain in simple terms why Ho or H1 has to be accepted or rejected.
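The six steps of Example 1 can be traced in Python as a check on the hand calculation. This is a sketch that assumes the one-sample Z statistic, Z = (sample mean − µ) / (standard deviation / √n), and takes the tabulated value from `NormalDist().inv_cdf` instead of a printed table. The variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist

# Step 1: Ho: mu = 64, H1: mu > 64 (figures from Example 1).
mu0 = 64.0
sample_mean = 65.6
std_dev = 3.3
n = 40

# Step 2: n >= 30, so the Z statistic is used.
z_calculated = (sample_mean - mu0) / (std_dev / math.sqrt(n))

# Steps 3 and 4: alpha = 5%, one tail (upper), so the critical region is Z >= 1.645.
alpha = 0.05
z_tabulated = NormalDist().inv_cdf(1 - alpha)

# Step 5: compare CALCULATED with TABULATED.
reject_ho = z_calculated >= z_tabulated

print(f"calculated Z = {z_calculated:.2f}, tabulated Z = {z_tabulated:.3f}")
print("Reject Ho" if reject_ho else "Accept Ho")
```

Step 6 in words: the calculated Z (about 3.07) is greater than the tabulated 1.645, so it falls in the rejection region. At the 5% level the sample supports the claim that female distance runners are taller on average than women in general.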

EXAMPLE 2: TWO TAIL TEST.


The mean lifetime of a sample of 100 fluorescent light bulbs produced by a company
is computed to be 1570 hours, with a standard deviation of 120 hours. If µ
(the population mean) of all the bulbs produced by the company is 1600 hours, test
whether there is a significant difference (two tail test) at the 5% level of significance.
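Example 2 can be worked through in the same way as a check. The sketch below assumes Ho: µ = 1600 against H1: µ ≠ 1600, uses Z because n = 100 is at least 30, and takes the two-tailed tabulated value from `NormalDist().inv_cdf`. The variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist

# Figures from Example 2. Ho: mu = 1600, H1: mu != 1600.
mu0 = 1600.0
sample_mean = 1570.0
std_dev = 120.0
n = 100
alpha = 0.05

z_calculated = (sample_mean - mu0) / (std_dev / math.sqrt(n))

# Two tails: alpha is split in two, so the critical values are about -1.96 and +1.96.
z_tabulated = NormalDist().inv_cdf(1 - alpha / 2)

# Reject Ho when the calculated Z lies in either tail.
reject_ho = abs(z_calculated) >= z_tabulated

print(f"calculated Z = {z_calculated:.2f}, tabulated Z = ±{z_tabulated:.2f}")
print("Reject Ho" if reject_ho else "Accept Ho")
```

The calculated Z is –2.5, which is below –1.96, so it falls in the rejection region: at the 5% level the sample mean of 1570 hours differs significantly from 1600 hours.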
