
UNIT 10: STATISTICAL TESTS

Unit objectives
1. Define a statistical test
2. Explain the types of statistical tests
10.1 Definition
A statistical test is a procedure used to determine the statistical significance of an observation. It provides a mechanism for making quantitative decisions about a process or processes.
10.2 Types of statistical test
Standard/z-score and the Student’s t-test
Standard/z-score
In statistics, a standard score indicates how many standard deviations an observation or datum is above or below the mean. It is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation, or by subtracting the population mean from the sample mean and dividing by the standard error. This conversion process is called standardizing or normalizing.
The standard deviation is the unit of measurement of the z-score. It allows comparison of observations from different normal distributions, which is done frequently in research.
Standard scores are also called z-values, z-scores, normal scores, and standardized variables. The letter "Z" is used because the standard normal distribution is also known as the "Z-distribution". They are most frequently used to compare a sample to a standard normal deviate (standard normal distribution, with μ = 0 and σ = 1).
Formula
The standard score is

z = (x − μ) / σ

where:
x is the raw score to be standardized;
μ is the mean of the population;
σ is the standard deviation of the population.
Or

z = (x̄ − μ) / (s/√n)

where:
x̄ is the sample mean;
μ is the population mean;
s is the sample standard deviation;
n is the sample size.
The quantity z represents the distance between the raw score and the population mean
in units of the standard deviation. z is negative when the raw score is below the mean,
positive when above.
A key point is that calculating z requires the population mean and the population
standard deviation, not the sample mean or sample deviation. It requires knowing the
population parameters, not the sample statistic drawn from the population of interest.
But knowing the true standard deviation of a population is often unrealistic except in
cases such as standardized testing, where the entire population is measured. In cases
where it is impossible to measure every member of a population, the standard deviation
may be estimated using a random sample. For example, a population of people who
smoke cigarettes is not fully measured.
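To make the two formulas concrete, here is a minimal Python sketch; every number in it is a hypothetical value chosen purely for illustration:

```python
import math

# Hypothetical values for illustration only
raw_score = 2.10     # an individual observation, x
pop_mean = 1.74      # population mean, mu
pop_sd = 0.40        # population standard deviation, sigma

# Standard score for a single observation: z = (x - mu) / sigma
z_single = (raw_score - pop_mean) / pop_sd

# Standard score for a sample mean: z = (xbar - mu) / (s / sqrt(n))
sample_mean = 1.90   # xbar
sample_sd = 0.35     # s
n = 36               # sample size
z_sample = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))

print(f"z for a single observation: {z_single:.2f}")
print(f"z for a sample mean:        {z_sample:.2f}")
```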
Student's t-test
The t-statistic was introduced in 1908 by William Gosset, a chemist working for the Guinness brewery in Dublin, Ireland, who published under the pseudonym "Student".
It is used when the sample size is less than 30. When the sample size is large, the sampling distribution of the mean is assumed to be approximately normal, hence the z-distribution is used.
Gosset, writing as "Student", showed that the mean of a sample from a normal distribution with unknown variance has a distribution that is similar to, but not quite the same as, a normal distribution. He called it the t-distribution (Student's t-distribution). It has since been used in hypothesis testing for small samples.
The t-distribution has one parameter, a quantity called the degrees of freedom (df). The concept is one of the most elusive statistical ideas. The df is calculated as the sample size minus the number of estimated parameters; a single sample of n observations has n − 1 df.
NB: The t-distribution has also lately been applied in studies with sample sizes of more than 30.
T-test in a single group
1. One-sample t-test
In a study where a single group of individuals is randomly drawn from a population assumed to be normally distributed, we can perform a one-sample t-test to compare the sample mean with a hypothesized population value. We calculate the sample statistic and then use it to estimate the population parameter. For example, suppose you have been monitoring triglyceride levels in the blood of healthy individuals and know that they have a geometric mean of 1.74 mmol/litre. You have a sample of patients with a specific medical condition and wish to know whether the average level in the population from which the patients come is the same as this value. The test used to determine this is the one-sample t-test.
This is done by testing a hypothesis.
To test the hypothesis, you begin by:
1. Stating the hypotheses, i.e. the null and alternative hypotheses:
 H0: the mean in the population, μ, is equal to the hypothesized value
 H1: the mean in the population, μ, is not equal to the hypothesized value
2. Collect relevant data from a sample of individuals.
3. Calculate the value of the test statistic specific to the null hypothesis. You can use the z-score or the t-distribution.
4. Compare the value of the test statistic to values from the known probability distribution, in this case the t-distribution or the z-distribution.
5. Interpret the p-value and results, and decide to reject or fail to reject H0.

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = (x̄ − μ0) / (s/√n)

where x̄ is the sample mean, s is the sample standard deviation and n is the sample size. The degrees of freedom used in this test is n − 1.
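As an illustration, here is a minimal sketch of a one-sample t-test in Python using scipy.stats; the patient triglyceride values below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical triglyceride levels (mmol/litre) for a sample of patients
levels = np.array([1.9, 2.1, 1.6, 2.4, 1.8, 2.0, 2.2, 1.7, 2.3, 1.9])

mu0 = 1.74  # hypothesized population mean from healthy individuals

# One-sample t-test: H0 is that the population mean equals mu0
t_stat, p_value = stats.ttest_1samp(levels, popmean=mu0)

print(f"t = {t_stat:.3f}, df = {len(levels) - 1}, p = {p_value:.4f}")
# If p < 0.05, reject H0; otherwise fail to reject H0
```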
2. Paired sample t test

This is usually used to compare means in two paired groups. For example, the variable may be measured on each individual in two circumstances, e.g. before an intervention (baseline information) and after the intervention; or a patient may have two data sets, e.g. during treatment and while taking a placebo.
Means in the two data sets can be compared. This is done by following this procedure:
1. Define the null and alternative hypotheses:
 H0: the mean in sample 1 is equal to the mean in sample 2
 H1: the mean in sample 1 is not equal to the mean in sample 2
2. Collect relevant data from a sample of individuals
3. Calculate the value of the test statistic specific to the null hypothesis
4. Compare the value of the test statistic to values from the known probability distribution
in this case the t-distribution or the z-score distribution.
5. Interpret the p-value and results and decide to reject or fail to reject H0.
Note:
The non-parametric equivalent of this test is the Wilcoxon signed-rank test. This is done based on the median values and not the mean.
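A minimal Python sketch of the paired t-test (all measurements hypothetical), with the Wilcoxon signed-rank alternative from the note alongside:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: the same patients before and after an intervention
before = np.array([140, 152, 138, 147, 160, 155, 149, 142])
after = np.array([135, 145, 136, 140, 152, 150, 148, 138])

# Paired t-test: H0 is that the mean difference is zero
t_stat, p_value = stats.ttest_rel(before, after)
print(f"paired t = {t_stat:.3f}, p = {p_value:.4f}")

# Non-parametric alternative mentioned in the note: Wilcoxon signed-rank test
w_stat, w_p = stats.wilcoxon(before, after)
print(f"Wilcoxon W = {w_stat:.3f}, p = {w_p:.4f}")
```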
3. Unpaired sample t-test
Here it is assumed that, in the population, the variable being measured is normally distributed in each group and that the variances of the two groups are the same.
We consider the difference in the means of the two groups. Under the null hypothesis that the population means in the two groups are the same, this difference will be equal to zero. Therefore we use a test statistic based on the difference in the two sample means and on the value of the difference in the two group means under the null hypothesis. The test statistic, often referred to as t, follows the t-distribution.
Note: The non-parametric equivalent of this test is the Wilcoxon rank-sum test. This is done based on the median values and not the mean.
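Similarly, a minimal sketch of the unpaired t-test on hypothetical data, with the Wilcoxon rank-sum alternative:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two independent groups
group1 = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2])
group2 = np.array([4.5, 4.7, 4.4, 4.9, 4.6, 4.3, 4.8])

# Unpaired (independent-samples) t-test assuming equal variances
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print(f"unpaired t = {t_stat:.3f}, p = {p_value:.4f}")

# Non-parametric alternative from the note: Wilcoxon rank-sum test
u_stat, u_p = stats.ranksums(group1, group2)
print(f"rank-sum statistic = {u_stat:.3f}, p = {u_p:.4f}")
```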
ANOVA (analysis of variance)
– We may have samples from a number of independent groups. We can measure a single numerical variable in each of the groups, e.g. the average platelet count in women of different ethnic groups.
– Although we could perform tests to compare the averages in each pair of groups, it is better to have a statistical tool that can compare all the groups at once. This is done by ANOVA.
– The one way analysis of variance separates the total variability in the data into that
which can be attributed to the differences between the individuals from the different
groups (the between group variation) and to the random variation between the
individuals within each group (within-group variation).
– Under the null hypothesis that the group means are the same, the between group
variance will be similar to the within group variance.
Key points to note in ANOVA
– ANOVA is used to see an association between a continuous outcome variable (such as
mean HAZ score) and a categorical determining variable (such as iodized salt
consumption).
– The ANOVA is a Statistics option under the Means function that allows for testing the
difference between the mean outcome scores for the two or more categories of the
determining variable.
– According to SPSS 8.0, Analysis of Variance, or ANOVA, is a method of testing the null hypothesis that several group means are equal in the population, by comparing the sample variance estimated from the group means to that estimated within the groups.
– One-way ANOVA: According to SPSS 8.0, the One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t-test.

Note:
The non-parametric equivalent of this test is the Kruskal-Wallis test. This is done based on the median values and not the mean.
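A minimal sketch of a one-way ANOVA in Python on hypothetical platelet counts for three groups, with the Kruskal-Wallis alternative from the note:

```python
import numpy as np
from scipy import stats

# Hypothetical platelet counts (x10^9/L) for women from three ethnic groups
group_a = np.array([240, 255, 230, 260, 245])
group_b = np.array([225, 238, 219, 231, 228])
group_c = np.array([265, 270, 255, 280, 262])

# One-way ANOVA: H0 is that all group means are equal
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Non-parametric alternative from the note: Kruskal-Wallis test
h_stat, h_p = stats.kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {h_p:.4f}")
```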
Statistical Tests used in Data analysis
Deciding which statistical test to use depends on:
– The design of the study
– The type of the variable
– The distribution of the data (normal, binomial or not normal)
When performing analysis to answer a research question, it is important first to identify the types of variables that will be used, choosing an outcome variable and one or more potential "independent" or determining variables. Once this is done, you must decide how you would like to use these in a statistical test to see if a relationship exists.
The table below gives an idea of how to choose the appropriate test for statistical analysis depending on the variables you have chosen:
                              OUTCOME VARIABLE
PREDICTOR VARIABLE(S)         Categorical                  Continuous
Categorical                   Chi-square, log-linear,      t-test, ANOVA (analysis of
                              logistic regression          variance), linear regression
Continuous                    Logistic regression          Linear regression, Pearson
                                                           correlation
Mixture of categorical        Logistic regression          Linear regression, analysis
and continuous                                             of covariance

Chi-Square

THE 2X2 TABLE AND CHI-SQUARE TEST OF ASSOCIATION

Many epidemiological investigations are based on data of a qualitative nature. If a patient is treated with a particular drug, it may be relatively easy to assess whether there has been a response to treatment and an improvement or deterioration in the condition, but it is usually more difficult to measure such changes on a continuous scale.
In epidemiological investigations, individuals must often be placed exclusively into one of a limited number of categories such as dead/alive, drinks alcohol/does not drink alcohol, smokes/does not smoke, etc. It is therefore important that we have means of analyzing categorical/qualitative data of this kind.

The 2x2 table

These are tables that arise from categorical data that has only two possibilities. It is a table in which we have divided individuals into two classes with respect to two different attributes, and we wish to test for an association between the two attributes.
For example, if we wish to test the effectiveness of a new vaccine against cholera in an area where the disease is common, we might take a group of 450 people and inoculate 200 of them, chosen randomly, with the new vaccine. The whole group of 450 is then followed up over a sufficient period of time to determine how many people are attacked. Suppose 15 people among the inoculated and 100 among the non-inoculated are attacked; the results can be presented in the form of a 2x2 table showing the association between inoculation and infection.

2x2 table showing results of exposure

Inoculation with new vaccine    Attacked    Not attacked    Total
Yes                                 15          185           200
No                                 100          150           250
Total                              115          335           450

In the table, the last row and column are called the marginal totals, and the total, 450, the grand total.

Chi-Square tests
The results in the above table look as though the new vaccine offers protection to a considerable extent. This kind of data may be used to decide whether or not to vaccinate the population of a town in order to restrict the seriousness of an outbreak of disease. Initiating such an intervention may require a lot of money. It is therefore important to make the right decision. This can be done through hypothesis testing.
We start by stating the hypothesis that the vaccine is useless (H0). We then seek a summary measure which will give some indication of how far the observed data depart from H0.
Overall, 115 people were attacked out of the 450 under study (25.6%). If the null hypothesis were true, we would expect 25.6% of individuals to be attacked in both the inoculated and the uninoculated groups. That is, we would expect the following number of people to be attacked among the inoculated group of 200:
115/450 x 200 = 51.1
With fixed marginal totals, the other expected results are obtained by subtraction as shown below.

Expected results for the data in the table above

Inoculation    Attacked    Not attacked    Total
Yes              51.1         148.9         200
No               63.9         186.1         250
Total           115           335           450

We then construct a summary statistic which measures the departure of the observed from the expected results. In this case, we use the chi-square statistic:

χ² = ∑ (O − E)² / E

where O is the observed value in each cell and E is the expected value in each cell.
The calculated value is then compared with the value in a chi-square table. A 2x2 table has 1 degree of freedom, so with the p-value set at 0.05 the critical chi-square value is 3.84. If your calculated chi-square value is bigger than this, the probability value will be less than 0.05; the result is therefore significantly different, and you reject H0.
Alternative formula for χ²
If we denote the entries in the 2x2 table as follows:

             Column 1    Column 2    Total
Row 1            a           b        r1
Row 2            c           d        r2
Total           s1          s2        N

then
χ² = N(ad − bc)² / (r1 r2 s1 s2)
Example
Suppose 53 children are selected for a study of the effectiveness of an anti-malarial drug. Of these, 22, selected at random, are given the drug and the remaining 31 a dummy tablet. At the end of two weeks the following results were recorded.

Drug                    Attacked    Not attacked    Total
Anti-malarial drug          2             20          22
Placebo                    11             20          31
Total                      13             40          53

Interpretation of chi-square results:

1. We can use the chi-square distribution, reading the p-value against the calculated chi-square value, or
2. We can use the critical value of 3.84: if the calculated chi-square value does not exceed this critical value, we say that there is no association between the variables in question (fail to reject H0); if it is greater than 3.84, there is an association, hence we reject H0.
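To check the arithmetic on the anti-malarial example, here is a minimal sketch using scipy.stats.chi2_contingency; note that scipy applies Yates' continuity correction to 2x2 tables by default, so correction=False is passed to match the plain formula above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed 2x2 table from the anti-malarial drug example:
# rows = drug / placebo, columns = attacked / not attacked
observed = np.array([[2, 20],
                     [11, 20]])

# correction=False gives the plain (uncorrected) chi-square of the text
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("expected counts under H0:")
print(expected.round(1))
# Compare chi2 with the critical value 3.84 (df = 1, alpha = 0.05)
```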

UNIT 11: CORRELATION AND REGRESSION ANALYSIS

Unit objectives
1. Explain the forms of correlation
2. Describe the computation and interpretation of the line of best fit
11.1 Correlation
This is used to study the possible linear (straight-line) association between two quantitative variables. It tells us how strongly the two variables are associated.
A simple and effective way of examining the relationship between quantitative variables is to use a scatter plot, so called because it shows a scatter of points, one for each individual. For example, in a study of hypertension in 37 women, clinicians were interested in the relationship between age and systolic blood pressure (SBP).
To see how each point is plotted: to find the values of x and y for the arrowed point, we draw a horizontal and a vertical line from the point to the x and y axes. In practice, we know the values of x and y and use them to plot the point.

Example 2

When we specifically want to examine whether hemoglobin level changes with age:
– Age will be the explanatory variable for hemoglobin levels
– Hemoglobin level will be the response variable
It is standard to plot:
– The explanatory variable, usually the exposure, on the x-axis (horizontal axis)
– The response variable, usually the outcome (dependent) variable, on the y-axis (vertical axis)
We can then look at the scatter plot and find out whether there is an association between the two quantitative variables. To measure the degree of association, we calculate the correlation co-efficient (r).
The standard method is to calculate Pearson's correlation co-efficient, denoted r.

Pearson’s Correlation Co-efficient


This is a correlation co-efficient that measures the scatter of the points around an underlying linear trend (straight line). It can take any value from -1 to +1. Pearson's correlation co-efficient uses the differences of all the points from the overall mean.
If the correlation co-efficient is -1 or +1, then the points in the scatter plot lie exactly on a straight line.
Note:
– The correlation is positive if high values of one variable are associated with high values of the other variable; the points do not have to lie exactly on a straight line.
– The correlation is negative if values of one variable decrease as values of the other variable increase; again, the points do not have to lie exactly on a straight line.
– If there is no linear relationship, then the correlation is zero. But one should be careful when r = 0: there could be a strong non-linear relationship between the two variables.
Assumptions for use of correlation
– A Pearson’s correlation co-efficient may be calculated for any data set. However, it is
more meaningful when the 2 variables have an approximately normal distribution
– Another assumption is that all observations should be independent i.e. that only one
observation for each variable should come from each individual in the study
Interpretation of correlation
 The correlation co-efficient r can lie between -1 and +1.
 A value of +1 indicates a perfect positive correlation.
 A value of -1 indicates a perfect negative correlation.
 A value of zero indicates no linear association between the two variables.
 A zero correlation does not always indicate no relationship, since the relationship may be non-linear, e.g. a curvilinear relationship; other methods may be used to describe such relationships.
 One way of interpreting the correlation co-efficient is to calculate 100r². This gives the percentage of the variability of the data that is explained by a linear association between the two variables, and lets you judge whether the explained variability is high or not (a short sketch at the end of this section illustrates this).
Key Points
 In general, correlation is useful for generating hypotheses rather than testing them, which is why there is no hypothesis testing and there are no p-values in this session.
Points to remember on correlation:
1. The data should be shown on a scatter plot
2. The correlation co-efficient r should be given to 2 decimal places
3. The number of observations should always be stated
4. Correlation tells us about the strength of association, i.e. the degree of association
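A minimal sketch computing Pearson's r and 100r² in Python; the age and SBP values below are hypothetical, echoing the hypertension example above:

```python
import numpy as np
from scipy import stats

# Hypothetical age (years) and systolic BP (mmHg) for a few women
age = np.array([35, 42, 50, 58, 63, 47, 55, 39, 61, 44])
sbp = np.array([118, 125, 134, 142, 150, 128, 139, 121, 148, 127])

r, p_value = stats.pearsonr(age, sbp)
print(f"r = {r:.2f} (to 2 decimal places), n = {len(age)}, p = {p_value:.4f}")

# 100 r^2: percentage of variability explained by the linear association
print(f"percentage of variability explained: {100 * r**2:.1f}%")
```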
11.2 Regression
We may also want to know what the value of one of the variables is likely to be when we know the value of the other. The method used for this is linear regression. To describe the relationship between two quantitative variables, and to predict the value of one given the other, we use linear regression.
Linear Regression
Linear regression is quite often used to test the relationship of an outcome with a combination of continuous and categorical determining variables (such as illness, feeding practices including breastfeeding, environmental influences, and care practices, among others). According to SPSS 8.0, Linear Regression estimates the coefficients of
the linear equation, involving one or more independent variables, that best predict the
value of the dependent variable. For example, you can try to predict a salesperson's
total yearly sales (the dependent variable) from independent variables such as age,
education, and years of experience.
Regression Coefficients
Estimates displays the regression coefficient B, the standard error of B, the standardized coefficient beta, the t value for B, and the two-tailed significance level of t. Confidence intervals displays 95% confidence intervals for each regression coefficient, or a covariance matrix.
Model fit
The variables entered into and removed from the model are listed, and the following goodness-of-fit statistics are displayed: multiple R, R² and adjusted R², the standard error of the estimate, and an analysis-of-variance table.
R squared change
Displays the change in R², the F change, and the significance of the F change.
When conducting a regression test, one aims to describe the linear relationship between two quantitative variables and to explain the value of one variable given the value of another.
When one of the variables is thought to depend on the other, it is more appropriate to quantify the relationship between them. If we can do this, we can also estimate the value of one variable from the other. The method we use to do this is called regression. In this session we shall focus on linear regression.
Regression: studies the relationship between two variables when one of them depends on the other. It also allows one variable to be estimated given the values of the other.
Description of the relationship
The line through the data suggests that as age increases, SBP increases too.
Remember, we referred to the variable on the horizontal axis as x and the variable on the vertical axis as y. Unlike correlation, for regression it is important to know which variable goes on which axis.
When we are interested in explaining one variable by another, the x variable is the explanatory variable and the y variable is the response or outcome variable.
In the above example, researchers were interested in how SBP varied with age. Thus, SBP will be explained by age, i.e. for a given age, with a regression line we could describe the relationship between age and SBP and say what the average SBP would be for that age. So SBP is the variable that will be explained by age.
Fitting a regression line
The aim is to fit a straight line to the data that best describes the relationship between x and y. This is called the line of best fit:

Y = α + βX
Further Explanation and Application of Linear Regression
Linear regression modeling is used for a quantitative response outcome. For example, data (n = 55) on age and systolic BP were collected, and we want to set up a linear regression model to predict BP from age. Here we could, after checking the normality assumptions for both variables, compute a bivariate correlation (Pearson's correlation = 0.696, p < 0.001), and a graphical scatter plot would be helpful. See the figure below.

[Figure: Scatter plot of age against systolic BP]

There is a moderately strong correlation between age and systolic BP, but how could we 'quantify' this strength? One way is to fit a regression line, as in the sketch below.
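A minimal sketch fitting the line Y = α + βX by least squares with scipy.stats.linregress; the age/BP values below are hypothetical, not the n = 55 data set referred to above:

```python
import numpy as np
from scipy import stats

# Hypothetical age (years) and systolic BP (mmHg) data
age = np.array([25, 30, 38, 44, 52, 57, 63, 70, 48, 35])
sbp = np.array([112, 118, 124, 130, 138, 142, 150, 158, 133, 120])

# Least-squares fit of the line Y = alpha + beta * X
result = stats.linregress(age, sbp)
print(f"intercept (alpha) = {result.intercept:.1f}")
print(f"slope (beta)      = {result.slope:.2f} mmHg per year of age")
print(f"r = {result.rvalue:.3f}, p = {result.pvalue:.4g}")

# Predict the average SBP for a 50-year-old using the fitted line
predicted = result.intercept + result.slope * 50
print(f"predicted SBP at age 50: {predicted:.1f} mmHg")
```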

UNIT 12: INTRODUCTION TO SAMPLING METHODS

Unit objectives
1. Define terms used in sampling methods
2. Explain types of sampling and their limitations
12.1 Definition of terms
 Sampling is the process of selecting a representative group from the population under
study.
 The target population is the total group of individuals from which the sample might be
drawn.
 A sample is the group of people who take part in the investigation. The people who take
part are referred to as “participants”.
 Generalizability refers to the extent to which we can apply the findings of our research
to the target population we are interested in.
12.2 Types of Sampling
Two general approaches of sampling are used in social science research. With probability
sampling, all elements (e.g., persons, households) in the population have some
opportunity of being included in the sample, and the mathematical probability that any
one of them will be selected can be calculated. With non-probability sampling, in
contrast, population elements are selected on the basis of their availability (e.g.,
because they volunteered) or because of the researcher's personal judgment that they
are representative.
Probability Sampling includes: Simple Random sampling, Systematic sampling,
Stratified random sampling, Multistage sampling, Multiphase sampling and Cluster
sampling
Non-Probability Sampling includes: Convenience sampling, Purposive sampling and
Quota sampling.
Simple Random Sampling: It is applicable when the population is small, homogeneous and readily available. All subsets of the frame are given an equal probability, so each element of the frame has an equal probability of selection. It provides the greatest number of possible samples. It is done by assigning a number to each unit in the sampling frame; a table of random numbers or a lottery system is then used to determine which units are to be selected. A short code sketch follows the advantages and disadvantages below.
Advantages
– Estimates are easy to calculate.
– Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling.
Disadvantages
– If the sampling frame is large, this method may be impracticable.
– Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
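A minimal sketch of simple random sampling, as referenced above; the frame and sample sizes are hypothetical:

```python
import random

# Hypothetical sampling frame: units numbered 1..500
frame = list(range(1, 501))
sample_size = 25

# random.sample draws without replacement, giving every unit an
# equal probability of selection (the lottery method in software form)
random.seed(42)  # fixed seed so the example is reproducible
sample = random.sample(frame, sample_size)
print(sorted(sample))
```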

Systematic Sampling
It relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. The technique involves a random start and then proceeds with the selection of every kth element from then onwards, where k = (population size / sample size). It is important that the starting point is not automatically the first in the list, but is instead chosen randomly from within the first to the kth element in the list. A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10'). A short code sketch follows the advantages and disadvantages below.
Advantages
– Sample easy to select
– Suitable sampling frame can be identified easily
– Sample evenly spread over entire reference population
Disadvantages
– Sample may be biased if hidden periodicity in population coincides with that of selection.
– Difficult to assess precision of estimate from one survey.
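A minimal sketch of systematic sampling with a random start, as referenced above; the frame and sample sizes are hypothetical:

```python
import random

# Hypothetical ordered frame: units numbered 1..500, sample of 25
frame = list(range(1, 501))
sample_size = 25

k = len(frame) // sample_size  # sampling interval, k = N / n
random.seed(7)                 # fixed seed for reproducibility
start = random.randrange(k)    # random start within the first k elements

# Take every kth element from the random start onwards
sample = frame[start::k]
print(f"k = {k}, start index = {start}")
print(sample)
```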

Stratified Sampling
It is a technique used where the population embraces a number of distinct categories; the frame can be organized into separate "strata". Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. A short code sketch follows the advantages and disadvantages below.
Advantages
– Every unit in a stratum has same chance of being selected.
– Using same sampling fraction for all strata ensures proportionate representation in the
sample.
– Adequate representation of minority subgroups of interest can be ensured by
stratification and varying sampling fraction between strata as required.
– Each stratum is treated as an independent population hence different sampling
approaches can be applied to different strata.
Disadvantages
– A sampling frame of the entire population has to be prepared separately for each stratum
– When examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design and potentially reducing the utility of the strata.
– In some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than other methods would.
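A minimal sketch of proportionate stratified sampling, as referenced above; the strata names and sizes are hypothetical:

```python
import random

# Hypothetical strata: unit labels grouped by region
strata = {
    "urban": [f"urban-{i}" for i in range(1, 201)],
    "rural": [f"rural-{i}" for i in range(1, 301)],
}

total_sample = 50
random.seed(11)

# Proportionate allocation: the same sampling fraction in every stratum
population_size = sum(len(units) for units in strata.values())
sample = []
for name, units in strata.items():
    n_stratum = round(total_sample * len(units) / population_size)
    sample.extend(random.sample(units, n_stratum))
    print(f"{name}: {n_stratum} units sampled out of {len(units)}")

print(f"total sample size: {len(sample)}")
```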
Stratification is sometimes introduced after the sampling phase, in a process called 'post-stratification'. This approach is typically implemented when there is a lack of prior knowledge of an appropriate stratifying variable, or when the researcher lacks the necessary information to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, post-stratification can be used to implement weighting, which can improve the precision of a sample's estimates.
Choice-based sampling is one of the stratified sampling strategies. In this, data are stratified on the target and a sample is taken from each stratum so that the rare target class will be better represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision with the choice-based sample, even when a smaller overall sample size is taken compared to a random sample. The results usually must be adjusted to correct for the oversampling.
Cluster Sampling
Cluster sampling is an example of 'two-stage sampling': in the first stage a sample of areas is chosen, and in the second stage a sample of respondents within those areas is selected.
The population is divided into clusters of homogeneous units, usually based on geographical contiguity. Sampling units are groups rather than individuals, and a sample of such clusters is selected; all units from the selected clusters are then studied.
Advantages
– It cuts down on the cost of preparing a sampling frame.
– It can reduce travel and other administrative costs.
Disadvantages
– Sampling error is higher than for a simple random sample of the same size.
Cluster sampling is often used to evaluate vaccination coverage in the EPI (Expanded Programme on Immunization).
Difference between Strata and Clusters
– Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways.
– All strata are represented in the sample, but only a subset of clusters is in the sample.
– With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous.
Multistage Sampling
It involves a complex form of cluster sampling in which two or more levels of units are embedded one in the other. The technique is essentially the process of taking random samples of preceding random samples. It is not as effective as true random sampling, but it probably solves more of the problems inherent to random sampling.
It is an effective strategy because it banks on multiple randomizations, and it is used frequently when a complete list of all members of the population does not exist or is inappropriate.
Advantages
– A survey by such a procedure is less costly, less laborious and more purposeful.

Quota Sampling
In this technique, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling, and then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60. It is this second step which makes the technique one of non-probability sampling.
In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.
Convenience Sampling
Also called grab, opportunity, accidental or haphazard sampling.
It is a type of non-probability sampling which involves the sample being drawn from the part of the population which is close to hand, i.e. readily available and convenient.
A researcher using such a sample cannot scientifically make generalizations about the total population from it, because it would not be representative enough. This type of sampling is most useful for pilot testing.
