Module 1-Introductory Concepts_FoEd 203
Most introductory statistical texts begin with an obligatory opening paragraph or possibly a separate
box on “What is Statistics?” Here are a few examples (retrieved from
https://www.annualreviews.org/content/journals/10.1146/annurev-statistics-022513-115703):
a. “Statistics is a way of reasoning, along with a collection of tools and methods, designed to help
us understand the world” (De Veaux et al. 2006, p. 2).
b. “Statistics is the art of making numerical conjectures about puzzling questions” (1999)
c. “Statistics is a set of concepts, rules, and methods for (1) collecting data, (2) analyzing data, and
(3) drawing conclusions from data” (Iversen & Gergen 1997, p. 4).
d. “Statistics helps provide a systematic approach for obtaining reasoned answers together with
some assessment of their reliability in situations where complete information is unobtainable or not
available in a timely manner” (Johnson & Tsui 1998, p. 2).
e. “Statistics is the art and science of gathering, analyzing, and making inferences from data”
(Mosteller et al. 1961, p. 2).
f. “Statistics is the art of learning from data. It is concerned with the collection of data, its
subsequent description, and its analysis, which often leads to drawing conclusions” (Ross 1996, p. 5).
g. “Statistics is a collection of procedures and principles for gaining and processing information in
order to make decisions when faced with uncertainty” (Utts 1996, p. 5).
h. “Statistics is a body of methods for making wise decisions in the face of uncertainty” (Wallis &
Roberts 1962, p. 11).
Definition. Measurement is the process of determining the value or label of a particular variable
for a particular experimental unit.
Classification of Variables
a. Discrete vs Continuous
Discrete variable - a variable which can assume a finite or, at most, countably infinite
number of values; usually measured by counting or enumeration
(e.g., number of crew in a ship, number of car collections)
Continuous variable - a variable which can assume infinitely many values
corresponding to a line interval (e.g. age, weight, height)
b. Qualitative vs Quantitative
Qualitative variable - a variable that yields categorical responses (e.g., political affiliation,
occupation, marital status)
Quantitative variable - a variable that takes on numerical values representing an amount or
quantity (e.g., weight, height, number of cars)
c. Interval Level
The interval level is that which has the properties of the nominal and ordinal levels, and
in addition, the distances between any two numbers on the scale are of known sizes. An
interval scale must have a common and constant unit of measurement. Furthermore, the
d. Ratio Level
The ratio level of measurement contains all the properties of the interval level, and in
addition, it has a “true zero” point.
Examples: Age (in years), Number of correct answers in an exam
Definition. The target population is the population from which information is desired.
Definition. The sampled population is the collection of elements from which the sample is actually
taken.
Definition. The population frame is a listing of all the individual units in the population.
Advantages
- The theory involved is much easier to understand than the theory behind other sampling
designs.
- Inferential methods are simple and easy.
Disadvantages
Advantages
- Stratification may produce a gain in precision in the estimates of characteristics of the
population
- It allows for more comprehensive data analysis since information is provided for each
stratum.
- It is administratively convenient.
Disadvantages
- A listing of the population for each stratum is needed.
- The stratification of the population may require additional
prior information about the population and its strata.
Method B
Step 1: Number the units of the population consecutively from 1 to N.
Step 2: Let k be the nearest integer less than N/n.
Step 3: Select the random start r, where 1 ≤ r ≤ N. The unit corresponding to r is the first unit
of the sample.
Step 4: Consider the list of units of the population as a circular list, i.e., the last unit in the list
is followed by the first. The other units in the sample are the units corresponding to r + k,
r + 2k, r + 3k, ..., r + (n-1)k.
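The four steps of Method B can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name circular_systematic_sample and the optional rng argument are hypothetical, and k is computed by floor division as one reading of "the nearest integer less than N/n" (when N/n is itself an integer, this sketch uses N/n).

```python
import random


def circular_systematic_sample(N, n, rng=None):
    """Return the labels (1..N) of n units chosen by circular systematic sampling."""
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= N")
    rng = rng or random.Random()

    # Step 1: the units are assumed to be numbered 1, 2, ..., N.
    # Step 2: sampling interval k (assumption: floor of N/n).
    k = N // n

    # Step 3: random start r with 1 <= r <= N.
    r = rng.randint(1, N)

    # Step 4: treat the list as circular, so a label past N wraps around to 1.
    return [((r + j * k - 1) % N) + 1 for j in range(n)]


if __name__ == "__main__":
    print(circular_systematic_sample(N=10, n=3))
```

Because (n-1)k is smaller than N under this choice of k, the wrapped labels never repeat, so the sample always contains n distinct units.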
Advantages
- It is easier to draw the sample and often easier to execute without mistakes than simple
random sampling.
- It is possible to select a sample in the field without a sampling frame.
- The systematic sample is spread evenly over the population.
Disadvantages
➢ Cluster Sampling
Description of the Design
Cluster sampling is a method of sampling where a sample of distinct groups, or clusters, of
elements is selected and then a census of every element in the selected clusters is taken. Similar to
strata in stratified sampling, clusters are non-overlapping sub-populations which together comprise
the entire population. For example, a household is a cluster of individuals living together, and a city
block might also be considered a cluster. Unlike strata, however, clusters are preferably formed
with heterogeneous, rather than homogeneous, elements so that each cluster will be typical of the
population.
Clusters may be of equal or unequal size. When all of the clusters are of the same size,
the number of elements in a cluster will be denoted by M while the number of clusters in the
population will be denoted by N.
Sample-Selection Procedure
Step 1: Number the clusters from 1 to N.
Step 2: Select n numbers from 1 to N at random. The clusters corresponding to the selected
numbers form the sample of clusters.
Step 3: Observe all the elements in the sample of clusters.
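The selection procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name cluster_sample and the list-of-lists representation of the population are hypothetical, and the n cluster numbers are drawn without replacement, which the text does not state explicitly.

```python
import random


def cluster_sample(clusters, n, rng=None):
    """Select n clusters at random and observe every element in them.

    clusters: a list of N clusters, each given as a list of its elements.
    Returns (chosen cluster numbers, all observed elements).
    """
    rng = rng or random.Random()
    N = len(clusters)

    # Step 1: the clusters are numbered 1..N by their position in the list.
    # Step 2: select n of the numbers 1..N at random (assumption: without replacement).
    chosen = sorted(rng.sample(range(1, N + 1), n))

    # Step 3: take a census of every element in each selected cluster.
    observed = [element for c in chosen for element in clusters[c - 1]]
    return chosen, observed


if __name__ == "__main__":
    # Made-up households, each one a cluster of individuals.
    households = [["Ana", "Ben"], ["Carla"], ["Dino", "Ella", "Fe"], ["Gio"]]
    print(cluster_sample(households, n=2))
```

Only the list of clusters is needed to draw the sample; the elements are listed only for the clusters that were selected.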
Advantages
- A population list of elements is not needed; only a population list of clusters is required.
Thus, listing cost is reduced.
- Transportation cost is also reduced.
Disadvantages
- The costs and problems of statistical analysis are greater.
- Estimation procedures are more difficult.
➢ Multistage Sampling
Description of the Design
In multistage sampling, the population is divided into a hierarchy of sampling units
corresponding to the different sampling stages. In the first stage of sampling, the population is
divided into primary-stage units (PSUs), then a sample of PSUs is drawn. In the second stage of
sampling, each selected PSU is subdivided into second-stage units (SSUs), then a sample of SSUs is
drawn. The process of subsampling can be carried to a third stage, a fourth stage, and so on, by
sampling the subunits instead of enumerating them completely at each stage.
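A two-stage version of this design can be sketched in Python as follows. The sketch rests on assumptions the text does not specify: simple random sampling without replacement is used at both stages, and the same number of SSUs is requested from every selected PSU. The function name two_stage_sample and the dictionary layout are hypothetical.

```python
import random


def two_stage_sample(psus, n_psu, n_ssu, rng=None):
    """Draw a two-stage sample.

    psus: dict mapping each PSU label to the list of its SSUs.
    n_psu: number of PSUs to select in the first stage.
    n_ssu: number of SSUs to select within each selected PSU.
    Returns a dict mapping each selected PSU label to its sampled SSUs.
    """
    rng = rng or random.Random()

    # Stage 1: select a sample of PSUs.
    selected_psus = rng.sample(sorted(psus), n_psu)

    # Stage 2: within each selected PSU, sample SSUs instead of enumerating all of them.
    sample = {}
    for label in selected_psus:
        ssus = psus[label]
        size = min(n_ssu, len(ssus))  # a small PSU contributes all of its SSUs
        sample[label] = rng.sample(ssus, size)
    return sample


if __name__ == "__main__":
    # Made-up example: PSUs are villages, SSUs are household numbers.
    villages = {"A": [1, 2, 3, 4], "B": [5, 6], "C": [7, 8, 9]}
    print(two_stage_sample(villages, n_psu=2, n_ssu=2))
```

A third stage would repeat the inner step on each sampled SSU.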
Advantages
- Listing cost is reduced.
- Transportation cost is also reduced.
Disadvantages
- Estimation procedure is difficult, especially when the primary stage units are not of the
same size.
- Estimation procedure gets more complicated as the number of sampling stages
increases.
- The sampling procedure entails much planning before selection is done.
References: