Module 1 - The Nature of Statistics
Introduction:
Probability and statistics act as inseparable twins. Regardless of whether you have done your
experiment or are planning to carry it out, the following question always remains: “What is the chance of
success?”
Probability and statistics are the fields of mathematics concerned with the laws
regulating random events, including the gathering, analysis, interpretation, and display of numerical data.
Probability has its beginnings in the study of gambling and insurance in the 17th century, and it is now a
vital instrument of both the social and natural sciences. Statistics, however, has its origin in census counts
taken thousands of years ago (Porter, 2020).
The word statistics is derived from the Latin word "status" or from the Italian word "statista"
both of which can be translated as “political state” or “government”. In the past, rulers and kings
employed statistics to gather needed information on land, farming, trade and their state
populations to evaluate their military capability, wealth, fiscal resources and other government
issues. Thus, statistics is closely linked with the administrative affairs of a state.
Through the eighteenth century, statistics was mathematical, political and governmental. In
the early nineteenth century, a famous Belgian statistician, Quetelet, applied statistics to the
investigation of social and educational problems. Beyond any doubt, Francis Galton had the
greatest effect on the introduction and use of statistics in the social sciences. Galton contributed
to the fields of heredity and eugenics, psychology, anthropology, and statistics. Our present
understanding of correlation, the measure of agreement between two variables, is credited to him.
The mathematician Pearson collaborated with Galton in later years and was instrumental in
developing many of the correlation and regression formulas that are in use today. Among Galton’s
contributions was the development of centiles and percentiles.
Though the importance of statistics was strongly felt earlier, its tremendous growth came in the
twentieth century. During this period, many new theories and applications in various disciplines were
introduced. Renowned statisticians contributed several theories and methods, among them
Probability Theory, Sampling Theory, Statistical Inference, Design
of Experiments, Correlation and Regression Methods, Time Series and Forecasting Techniques.
In the early 1900s, statistics and statisticians were not given much importance, but over the years,
with the advancement of technology, the field widened in scope and gained attention in all fields of
science and management. It is pertinent to note that the continued growth of statistics is closely
associated with information technology. As a result, several new interdisciplinary fields have emerged,
such as Data Mining, Data Warehousing, Geographic Information Systems, and Artificial Intelligence.
Nowadays, statistics is applied in technical fields such as
Bioinformatics, Signal Processing, Telecommunications, Engineering, Medicine, Criminology, and Ecology.
B. Definition of Statistics
In the first place, 'statistics' is a plural noun that describes a collection of numerical data,
such as employment statistics, accident statistics, population statistics, births and deaths,
income and expenditure, exports and imports, etc. It is in this sense that the word
'statistics' is used by a layman or a newspaper.
- The word 'statistics' is defined by Croxton and Cowden as follows: "The collection,
presentation, analysis and interpretation of the numerical data."
Example 1. The students officially enrolled in any class at the City of Malabon University form a
population, since no other students share that property.
Example 2. Consider the students enrolled in a particular class, and choose at random a
committee of five students. This committee is a sample of that population.
Note: The elements in a population, or in a sample, are called observations, measurements, scores or just
data.
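Examples 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the class roster of 30 students and the seed value are assumptions made for the sketch, not figures from the module.

```python
import random

# Hypothetical class roster: this plays the role of the population in Example 1.
population = [f"Student {i}" for i in range(1, 31)]

# Example 2: choose a committee of five at random, without replacement.
rng = random.Random(42)  # fixed seed (an assumption) so the draw is repeatable
committee = rng.sample(population, k=5)

print(len(population))  # number of observations in the population
print(committee)        # the five observations that make up the sample
```

Every member of the committee is also a member of the class, which is what makes the committee a sample of that population.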
a. Descriptive Statistics
Descriptive statistics describe a sample. You simply take a group that you’re interested in,
record data about the group members, and then use summary statistics and graphs to present
the group properties. With descriptive statistics, there is no uncertainty because you are
describing only the people or items that you actually measure. You’re not trying to infer
properties about a larger population.
The process involves taking a potentially large number of data points in the sample and
reducing them down to a few meaningful summary values and graphs. This procedure allows us
to gain more insight and to visualize the data, rather than simply poring over row upon row of raw
numbers.
Using descriptive statistics, we can present the test scores in graphical form along with the
other available summary statistics.
These results indicate that the mean score of this class is 79.18. The scores range from
66.21 to 96.53, and the distribution is symmetrically centered around the mean. A score of at
least 70 on the test is acceptable. The data show that 86.7% of the students have acceptable
scores.
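The kind of summary described above can be sketched as follows. The ten scores are hypothetical values chosen for illustration; they are not the class data behind the figures quoted in the text, so the printed numbers will differ from those figures.

```python
import statistics

# Hypothetical test scores for a small class (not the data from the module).
scores = [66.2, 71.5, 74.0, 76.8, 78.3, 80.1, 82.6, 85.4, 89.9, 96.5]
ACCEPTABLE = 70  # passing threshold taken from the text

mean_score = statistics.mean(scores)
lowest, highest = min(scores), max(scores)
# Share of students whose score is at least the acceptable threshold.
pct_acceptable = 100 * sum(s >= ACCEPTABLE for s in scores) / len(scores)

print(f"mean = {mean_score:.2f}")
print(f"range = {lowest} to {highest}")
print(f"acceptable = {pct_acceptable:.1f}%")
```

Because every score in the group is included, these values describe this group exactly; they say nothing yet about other classes.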
Collectively, this information gives us a pretty good picture of this specific class. There is
no uncertainty surrounding these statistics because we gathered the scores for everyone in the
class. However, we can’t take these results and extrapolate to a larger population of students.
The most common methodologies in inferential statistics are hypothesis tests, confidence
intervals, and regression analysis.
For descriptive statistics, we choose a group that we want to describe and then measure
all subjects in that group. The statistical summary describes this group with complete certainty
(outside of measurement error).
For inferential statistics, we need to define the population and then devise a sampling
plan that produces a representative sample. The statistical results incorporate the uncertainty
that is inherent in using a sample to understand an entire population. The sample size becomes a
vital characteristic. The law of large numbers states that as the sample size grows, the sample
statistics (e.g., the sample mean) will converge on the population value.
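The law of large numbers can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normally distributed with a mean of 79 and a standard deviation of 7, both hypothetical values, and the seed is fixed only to make the run repeatable.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed (an assumption) for a repeatable run
POPULATION_MEAN = 79.0   # hypothetical population mean
POPULATION_SD = 7.0      # hypothetical population standard deviation

errors = {}
for n in (10, 1_000, 100_000):
    # Draw a sample of size n from the assumed population.
    sample = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(n)]
    # Distance between the sample mean and the population mean.
    errors[n] = abs(statistics.fmean(sample) - POPULATION_MEAN)
    print(n, round(errors[n], 3))
```

In a single run the error does not have to shrink at every step, but for large samples it is typically very small.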
A study using descriptive statistics is simpler to perform. However, if you need evidence
that an effect or relationship between variables exists in an entire population rather than only
your sample, you need to use inferential statistics.
1. Qualitative Data - measurements that each fall into one of several categories (hair
color, ethnic group, and other attributes of the population).
Qualitative data are generally described by words or letters. They are not as widely used as
quantitative data because many numerical techniques do not apply to qualitative data.
For example, it does not make sense to find an average hair color or blood type.
2. Quantitative Data - observations that are measured on a numerical scale
(distance traveled to college, number of children in a family, etc.).
Quantitative data are always numbers and are the result of counting or measuring attributes
of a population.
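The difference between the two kinds of data shows up in which summaries make sense. In the sketch below, the hair colors and family sizes are hypothetical values used only for illustration: categories can be counted, while numbers can also be averaged.

```python
import statistics
from collections import Counter

# Hypothetical observations for six people.
hair_color = ["black", "brown", "black", "blonde", "black", "brown"]  # qualitative
children = [0, 2, 1, 3, 2, 2]                                         # quantitative

# Qualitative data: count how often each category occurs.
color_counts = Counter(hair_color)
most_common_color = color_counts.most_common(1)[0][0]

# Quantitative data: arithmetic such as the mean is meaningful.
mean_children = statistics.mean(children)

print(color_counts)
print(most_common_color)
print(round(mean_children, 2))
```

There is no meaningful "average hair color", but the most frequent category is still a valid summary.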
Probability or random sampling (each member of the population has a known, nonzero chance
of being selected)
Non-probability or non-random sampling
The actual process of sampling causes sampling errors. For example, the sample may not be
large enough or representative of the population.
Factors not related to the sampling process cause non-sampling errors. A defective counting
device can cause a non-sampling error.
A. Simple Random Sampling or Lottery Sampling - select the sample so that each member of the
population has an equal chance of being selected.
B. Systematic Random Sampling - select some starting point and then select every kth element
in the population.
C. Stratified Sampling - subdivide the population into subgroups (strata) that share the same
characteristic, then draw a sample from each stratum.
D. Cluster Sampling - divide the population into sections (or clusters); randomly select
some of those clusters; choose all members from the selected clusters.
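The four probability sampling methods above can be sketched as follows. The population of 100 numbered members, the odd/even strata, the blocks of ten used as clusters, and the sample sizes are all assumptions chosen to keep the illustration small.

```python
import random

rng = random.Random(1)            # fixed seed (an assumption) for repeatability
population = list(range(1, 101))  # hypothetical member IDs 1..100

# A. Simple random sampling: every member has an equal chance of selection.
simple = rng.sample(population, k=10)

# B. Systematic sampling: random starting point, then every kth element.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# C. Stratified sampling: split into strata, then draw from each stratum.
strata = {
    "odd": [x for x in population if x % 2 == 1],
    "even": [x for x in population if x % 2 == 0],
}
stratified = [x for group in strata.values() for x in rng.sample(group, k=5)]

# D. Cluster sampling: split into clusters, randomly pick some clusters,
#    and keep every member of the clusters that were picked.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, k=2)
cluster_sample = [x for cluster in chosen_clusters for x in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

Note the contrast between C and D: stratified sampling takes some members from every subgroup, while cluster sampling takes all members from some subgroups.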
b. Judgment Sampling - In this case, the person taking the sample has direct or indirect control
over which items are selected for the sample. An expert selects a representative sample according
to his own subjective judgment.
c. Quota Sampling - The main concern in quota sampling is to come up with the desired number
of samples no matter how they are selected. In this method, the decision maker requires the
sample to contain a certain number of items with a given characteristic. Many political polls are,
in part, quota sampling.
d. Volunteer Sampling – The sample consists essentially of volunteers.
e. Haphazard/Incidental Sampling – samples are selected by convenience rather than by a random
procedure; that is, whoever is available at the time and place the data are to be collected.
f. Purposive Sampling – The researcher selects those who can best help or give information
based on his own judgment. Subjects are not randomly selected.
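Slovin's formula is a very general equation used when you can estimate the population size but
have no idea about how a certain population behaves. The formula is described as:
    n = N / (1 + N e^2)
where n is the sample size, N is the population size, and e is the margin of error.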
Note that this is the least accurate formula and, as such, the least ideal. You should only use
this if circumstances prevent you from determining an appropriate standard deviation and/or
confidence level (thereby preventing you from determining your z-score, as well).
Example 1: Calculate the necessary survey size for a population of 240, allowing for a 4% margin
of error.
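Example 1 can be worked out with a short script. The sketch assumes the usual statement of Slovin's formula, n = N / (1 + N e^2), where N is the population size and e is the margin of error. The function name is hypothetical, and rounding up to the next whole respondent is a common convention assumed here.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size from Slovin's formula, rounded up to a whole number."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # rounding up is an assumed convention

# Example 1: N = 240, e = 0.04
# n = 240 / (1 + 240 * 0.0016) = 240 / 1.384, which is about 173.41
print(slovin_sample_size(240, 0.04))  # prints 174
```

Under these assumptions, about 174 respondents are needed for a population of 240 at a 4% margin of error.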
3. A retailer is interested in knowing how many of their customers bought an item from them
after viewing their website on a certain day. Given that the website has, on average, 10,000
views per day, determine the sample size of customers that they have to monitor at a 95%
confidence level with a 5% margin of error.
References
Downie & Heath, Basic Statistical Method Fifth Edition, Harper and Row, Publisher, Inc., 1983
Porter, Theodore M. "Probability and statistics". Encyclopedia Britannica, 3 Feb. 2020,
https://www.britannica.com/science/probability. Accessed 10 August 2021.
https://www.brainkart.com/article/Origin-and-Growth-of-Statistics_35037/
https://statisticsbyjim.com/jim_frost/